I happened to see that the data had been updated, so I ran it through a fit. I did not expect it to fit, but I ended up spending two hours on it. Here is a simple record of the process.
I still cannot talk loosely about this result, otherwise I will be the next one accused of spreading rumors, and staying safe comes first …
A speck of dust from the era is a mountain when it lands on an individual, yet we live in an era full of dust.
-- Fang Fang
A disaster is not one event in which 20,000 people died; it is the event of one person dying, occurring 20,000 times.
-- Kitano
These are the two sayings about this disaster that left the deepest impression on me. May the deceased rest in peace. RIP
There are many data sources at present. I trust the global data set CSSEGISandData/COVID-19 from JHU [1]; there are also more detailed domestic data sets such as DXY-COVID-19-DATA [2]. I believe many people will analyze this data in the future.
Here I used the archived data from March 25, 2020.
After loading, the data is a 502x66 table. It holds 62 days of time series for each province/state worldwide, plus columns for country, province/state, latitude, and longitude.
We do not analyze each region separately; the regions can simply be summed, and the final numeric data is 501x62.
Then plot the summed time series.
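As a minimal sketch of the summing step, assuming the raw table has one header row and four leading metadata columns as described above: the toy cell array below uses made-up numbers, and the names raw, serial, and time_sum are illustrative.

```matlab
% Toy stand-in for the 502x66 table: 1 header row, 4 metadata columns.
% All numbers are made up for illustration.
raw = { 'Province/State', 'Country/Region', 'Lat', 'Long', 'day1', 'day2', 'day3';
        'P1',             'C1',             10.0,  20.0,   1,      2,      4;
        'P2',             'C1',             11.0,  21.0,   0,      3,      5;
        '',               'C2',             30.0,  40.0,   2,      2,      2 };

% Drop the header row and the metadata columns, keep only the counts.
serial = cell2mat(raw(2:end, 5:end));   % regions x days

% Sum over regions (dimension 1) to get one global total per day.
time_sum = sum(serial, 1);

disp(size(serial));   % for the real data this would be 501 x 62
disp(time_sum);
```

Summing along dimension 1 gives the same totals as looping over the day columns and summing each one.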
For convenience, the analysis here is done in MATLAB for the time being.
To avoid some data issues, I treated the data from day 25 onward separately.
The data before that covered essentially only China, so only the later data is analyzed here.
The cut is at day 25, so the analyzed data starts on February 16, 2020.
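The day-to-date mapping can be checked with a short sketch. It assumes the JHU time series starts on January 22, 2020 (not stated in the text), and the variable names are illustrative.

```matlab
first_date = datenum(2020, 1, 22);   % assumption: date of the first data column
skip = 25;                           % number of leading days left out

% Column k holds the data for first_date + (k - 1), so the first column
% kept after skipping 25 days is column 26.
start_date = first_date + skip;
start_str = datestr(start_date, 'yyyy-mm-dd');
disp(start_str);   % 2020-02-16
```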
Curve fitting is done with MATLAB's cftool [3].
See the reference link for detailed usage.
A two-term exponential model is used for the fit:
$y = a e^{bx} + c e^{dx}$
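The same model can also be fitted from a script instead of the cftool GUI. The sketch below assumes the Curve Fitting Toolbox is installed and uses its fit function with the library model 'exp2'. The data is synthetic, generated from the model itself with illustrative coefficients, not the real case counts.

```matlab
% Synthetic data from the two-term exponential model (illustrative values,
% not the real case counts).
x = (1:36)';                               % fit expects column vectors
y = 7e4*exp(0.007*x) + 650*exp(0.16*x);

% 'exp2' is the library model a*exp(b*x) + c*exp(d*x).
[f, gof] = fit(x, y, 'exp2');

names = coeffnames(f);      % {'a'; 'b'; 'c'; 'd'}
vals  = coeffvalues(f);     % fitted values in the order [a b c d]
disp(vals);
disp(gof);                  % struct with sse, rsquare, dfe, adjrsquare, rmse
```

A sum of two exponentials can be sensitive to starting values, so the coefficients recovered from real data may differ between runs or tools.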
The resulting fit is shown in the figure.
The fit output is listed below. In general, the first group of values (the coefficients) is the one to use.
General model Exp2:
f(x) = a*exp(b*x) + c*exp(d*x)
Coefficients (with 95% confidence bounds):
a = 7.173e+04 (7.011e+04, 7.336e+04)
b = 0.007432 (0.004555, 0.01031)
c = 654.7 (443.6, 865.9)
d = 0.1647 (0.1564, 0.1729)
Goodness of fit:
SSE: 9.988e+07
R-square: 0.9994
Adjusted R-square: 0.9994
RMSE: 1767
SSE: Sum of squared errors. It measures the total deviation of the response values from the fit. A value closer to 0 indicates a better fit.
R-square: Coefficient of determination. It ranges from 0 to 1; the closer it is to 1, the more of the variation in y the model explains.
Adjusted R-square: R-square adjusted for the residual degrees of freedom. A value closer to 1 indicates a better fit. It is usually the best indicator of fit quality when additional coefficients are added to a model.
RMSE: Root mean squared error. A value closer to 0 indicates a better fit.
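To make these four statistics concrete, here is a small sketch that computes them by hand. It follows the usual degrees-of-freedom definitions (n data points, m fitted coefficients); the data and the "fitted" values are made up so the numbers are easy to check.

```matlab
% Made-up example: fitted values from a straight line, residuals of +/-1.
x     = (1:10)';
y_fit = 2*x + 1;                            % pretend these are the fitted values
y     = y_fit + [1 -1 1 -1 1 -1 1 -1 1 -1]';

n = numel(y);      % number of data points
m = 2;             % number of fitted coefficients (assumed for this example)
v = n - m;         % residual degrees of freedom

sse        = sum((y - y_fit).^2);           % sum of squared errors
sst        = sum((y - mean(y)).^2);         % total sum of squares
rsquare    = 1 - sse/sst;                   % coefficient of determination
adjrsquare = 1 - (sse/v) / (sst/(n - 1));   % adjusted for degrees of freedom
rmse       = sqrt(sse/v);                   % root mean squared error

fprintf('SSE=%g  R2=%g  adjR2=%g  RMSE=%g\n', sse, rsquare, adjrsquare, rmse);
```

Because SSE is divided by the residual degrees of freedom, adding a coefficient raises the adjusted R-square only if it reduces SSE enough to pay for itself.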
The fit here is already very good. Testing shows that a Fourier series with 3 terms, or a Gaussian model with 4 or more terms, also fits well, and the resulting goodness of fit is about the same. A proper analysis would call for an epidemiological model; here I only do the simple processing below.
With the fitted curve, there are many things that can be done. Here I extrapolate with the fitted equation and see when the count reaches the 1,000,000 mark.
Looking purely at the numbers, the fit itself is not a problem. It shows the count exceeding 1,000,000 on 03/30.
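A minimal sketch of this extrapolation, assuming the coefficients reported by cftool and that x = 1 corresponds to February 16, 2020 (so day 0 is February 15, as in the analysis code). The 60-day search window is an arbitrary choice.

```matlab
% Coefficients from the cftool output.
a = 7.173e+04;  b = 0.007432;  c = 654.7;  d = 0.1647;
f = @(x) a*exp(b*x) + c*exp(d*x);

target = 1e6;                       % the 1,000,000 mark
x = 1:60;                           % days after February 15 (arbitrary window)

% First day on which the fitted curve reaches the target.
first_day = x(find(f(x) >= target, 1));

init_day  = datenum(2020, 2, 15);   % day 0 of the fit
cross_str = datestr(init_day + first_day, 'mm/dd');
fprintf('day %d -> %s\n', first_day, cross_str);   % day 44 -> 03/30
```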
The data is for reference only and carries no real meaning.
This result has no real significance; actual models are far more complicated. I hope the numbers stop rising tomorrow and that everyone recovers.
I hope this prediction does not come true, but I have a feeling the count is likely to exceed 1,000,000, probably in early or mid-April.
May the deceased rest in peace, and may the world be at peace.
Finally, the analysis code:
% COVID-19 data analysis
% serial: numeric matrix of confirmed counts (regions x days), assumed to be
% already loaded in the workspace
[city, day] = size(serial);
time_sum = zeros(1, day);
for i = 1:day
    time_sum(1, i) = sum(serial(:, i));
end
plot(time_sum, '-*');
days = 1:day;
d0 = 25;                          % number of leading days to skip
l_days = 1:day-d0;
time_sum2 = zeros(1, day-d0);
for i = 1:day-d0
    time_sum2(1, i) = sum(serial(:, i+d0));
end
time_sum2 = time_sum2 - time_sum2(1);   % subtract the first value as a baseline
% Data from February 16th onward
figure
hold on
% After skipping 25 days, the series starts at 02/16
init_day = datenum(2020, 02, 15);
t1 = init_day + l_days;
plot(t1, time_sum2, '-o');
% Draw the fitted curve (coefficients from cftool)
a = 7.173e+04;
b = 0.007432;
c = 654.7;
d = 0.1647;
set_day = 45;
x = 1:set_day;
t = init_day + x;                 % dates for the prediction range
y = a*exp(b*x) + c*exp(d*x);
% Draw the boundary line at 1,000,000
y_max = 1000000*ones(1, set_day);
plot(t, y_max);
% Draw the prediction line
plot(t, y, '-*');
% Create xlabel
xlabel({'Date (from February 16th)'});
datetick('x', 6);
% Create ylabel
ylabel('Confirmed cases');