ax.set_title("COVID-19 cumulative cases from Jan 21 to Feb 3 2020")
```
The graph has a strange shape from January 24th to February 1st, so it is worth checking where this data comes from. The `locations` array we extracted from the `.csv` file has two columns: the first contains regions and the second contains the name of the country. However, only the first few rows have an entry in the first column (province names in China); the remaining rows contain only country names. It therefore makes sense to group all the data from China into a single row. To do this, we'll select from the `nbcases` array only the rows for which the second entry of the `locations` array corresponds to China. Next, we'll use the [numpy.sum](https://numpy.org/devdocs/reference/generated/numpy.sum.html#numpy.sum) function to sum all the selected rows (`axis=0`). Note also that row 35 holds the total counts for the whole country for each date. Since we want to calculate the sum ourselves from the province data, we first have to remove that row from both `locations` and `nbcases`:
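The selection and summation just described can be sketched on toy data as follows. This is only an illustration under stated assumptions: the array contents, the `-1` sentinel for missing counts, the `"Mainland China"` label, and the position of the aggregate row (`total_row`) are all invented here and do not reproduce the tutorial's dataset, where the aggregate is row 35.

```python
import numpy as np

# Toy stand-ins for the tutorial's arrays; every value here is invented.
locations = np.array([
    ["Hubei", "Mainland China"],
    ["Guangdong", "Mainland China"],
    ["", "Mainland China"],  # hypothetical country-wide total row
    ["", "Japan"],
])
nbcases = np.ma.masked_values(
    np.array([
        [10, 20, -1, 40],
        [1, 2, 3, 4],
        [11, 22, -1, 44],
        [0, 1, 1, 2],
    ]),
    -1,  # assumption: -1 marks a missing count
)

# Drop the pre-computed total row from both arrays so it is not counted twice.
total_row = 2  # assumption: index of the aggregate row in this toy data
keep = np.arange(len(locations)) != total_row
locations = locations[keep]
nbcases = nbcases[keep]

# Select the rows whose country column matches, then sum them date by date.
china_mask = locations[:, 1] == "Mainland China"
china_total = nbcases[china_mask].sum(axis=0)
print(china_total)
```

Boolean indexing is used here instead of deleting by index so that the mask of `nbcases` is carried along unchanged. Note that a masked sum skips masked entries, and a date is only masked in the result when every selected row is masked for it.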
@@ -183,9 +185,10 @@ Let's try and see what the data looks like excluding the first row (data from th
closely:
```{code-cell}
-plt.plot(dates, nbcases_ma[1:].T, "--")
-plt.xticks(selected_dates, dates[selected_dates])
-plt.title("COVID-19 cumulative cases from Jan 21 to Feb 3 2020")
ax.set_title("COVID-19 cumulative cases from Jan 21 to Feb 3 2020 - Mainland China")
```
It's clear that masked arrays are the right solution here. We cannot represent the missing data without mischaracterizing the evolution of the curve.
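To make the point concrete, here is a small sketch comparing a series that stores missing counts as a sentinel with the same series as a masked array. The values and the `-1` sentinel are assumptions chosen for illustration, not data from the tutorial.

```python
import numpy as np

# Toy cumulative counts; -1 stands in for a missing observation (assumption).
raw = np.array([100.0, 150.0, -1.0, 260.0])
masked = np.ma.masked_values(raw, -1)

# Treating the sentinel as data produces a spurious drop and rebound.
print(np.diff(raw))

# The masked array leaves the missing entry out of the statistics.
print(masked.count(), masked.mean())
```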
@@ -271,21 +275,25 @@ package to create a cubic polynomial model that fits the data as best as possibl
```{code-cell}
t = np.arange(len(china_total))
model = np.polynomial.Polynomial.fit(t[~china_total.mask], valid, deg=3)
-plt.plot(t, china_total)
-plt.plot(t, model(t), "--")
+
+fig, ax = plt.subplots()
+ax.plot(t, china_total)
+ax.plot(t, model(t), "--")
```
This plot is hard to read because the two lines overlap, so let's summarize the result in a more elaborate plot. We'll plot the real data where available and show the cubic fit where data is unavailable, using this fit to estimate the number of cases on January 28th 2020, 7 days after the beginning of the records:
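The numerical part of that step can be sketched as below, leaving the plotting aside. The series is a toy one, generated from a known cubic with two entries marked missing so that the estimate can be checked; the values, the `-1` sentinel, and the names `filled` and `estimate` are assumptions for illustration only.

```python
import numpy as np

# Toy cumulative series built from a known cubic (assumption, not real data).
t = np.arange(12)
values = (t**3 + 2 * t**2 + 10).astype(float)
values[[7, 8]] = -1  # pretend days 7 and 8 are missing
china_total = np.ma.masked_values(values, -1)

# Fit the cubic on the valid points only.
mask = np.ma.getmaskarray(china_total)
valid = china_total.compressed()
model = np.polynomial.Polynomial.fit(t[~mask], valid, deg=3)

# Keep observed values where available; use the fit where data is missing.
filled = np.where(mask, model(t), china_total.filled(np.nan))

# Estimate for day 7 (January 28th if day 0 is January 21st).
estimate = model(7)
print(estimate)
```

Because the toy data follows a cubic exactly, the fit recovers the hidden value at day 7 almost perfectly; with real counts the estimate is only as good as the cubic model.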