Coronavirus 10-day forecast
As of 27 April 2020
Provides estimates of COVID-19 growth rate, detection, and near-future case load in each country, updated daily, based on global data collated by John Hopkins University
Are you interested in other countries? This site is locked to one country, but go to our main site for all countries with more than 20 cases of covid19.
We have specific sites for certain countries. Please visit our US site, our Australian site, our Canadian site, our Chinese site, or our Indian site.
Location
Raw case numbers:
Active cases:
Active cases are total number of infections minus deaths and recoveries.
Doubling time:
Negative doubling time indicates halving time.
When growth rates are changing fast, reduce the fit window to average growth over more recent history
Detection
Cases successfully detected:
Possible number of cases now given imperfect detection:
To see how the assumed case fatality ratio affects detection (and so possible true case numbers) adjust the slider.
Take these last numbers with a grain of salt; they are rough. Undiagnosed cases are current infections yet to develop symptoms and be diagnosed. Undetected cases are current infections that will not be diagnosed. Large numbers of undetected cases indicate that there are many more deaths in the region than there should be given reported case numbers (so there are many undetected cases or a large number of imported cases).
The last plot is the percentage of new cases that are successfully detected, and how this has changed over time. Values near 100% are good, indicating that most cases are being detected/reported. Unexpected outbreaks cause temporary reductions in detection.
Detection can only be calculated up to 17 days in the past, and estimates are often patchy in countries/regions with few deaths.
For more information about University's COVID-19 research, including modelling that contributed to the Australian Federal Government’s response, visit Pursuit, and the Doherty Institute.
For more information, see 'About' tab.
Location
Growth rate
This is the daily growth in active cases, plotted over the last 20 days. It can be thought of as the interest rate, compounded daily.
Positive is bad, negative is good. Progress in control would be indicated by steady decline in growth rate over time, and holding in negative territory.
Curve flattening index
This is a measure of how well a region is flattening the pandemic curve at any point in time. Positive values mean growth rates are declining at that point in time.
Note, this last plot covers the entire time period of the pandemic, not just the last twenty days.
For more details see the 'About' tab.
About
This app was built to give people a sense of how the coronavirus epidemic is progressing in each country. The initial impetus was to move the conversation past daily updates on case numbers, and towards the near future. The hope was that doing so might help move us past the paralysis of daily shock, and towards important action.
As action is taken, the app gives us a sense of how well our actions are working. Difficult decisions have been made at all levels, from the individual to the nation, and it is important that we get feedback on those decisions as soon as possible.
Given how fast the epidemic was progressing, the app was built in a hurry at the University of Melbourne on the 12 March 2020. It has been continually refined since then, and work continues (see acknowledgements). The code is available here and contributions/suggestions are welcome.
The app relies on relatively simple heuristics rather than comprehensive analyses. The emphasis has been on speed, and continually updated data, rather than sophisticated analyses. Sophisticated analyses will likely come in time.
Methods
Raw case numbers
Our global data come from the John Hopkins University dataset, which is updated daily. Our Indian data comes from Ministry of Health and Family Welfare data, and are compiled by a team from Panjab university. These data report for each country the cumulative number of infections (\(I_t\)), number of deaths (\(D_t\)), and (for most countries) number of recoveries (\(R_t\)). The raw case numbers we report are simply a report of the JHU raw numbers for the selected country at the current date. We also provide aggregate tallies for each continent, and the world. Country to continent associations (for aggregating data to continent) were modified from this file. If a region is missing from the app it is either because the region is yet to reach 20 confirmed cases, or because there are currently serious issues with the data (we exclude regions when cumulative case numbers decrease by 25% between days).
Current active cases
Active cases (\(A_t\)) are simply \(A_t = I_t-D_t-R_t\) where \(t\) denotes time. As of 23 March, however, JHU no longer report recoveries in the US and Canada at State/Province level. As a consequence, for these countries. we have had to assume that active cases take 22 days to resolve. Thus, \(A_t = I_t-D_t-(I_{t-22}-D_{t-22})\).
Growth models and forecast of active cases
We are interested in knowing how the number of active cases is going to change in the near term. We provide two alternative growth models. The growth dynamics of epidemics are complex, but we make the simplifying assumption that the epidemic is a long way from population saturation (i.e. that there is a ready supply of susceptible hosts) such that simple exponential growth provides a reasonable short-term approximation. The first model – the “constant growth” model – assumes a constant growth rate, in which case active cases follow an exponential growth model such that \(E(A_t) = A_0e^{rt}\), where \(A_0\) is the initial number of active cases, and \(r\) is the (constant) intrinsic growth rate. The second model – “time-varying growth” – assumes that the growth rate changes linearly over time. Empirically, linear change in growth is what we have been observing in this epidemic as populations enact physical distancing and quarantine measures. Under this model, growth follows a Gaussian function such that \(E(A_t) = A_0e^{r_0t+\frac{b}{2}t^2}\). Here, \(r_0\) is the initial growth rate, and \(b\) is the rate of change in growth rate over time.
To fit both models, we take the natural logarithm of both sides, yielding \(\ln A_t = rt + \ln A_0\) (constant growth) and \(\ln A_t = \frac{b}{2}t^2+rt + \ln A_0\) (time-varying growth). This shows us that we can fit a simple linear regression of \(\ln A_t\) against \(t\) in the constant growth scenario, and a quadratic function in the case of the time-varying model.
These fits give us an estimate of intrinsic growth rate, \(r\), or \(r(t)\).
We fit each model to the last \(n\) days of \(A_t\) data (where \(n\) is determined by the user with the input slider) and extrapolate the fitted model to capture ten day into the future from now. Fitting and extrapolation is effected with the log-transformed model (lower plot on “10-day forecast” tab, with 95% confidence intervals) and the log of expected active case numbers is back-transformed to the original scale to produce the top plot on the “10-day forecast” tab. We provide a larger number of days for fit input for the time-varying growth model because this model estimates an additional parameter, so requires more data.
When public health interventions are rapidly changing the growth rate, this can be seen as deviations from the expected straight line on the log-plot. In these situations, when growth rate is declining rapidly (the curve is flattening), forecasts from the constant growth model (averaging growth over the last ten days) will be biased upwards. By altering the slider you can adjust the window over which growth rate is averaged, so you can get a sense of how recent shifts are affecting the forecast. The time-varying growth rate forecast should be less sensitive to changes in \(n\), and is the better model when growth rates are changing in a steady linear manner.
Doubling time
Under the constant growth model, the doubling time (\(t_d\)) is an intuitive measure of how fast a population is growing. It reports the number of days for the population to double in size and is calculated by setting \(A_t/A_0 = 2\), yielding \(t_d = \frac{\ln 2}{r}\). We calculate this directly from the previous estimate of \(r\).
Detection
One of the key sources of uncertainty in this pandemic has been undetected cases. Covid-19 has been quite a sneaky pathogen, and surveillance has often not been adequate, often resulting in a large number of infections in a country before it is noticed. We were interested to know how bad this problem might be for each country; deaths will be a function of the true number of cases, not the reported number.
Not all cases are detected and so the reported number of active cases is almost always lower than the true number of cases. There are two classes of non-detected cases:
- Those that will be diagnosed soon (“Undiagnosed”). Individuals that are infected but yet to present symptoms, and;
- Those that will never be diagnosed (“Undetected”). Individuals that present with mild symptoms, or are missed by surveillance.
The first class is caused by a delay between infection and diagnosis. However, if we know something about the length of this delay then we can use this information to inflate the observed case numbers to produce estimates of the number of undiagnosed cases. The most important determinant of the delay is the incubation period of the infection. This is known to be highly variable but can be modelled using a probability distribution, which is called the incubation distribution. Information about the incubation distribution exists from external sources. It is known that the average incubation period is 5-6 days and that 95% of infections have an incubation period of no more than 12-13 days. In our model, a log-normal distribution incorporating these features is used to model the incubation distribution. Given this model for the incubation distribution, the daily diagnosis counts reflect a larger and unknown number of Covid-19 infections in the population, aggregated with the incubation distribution. Using the observed diagnosis counts, we can therefore estimate the unknown number of Covid-19 infections by disaggregating the infection numbers from the incubation distribution. This process is known as statistical deconvolution, also called back-projection or back-calculation. From a technical perspective, the method is a non-negatively constrained linear Poisson regression analysis, where the \(y\)-variable is the observed diagnosis counts and the linear model is the linear convolution of the mean infection numbers and the incubation distribution. This is a non-standard analysis but reliable model fitting algorithms exist and are incorporated here. We use these to estimate undiagnosed cases, \(U_t\).
The second class of non-detections, “Undetected” cases, are cases that are completely missed by surveillance. To estimate these, we use a heuristic that assumes: that deaths do not go undetected, that there is community transmission (and a closed population), that there is a case fatality ratio for symptomatic cases (here assumed to be quite high 3.3%; lower numbers will cause our detection estimate to be lower), that detection is constant, and that there is a fixed time (here assumed to be 17 days) between onset and death. Observed new cases in the five days to time \(t\) are calculated as \(N_t = A_t-A_{t-5}\), and under the previous assumptions, the expected number of new cases \(\hat{N_t} = (D_{t+17}-D_{t+12})/0.033\) (Our choice of a five day window for making this estimate reflects our interest in estimating detection early, when there are still few cases. A five-day window typically catches at least one death.) We then estimate detection probability, \(p(t)=N_t/\hat{N_t}\). We average \(p(t)\) over \(t\) to estimate the total number of active symptomatic cases \(T_t = A_t/p\), shown on the first tab of the site. Because detection changes over time also, as of 14 April we take this average over only the last five days of detection observations (typically 17-22 days in the past) to give the most up-to-date estimate for each region. We also show detection over time \(p(t)\) for the selected region as a plot on the forecast tab.
The large number of assumptions behind this last method for estimating detection mean we should be very cautious with it, but it suffices to show that detection probability in some countries is very high (i.e. relatively few undetected cases), while in others it is very low indeed (many undetected cases). In countries with many undetected cases, there are many more deaths than there should be given reported case numbers. One obvious source of bias in both estimates is imported cases. When there is movement between countries and large differences in caseload, countries with small numbers of cases will have a large proportion of imported cases. These will cause us to estimate a larger number of non-detections than there are.
Detection is our weakest estimate in this app; please take it as indicative only. You can expect the estimates to have large uncertainty ranges, however, they do provide working estimates that give a sense of scale. At present we have not included uncertainty ranges such as confidence intervals but may consider this in future implementations if feasible. It would require computationally intensive methods such as bootstrapping. We note also that the estimates of undiagnosed infections can also be used as an alternative way to forecast new cases, since future diagnoses depend on past infections. This is another area of possible future development.
Growth rate
Public health interventions are firmly aimed at reducing virus transmission and so reducing the growth in the number of active cases. The earliest indications of intervention success will manifest in lowered growth rates (around ten days after the intervention is made in the case of covid-19). The per-capita growth rate can be calculated each day as \(G_t = \frac{A_t-A_{t-1}}{A_{t-1}}\). The top plot on the “Growth rates” tab shows this rate as a percentage, \(100\times G_t\), expressing population growth in more broadly understood terms; as a compound interest rate. The plot uses a three-day moving average.
Curve flattening
To capture the idea of “flattening the curve”, we made an index (\(C_t\)) that captures how growth rate changes over time. Conceptually, we were thinking of the second differential of \(\ln A_t\) (scaled against the first differential), but the resultant index is a bit of a mouthful: \[ C_t = -\frac{(\ln A_t-\ln A_{t-1}) - (\ln A_{t-1}-\ln A_{t-2})}{\sqrt{(\ln A_{t-1}-\ln A_{t-2})^2}} \] Which could be simplified, but has been written so as to help you see that it is the change in growth rate divided by the magnitude of the growth rate, and then the whole multiplied by negative 1 so that positive values are “good”, negative values “bad”. It should rightly be thought of as an index of “suppression” rather than an index of “curve-flattening”.
The second plot on the “Growth rate and curve-flattening” tab places a smoother through this index across all time for which we have data. Times when countries are reducing their growth rate are indicated by the smoother sitting in positive territory.
The denominator in this index does make it very sensitive as growth rates come close to zero.
Media enquiries
University of Melbourne Newsroom
Phone: +61 3 8344 4123
Email: news@media.unimelb.edu.au
Acknowledgements
First, to the people on the front line who are collecting these data. You are a legion of brave and selfless people. We hope this work has in some diffuse way made your job easier. Second, a big thanks to the team at John Hopkins University. Their early decision to make these data open access, and their ongoing commitment to curating a large dynamic dataset has provided a critical resource for helping the world understand and manage this pandemic.
The app was conceived by Ben Phillips at the University of Melbourne and has been greatly improved by Drs Melissa Makin and Uli Felzmann (University of Melbourne). These three are the core development and maintenance team. State-level data for India are curated by Drs Vipin Bhatnagar and Jyoti Tripathi (Panjab University). The app has been improved by contributions from Matteo Tomasini (Michigan State University); Sergio Valqui (UoM); and several other contributors on github. Ian Marschner (University of Sydney) provided deconvolution code for estimating undiagnosed infection numbers. Valuable feedback has come from Daniel Bolnick (U Connecticut), Alison Hill (Harvard), Amy Hurford (Memorial University of Newfoundland), Taegan Calnan (Charles Darwin Uni), the Spatial Ecology and Evolution Lab (Melbourne), and a suite of people on the internet who are too numerous to name (but we thank them anyway).