This app was built to give people a sense of how the coronavirus epidemic is progressing in each country. The initial impetus was to move the conversation past daily updates on case numbers, and towards the near future. The hope was that doing so might help move us past the paralysis of daily shock, and towards important action.

As action is taken, the app gives us a sense of how well our actions are working. Difficult decisions have been made at all levels, from the individual to the nation, and it is important that we get feedback on those decisions as soon as possible.

Given how fast the epidemic was progressing, the app was built in a hurry at the University of Melbourne on the 12 March 2020. It has been continually refined since then, and work continues (see acknowledgements). The code is available here and contributions/suggestions are welcome. If you have data that you would like to see represented here, please get in contact.

The app relies on relatively simple heuristics rather than comprehensive analyses. The emphasis has been on speed, and continually updated data, rather than sophisticated analyses. Sophisticated analyses will likely come in time.

## Methods

### Raw case numbers

Our data is updated daily. Our global data come from the Johns Hopkins University dataset. Our Indian data comes from Ministry of Health and Family Welfare data, and are compiled by a team from Panjab university. Our German data comes from the Robert Koch Institut, and the csv is available here. Our Italian and Colombian data come via the Covid-10 Data Hub - Italian data from Istituto Nazionale di Statistica, and Colombian data from Ministerio de Salud y Protección Social de Colombia.

These data report for each country the cumulative number of infections (\(I_t\)), number of deaths (\(D_t\)), and (for most countries) number of recoveries (\(R_t\)). The raw case numbers we report are simply a report of the JHU raw numbers for the selected country at the current date. We also provide aggregate tallies for each continent, and the world. Country to continent associations (for aggregating data to continent) were modified from this file. If a region is missing from the app it is either because the region is yet to reach 20 confirmed cases, or because there are currently serious issues with the data (we exclude regions when cumulative case numbers decrease by 25% between days).

### Current active cases

Active cases (\(A_t\)) are simply \(A_t = I_t-D_t-R_t\) where \(t\) denotes time. As of 23 March, however, JHU no longer report recoveries in the US and Canada at State/Province level, and recovery data at State level are not available for India. As a consequence, for these countries. we have had to assume that active cases take 22 days to resolve. Thus, \(A_t = I_t-D_t-(I_{t-22}-D_{t-22})\).

### Growth models and forecast of active cases

We are interested in knowing how the number of active cases is going to change in the near term. We provide two alternative growth models. The growth dynamics of epidemics are complex, but we make the simplifying assumption that the epidemic is a long way from population saturation (i.e. that there is a ready supply of susceptible hosts) such that simple exponential growth provides a reasonable short-term approximation. The first model – the “constant growth” model – assumes a constant growth rate, in which case active cases follow an exponential growth model such that \(E(A_t) = A_0e^{rt}\), where \(A_0\) is the initial number of active cases, and \(r\) is the (constant) intrinsic growth rate. The second model – “time-varying growth” – assumes that the growth rate changes linearly over time. Empirically, linear change in growth is what we have been observing in this epidemic as populations enact physical distancing and quarantine measures. Under this model, growth follows a Gaussian function such that \(E(A_t) = A_0e^{r_0t+\frac{b}{2}t^2}\). Here, \(r_0\) is the initial growth rate, and \(b\) is the rate of change in growth rate over time.

To fit both models, we take the natural logarithm of both sides, yielding \(\ln A_t = rt + \ln A_0\) (constant growth) and \(\ln A_t = \frac{b}{2}t^2+rt + \ln A_0\) (time-varying growth). This shows us that we can fit a simple linear regression of \(\ln A_t\) against \(t\) in the constant growth scenario, and a quadratic function in the case of the time-varying model.

These fits give us an estimate of intrinsic growth rate, \(r\), or \(r(t)\).

We fit each model to the last \(n\) days of \(A_t\) data (where \(n\) is determined by the user with the input slider) and extrapolate the fitted model to capture ten day into the future from now. Fitting and extrapolation is effected with the log-transformed model (lower plot on “10-day forecast” tab, with 95% confidence intervals) and the log of expected active case numbers is back-transformed to the original scale to produce the top plot on the “10-day forecast” tab. We provide a larger number of days for fit input for the time-varying growth model because this model estimates an additional parameter, so requires more data.

When public health interventions are rapidly changing the growth rate, this can be seen as deviations from the expected straight line on the log-plot. In these situations, when growth rate is declining rapidly (the curve is flattening), forecasts from the constant growth model (averaging growth over the last ten days) will be biased upwards. By altering the slider you can adjust the window over which growth rate is averaged, so you can get a sense of how recent shifts are affecting the forecast. The time-varying growth rate forecast should be less sensitive to changes in \(n\), and is the better model when growth rates are changing in a steady linear manner.

### Doubling time

Under the constant growth model, the doubling time (\(t_d\)) is an intuitive measure of how fast a population is growing. It reports the number of days for the population to double in size and is calculated by setting \(A_t/A_0 = 2\), yielding \(t_d = \frac{\ln 2}{r}\). We calculate this directly from the previous estimate of \(r\).

### Detection

One of the key sources of uncertainty in this pandemic has been undetected cases. Covid-19 has been quite a sneaky pathogen, and surveillance has often not been adequate, often resulting in a large number of infections in a country before it is noticed. We were interested to know how bad this problem might be for each country; deaths will be a function of the true number of cases, not the reported number.

Not all cases are detected and so the reported number of active cases is almost always lower than the true number of cases. There are two classes of non-detected cases:

- Those that will be diagnosed soon (“Undiagnosed”). Individuals that are infected but yet to present symptoms, and;
- Those that will never be diagnosed (“Undetected”). Individuals that present with mild symptoms, or are missed by surveillance.

The first class is caused by a delay between infection and diagnosis. However, if we know something about the length of this delay then we can use this information to inflate the observed case numbers to produce estimates of the number of undiagnosed cases. The most important determinant of the delay is the incubation period of the infection. This is known to be highly variable but can be modelled using a probability distribution, which is called the incubation distribution. Information about the incubation distribution exists from external sources. It is known that the average incubation period is 5-6 days and that 95% of infections have an incubation period of no more than 12-13 days. In our model, a log-normal distribution incorporating these features is used to model the incubation distribution. Given this model for the incubation distribution, the daily diagnosis counts reflect a larger and unknown number of Covid-19 infections in the population, aggregated with the incubation distribution. Using the observed diagnosis counts, we can therefore estimate the unknown number of Covid-19 infections by disaggregating the infection numbers from the incubation distribution. This process is known as statistical deconvolution, also called back-projection or back-calculation. From a technical perspective, the method is a non-negatively constrained linear Poisson regression analysis, where the \(y\)-variable is the observed diagnosis counts and the linear model is the linear convolution of the mean infection numbers and the incubation distribution. This is a non-standard analysis but reliable model fitting algorithms exist and are incorporated here. We use these to estimate undiagnosed cases, \(U_t\).

The second class of non-detections, “Undetected” cases, are cases that are completely missed by surveillance. To estimate these, we use a heuristic that assumes: that deaths do not go undetected, that there is community transmission (and a closed population), that there is a case fatality ratio, \(f\), for symptomatic cases (set by the user with a slider; lower numbers will cause our detection estimate to be lower), that detection is constant, and that there is a fixed time (here assumed to be 17 days) between onset and death. Observed new cases in the five days to time \(t\) are calculated as \(N_t = A_t-A_{t-5}\), and under the previous assumptions, the expected number of new cases \(\hat{N_t} = (D_{t+17}-D_{t+12})/f\) (Our choice of a five day window for making this estimate reflects our interest in estimating detection early, when there are still few cases. A five-day window typically catches at least one death.) We then estimate detection probability, \(p(t)=N_t/\hat{N_t}\). We average \(p(t)\) over \(t\) to estimate the total number of active symptomatic cases \(T_t = A_t/p\), shown on the first tab of the site. Because detection changes over time also, as of 14 April we take this average over only the last five days of detection observations (typically 17-22 days in the past) to give the most up-to-date estimate for each region. We also show detection over time \(p(t)\) for the selected region as a plot on the forecast tab.

The large number of assumptions behind this last method for estimating detection mean we should be very cautious with it, but it suffices to show that detection probability in some countries is very high (i.e. relatively few undetected cases), while in others it is very low indeed (many undetected cases). In countries with many undetected cases, there are many more deaths than there should be given reported case numbers. One obvious source of bias in both estimates is imported cases. When there is movement between countries and large differences in caseload, countries with small numbers of cases will have a large proportion of imported cases. These will cause us to estimate a larger number of non-detections than there are.

Detection is our weakest estimate in this app; please take it as indicative only. You can expect the estimates to have large uncertainty ranges, however, they do provide working estimates that give a sense of scale. At present we have not included uncertainty ranges such as confidence intervals but may consider this in future implementations if feasible. It would require computationally intensive methods such as bootstrapping. We note also that the estimates of undiagnosed infections can also be used as an alternative way to forecast new cases, since future diagnoses depend on past infections. This is another area of possible future development.

### Growth rate

Public health interventions are firmly aimed at reducing virus transmission and so reducing the growth in the number of active cases. The earliest indications of intervention success will manifest in lowered growth rates (around ten days after the intervention is made in the case of covid-19). The per-capita growth rate can be calculated each day as \(G_t = \frac{A_t-A_{t-1}}{A_{t-1}}\). The top plot on the “Growth rates” tab shows this rate as a percentage, \(100\times G_t\), expressing population growth in more broadly understood terms; as a compound interest rate. The plot uses a three-day moving average.

### Curve flattening

To capture the idea of “flattening the curve”, we made an index (\(C_t\)) that captures how growth rate changes over time. Conceptually, we were thinking of the second differential of \(\ln A_t\) (scaled against the first differential), but the resultant index is a bit of a mouthful: \[ C_t = -\frac{(\ln A_t-\ln A_{t-1}) - (\ln A_{t-1}-\ln A_{t-2})}{\sqrt{(\ln A_{t-1}-\ln A_{t-2})^2}} \] Which could be simplified, but has been written so as to help you see that it is the change in growth rate divided by the magnitude of the growth rate, and then the whole multiplied by negative 1 so that positive values are “good”, negative values “bad”. It should rightly be thought of as an index of “suppression” rather than an index of “curve-flattening”.

The second plot on the “Growth rate and curve-flattening” tab places a smoother through this index across all time for which we have data. Times when countries are reducing their growth rate are indicated by the smoother sitting in positive territory.

The denominator in this index does make it very sensitive as growth rates come close to zero.

## The University

For more information about University of Melbourne's COVID-19 research visit Pursuit, and the Doherty Institute.

If you would like to support the University's work, please consider giving to our covid-19 emergency appeal.

### Media enquiries

University of Melbourne Newsroom, Phone +61 3 8344 4123; Email news@media.unimelb.edu.au

## Acknowledgements

First, to the people on the front line who are collecting these data. You are a legion of brave and selfless people. We hope this work has in some diffuse way made your job easier. Second, a big thanks to the team at Johns Hopkins University. Their early decision to make these data open access, and their ongoing commitment to curating a large dynamic dataset has provided a critical resource for helping the world understand and manage this pandemic.

The app was conceived by Ben Phillips at the University of Melbourne and has been greatly improved by Drs Melissa Makin and Uli Felzmann (University of Melbourne). These three are the core development and maintenance team. State-level data for India are curated by Drs Vipin Bhatnagar and Jyoti Tripathi (Panjab University). The app has been improved by contributions from Matteo Tomasini (Michigan State University); Sergio Valqui (UoM); and several other contributors on github. Ian Marschner (University of Sydney) provided deconvolution code for estimating undiagnosed infection numbers. Valuable feedback has come from Daniel Bolnick (U Connecticut), Alison Hill (Harvard), Amy Hurford (Memorial University of Newfoundland), Taegan Calnan (Charles Darwin Uni), the Spatial Ecology and Evolution Lab (Melbourne), and a suite of people on the internet who are too numerous to name (but we thank them anyway).

Thanks to Defne Şahin from the University of Western Australia for the Turkish translation. Thanks also to the following, who are all from the University of Melbourne: Sanjana Ratan for the Hindi translation, Uli Felzmann for the German translation, Sergio Valqui for the Spanish translation, and Kostantin Borovkov for the Russian translation.