Unstatistik March 2020 (English): Corona pandemic - Statistical concepts and their limits

It is still uncertain how the COVID-19 pandemic will develop. The Unstatistics of the Month would like to shed some light on the current situation, at least with regard to statistical concepts. Therefore, we present none of the usual unstatistics, but instead explain essential concepts and their limits. Although the most important factors in forecasting the spread of COVID-19 are subject to a high degree of uncertainty, the containment of new infections must have absolute priority in the current situation. Moreover, whether the measures currently taken are effective can only be determined with a time lag. Country comparisons quickly reach their limits because case numbers and deaths are not collected according to uniform procedures. As far as statistics are concerned, the guiding principle for now is to “drive on sight” when assessing model calculations and to avoid giving too much weight to individual pieces of information.

Estimation of the prevalence rates

Pandemics usually result in exponential growth in the number of infected persons, as each infected person infects other persons, who in turn infect others in a snowball effect. Exponential growth is characterized by constant growth rates, not by constant absolute increases. It therefore inevitably leads to a doubling of the number of infected persons within a fixed period. If this period is short, the absolute number of infected persons quickly becomes very large, regardless of whether one starts from a small or a slightly larger number.

If the basic characteristics of the disease are known, the development of a pandemic can be predicted fairly accurately. First, the number of people an infected person typically infects (the so-called reproductive factor) is crucial. This factor depends not only on the virus, but also on our contact behaviour. Second, the reproductive factor depends crucially on how long an infected person has already been infected. Third, whether immunity develops once the disease has been overcome also determines the number of possible new infections.

Based on an assessment of these factors, the exponential spread of such a pandemic in the population can be estimated quite reliably. Since March 15, we have observed a daily growth rate in the number of infected persons of about 23 percent, i.e. the number of infected persons doubles roughly every 3 days. If we use an exponential growth model purely for illustration and start with 6,000 infected persons (approximately the number of infected persons in Germany on March 15), almost 109,000 people would be infected after 14 days, and almost 3 million after 30 days.
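The figures in this example can be reproduced with a few lines of Python. This is only a minimal sketch of the simple exponential model described above; the function names are our own, and the model assumes that the daily growth rate stays constant.

```python
import math


def projected_cases(initial_cases: float, daily_growth_rate: float, days: int) -> float:
    """Number of infected persons after `days` days of constant daily growth."""
    return initial_cases * (1.0 + daily_growth_rate) ** days


def doubling_time(daily_growth_rate: float) -> float:
    """Days needed for the number of infected persons to double."""
    return math.log(2.0) / math.log(1.0 + daily_growth_rate)


# Values taken from the text: 6,000 infected persons, 23 percent daily growth.
print(f"Doubling time: {doubling_time(0.23):.1f} days")
print(f"After 14 days: {projected_cases(6000, 0.23, 14):,.0f}")
print(f"After 30 days: {projected_cases(6000, 0.23, 30):,.0f}")
```

The doubling time comes out at a little more than 3 days, which is why the text speaks of a doubling roughly every 3 days.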

It is obvious that this development is likely to quickly exhaust the capacities of the health care system, even if only a very small proportion of infected persons develop a severe course of the disease that requires treatment in an intensive care unit or is otherwise life-threatening. For example, with 1.5 million infected persons, only 3% of whom suffer a severe course, the intensive care capacities available in Germany would be exhausted even if no other severe cases had to be treated there. In plain language, the unavoidable rationing of intensive care capacities would mean having to accept a large number of deaths.
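To get a feeling for the orders of magnitude, the following sketch computes the number of severe cases in this example and, under the assumption of unchanged 23 percent daily growth from 6,000 infected persons, the number of days until 1.5 million infections would be reached. The function name is hypothetical, and the calculation deliberately ignores recoveries, immunity and any countermeasures.

```python
import math


def days_until(target_cases: float, initial_cases: float, daily_growth_rate: float) -> float:
    """Days of constant exponential growth needed to reach `target_cases`."""
    return math.log(target_cases / initial_cases) / math.log(1.0 + daily_growth_rate)


infected = 1_500_000      # example value from the text
severe_share = 0.03       # example value from the text
severe_cases = infected * severe_share

print(f"Severe cases: {severe_cases:,.0f}")
print(f"Days until {infected:,} infected: {days_until(infected, 6000, 0.23):.1f}")
```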

Thus, as long as there is no vaccine, the reproductive factor is the decisive component of any conceivable defense strategy. As soon as this factor drops to the value of 1, the number of new infections stabilizes at the level reached at that time; if it drops below 1, the number of new infections declines. If individual infected persons could be clearly identified immediately, it would be comparatively easy to organize their isolation and the quarantining of their direct contacts. The reproductive factor would then probably drop rapidly, and a (vigilant) form of normality could return.
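Why the value 1 is the threshold can be illustrated with a deliberately simple sketch. It assumes discrete "generations" of infection, no immunity and a constant reproductive factor; the starting value of 1,000 new infections and the factors used are purely illustrative.

```python
def new_infections_by_generation(initial: float, reproductive_factor: float,
                                 generations: int) -> list[float]:
    """New infections per generation if each case infects `reproductive_factor` others."""
    return [initial * reproductive_factor ** g for g in range(generations + 1)]


# Illustrative values only: above 1 the numbers grow, at 1 they stay level, below 1 they shrink.
for factor in (2.5, 1.0, 0.8):
    series = new_infections_by_generation(1000, factor, 5)
    print(factor, [round(x) for x in series])
```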

At the moment, however, this solution is not yet available, and the testing capacities and procedures needed to implement it have yet to be established. Thus, the only strategy that remains at the moment is one that is not very selective and is consequently painful for our economic and social life: slowing down the spread of COVID-19 by generally reducing direct social contacts. If the population is disciplined, the mathematical laws of exponential growth can help to slow down the spread of COVID-19. In the above example, halving the daily growth rate to 12 percent would result in just under 30,000 infected people after 14 days and about 180,000 after 30 days. The more disciplined we are about washing our hands, keeping our distance, and taking other hygiene measures, the lower the growth rate will be.
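The effect of halving the growth rate can be made concrete by comparing the two scenarios side by side. The sketch below again assumes constant daily growth from 6,000 infected persons; the function names are our own.

```python
def projected_cases(initial_cases: float, daily_growth_rate: float, days: int) -> float:
    """Number of infected persons after `days` days of constant daily growth."""
    return initial_cases * (1.0 + daily_growth_rate) ** days


def reduction_factor(high_rate: float, low_rate: float, days: int) -> float:
    """How many times fewer cases the lower growth rate produces after `days` days."""
    return ((1.0 + high_rate) / (1.0 + low_rate)) ** days


for days in (14, 30):
    high = projected_cases(6000, 0.23, days)
    low = projected_cases(6000, 0.12, days)
    print(f"Day {days}: {high:,.0f} vs. {low:,.0f} "
          f"(factor {reduction_factor(0.23, 0.12, days):.1f})")
```

Because the difference compounds daily, the gap between the two scenarios widens the longer the lower growth rate is maintained.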

These are, of course, all just simple calculations. At present, there is no completely clear-cut information on all three of the above-mentioned factors in the case of COVID-19, because we are entering unknown territory. Statistics, epidemiology and virological expertise are therefore equally necessary to derive at least rough estimates in real time from the newly arriving data, in order to assess the spread of the pandemic and the effectiveness of various measures. Even experts cannot reliably predict how many new infections are to be expected in the coming days due to the uncertain data situation. The ranges within which the unknown parameters can lie are much too wide for this.

And yet these sample calculations are quite sufficient to justify decisive political action that currently gives absolute priority to curbing new infections. For example, model calculations by both the Robert Koch Institute and the German Society for Epidemiology clearly show that whether the reproduction rate in reality is 2.5 or 1.5, or whether minor changes are made to the other assumptions necessary for the use of these models, is irrelevant to the question of whether this strategy should be adopted: If the reproduction factor is not quickly pushed towards the value of 1, the German health system will collapse within a very short time. It is then only a question of a few weeks until this point is reached. This can only be prevented now.

The classification of the case numbers 

Infectious diseases usually have an incubation period between the initial infection and the development of symptoms. Without a well-developed system of comprehensive testing, this delay not only prevents the early detection and isolation of infected individuals, but also makes it harder to prevent new infections. It also inevitably means that the effectiveness of measures introduced today will only become apparent after a few days or even weeks, even if they have the desired effect immediately. Above all, a continuing increase in the number of cases does not mean that the measures now being taken are ineffective.
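This time lag can be illustrated with a small simulation. It is only a sketch under strong assumptions: the true number of infections grows by 23 percent per day until a measure takes effect on day 10 and by 12 percent thereafter, and every infection appears in the statistics exactly 7 days later. The delay of 7 days, the day of the measure and the function name are hypothetical choices for illustration.

```python
def simulate_reported_growth(days: int, rate_before: float, rate_after: float,
                             measure_day: int, reporting_delay: int,
                             initial: float = 100.0) -> dict[int, float]:
    """Daily growth rate seen in reported figures when reports lag infections."""
    infections = [initial]
    for t in range(1, days):
        # The measure changes the true growth rate immediately after `measure_day`.
        rate = rate_before if t <= measure_day else rate_after
        infections.append(infections[-1] * (1.0 + rate))

    reported_growth = {}
    for t in range(reporting_delay + 1, days):
        # What is reported on day t reflects infections from `reporting_delay` days earlier.
        today = infections[t - reporting_delay]
        yesterday = infections[t - reporting_delay - 1]
        reported_growth[t] = today / yesterday - 1.0
    return reported_growth


growth = simulate_reported_growth(days=30, rate_before=0.23, rate_after=0.12,
                                  measure_day=10, reporting_delay=7)
for day in (11, 17, 18):
    print(f"Day {day}: reported growth {growth[day]:.0%}")
```

In this sketch the reported figures keep growing at the old rate for a full week after the measure has already worked, which is exactly the pattern that can be misread as a lack of effect.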

This is aggravated by the fact that, in the current situation, conclusions must be based on particularly uncertain data. For example, the number of persons who have tested positive bears only a limited relationship to the number of people actually infected, because people with few or no symptoms have so far rarely been tested, especially if they have had no contact with demonstrably infected persons. Only with the development of faster test methods, which were first used in Germany a few days ago, will it become possible to test systematically. There will probably be regional differences. The number of infected persons detected will depend to a large extent on how intensively testing is carried out in the different regions.

If the share of confirmed cases among all infected persons (the sum of confirmed and still unrecorded cases) changes due to new test procedures, the reported case numbers may increase without any acceleration in the actual spread of the disease. The observed case numbers therefore allow only limited conclusions as to whether the assumptions about infection rates used in forecasts were correct.
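A small numerical sketch shows how strongly a change in testing alone can distort the reported growth. All numbers here are hypothetical: the true number of infected persons grows by a constant 23 percent per day, while the share of infections that is detected rises from 10 to 15 percent on day 4.

```python
def reported_cases(true_cases: list[float], detection_shares: list[float]) -> list[float]:
    """Confirmed cases if only a given share of the true cases is detected each day."""
    return [n * s for n, s in zip(true_cases, detection_shares)]


def daily_growth(series: list[float]) -> list[float]:
    """Day-over-day growth rates of a series."""
    return [series[i] / series[i - 1] - 1.0 for i in range(1, len(series))]


true_cases = [6000 * 1.23 ** day for day in range(8)]
detection_shares = [0.10] * 4 + [0.15] * 4   # hypothetical jump in testing on day 4

true_growth = daily_growth(true_cases)
reported_growth = daily_growth(reported_cases(true_cases, detection_shares))

for day, (t, r) in enumerate(zip(true_growth, reported_growth), start=1):
    print(f"Day {day}: true growth {t:.0%}, reported growth {r:.0%}")
```

On the day the detection share changes, the reported figures jump far more than the true ones, although the underlying dynamic is unchanged.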

Therefore, conclusions drawn from the increased case numbers of the past few days are probably misleading. Because of the incubation period and the time needed for testing and evaluation, today's case numbers reflect infections that occurred 5-10 days ago. On Monday last week, however, a new, faster test procedure (CDC test) was introduced. It can safely be assumed that testing and evaluation sped up in the following days, and that for this reason alone the number of confirmed cases temporarily increased.

In addition, media reports repeatedly set the current measures against the case numbers ("despite the more stringent measures, the case numbers continued to rise yesterday"). However, we will probably not be able to judge whether the tightened measures are working for another one or two weeks at the earliest. Politicians must therefore be given the time to evaluate the success of the measures. The strategy of breaking the contagion dynamic by consistently reducing social contacts should not be called into question out of frustration over an apparent lack of effect before any effect can even show up in the data.

The pitfalls of country comparisons

Since all nations are more or less pursuing their own strategy for dealing with the COVID-19 pandemic, international comparison is in principle an excellent basis for identifying effective strategies. However, it is not enough to simply compare developments in Germany with those in other countries without considering the limitations of comparability. In particular, the case numbers recorded in each country depend centrally on how systematically and extensively tests for the virus are implemented. Similarly, due to the exponential nature of the case growth described above, the proven spread of the virus depends very much on when the first person in a country became infected and when a government introduced measures – and not only on the measures themselves.

In addition, many country comparisons refer to the ratio of deaths to confirmed infections at the same point in time, or divide the cumulative deaths by the cumulative confirmed cases. This approach uses the wrong comparison group and underestimates the lethality of COVID-19, because under exponential growth the deaths recorded today stem from a much smaller group of cases confirmed some time earlier. It would make more sense to use as the reference group the confirmed cases of the cohort from which the deaths presumably originate. A comparison of the time series of confirmed infections and deaths from China and Germany suggests that a delay of about 11 days yields the most stable ratio, i.e. that it is most plausible to relate the deaths to the number of confirmed cases 11 days earlier.
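The difference between the two ratios can be shown with synthetic data. The following sketch is not based on real case numbers: it assumes cumulative confirmed cases growing by 23 percent per day and, purely hypothetically, that 2 percent of confirmed cases die exactly 11 days after confirmation.

```python
LAG_DAYS = 11                   # delay suggested in the text
ASSUMED_FATALITY_SHARE = 0.02   # hypothetical value, used only to build test data

# Synthetic cumulative series (not real data).
cum_cases = [100 * 1.23 ** day for day in range(40)]
cum_deaths = [
    ASSUMED_FATALITY_SHARE * cum_cases[day - LAG_DAYS] if day >= LAG_DAYS else 0.0
    for day in range(40)
]


def naive_ratio(cases: list[float], deaths: list[float], day: int) -> float:
    """Cumulative deaths divided by cumulative confirmed cases on the same day."""
    return deaths[day] / cases[day]


def lagged_ratio(cases: list[float], deaths: list[float], day: int,
                 lag: int = LAG_DAYS) -> float:
    """Cumulative deaths divided by cumulative confirmed cases `lag` days earlier."""
    return deaths[day] / cases[day - lag]


print(f"Naive ratio on day 30:  {naive_ratio(cum_cases, cum_deaths, 30):.2%}")
print(f"Lagged ratio on day 30: {lagged_ratio(cum_cases, cum_deaths, 30):.2%}")
```

In this construction the lagged ratio recovers the assumed 2 percent, while the same-day ratio is lower by roughly a factor of ten, simply because the denominator has kept growing in the meantime.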

However, if the estimated number of undetected cases is not taken into account (which in turn depends to a large extent on the number of tests performed), the denominator of the ratio is too small and the estimated lethality, i.e. the proportion of deaths among all newly infected persons, is systematically overestimated. Furthermore, the statistical recording of causes of death varies considerably from country to country. It is difficult to determine whether a person died with the virus or from the virus. If, as in many countries, the coronavirus is detected retrospectively in people who died of chronic diseases or at an advanced age, some of them will have died not from but with the virus. This also leads to an overestimation of the death rate. Overall, it must be said that a precise estimate of mortality is almost impossible at this stage.
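How much the undetected cases matter can be seen from a simple calculation. The numbers below (100 deaths, 5,000 confirmed cases and the assumed detection shares) are hypothetical and serve only to show the direction and size of the effect; the function name is our own.

```python
def lethality_among_all_infected(deaths: float, confirmed_cases: float,
                                 detection_share: float) -> float:
    """Deaths relative to all infected persons, given the share of infections detected."""
    if not 0.0 < detection_share <= 1.0:
        raise ValueError("detection_share must be in (0, 1]")
    estimated_infected = confirmed_cases / detection_share
    return deaths / estimated_infected


# Hypothetical example: the fewer infections are detected, the lower the true lethality.
for share in (1.0, 0.5, 0.2):
    value = lethality_among_all_infected(100, 5000, share)
    print(f"Detection share {share:.0%}: lethality {value:.2%}")
```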

However, there is a natural experiment, the cruise ship "Diamond Princess", where it can be assumed that the infected persons were recorded completely because all passengers were tested. Although the people on board a cruise ship are older than the general population on average, statisticians can at least approximately adjust for this different age composition. After age standardization, the data from the "Diamond Princess" indicate a COVID-19 mortality rate of 0.5%, with an uncertainty of about +/-50%.
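The idea of age standardization can be sketched as follows. This is a generic illustration of direct standardization, not the calculation behind the 0.5% figure: the age groups, the age-specific death rates and both age distributions are invented for the example.

```python
def weighted_rate(age_specific_rates: dict[str, float],
                  population_shares: dict[str, float]) -> float:
    """Overall rate implied by age-specific rates and a given age distribution."""
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(age_specific_rates[group] * share
               for group, share in population_shares.items())


# All values below are hypothetical and chosen only to show the mechanism.
rates = {"0-59": 0.001, "60-79": 0.01, "80+": 0.05}
ship_shares = {"0-59": 0.40, "60-79": 0.50, "80+": 0.10}      # older population on board
general_shares = {"0-59": 0.75, "60-79": 0.20, "80+": 0.05}   # reference population

crude = weighted_rate(rates, ship_shares)
standardized = weighted_rate(rates, general_shares)
print(f"Crude rate on board:   {crude:.2%}")
print(f"Age-standardized rate: {standardized:.2%}")
```

The same age-specific rates yield a lower overall rate once they are weighted with the younger reference population, which is the adjustment described in the text.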


The evidence available to date on the COVID-19 pandemic is not sufficient to reliably predict its further spread, especially given the different policy measures being taken to contain new infections. One should certainly continue to follow the development of the pandemic in detail, but without being overly impressed by individual pieces of information. Because the spread of such a virus is exponential, the best indicator of an initial easing of the problem is probably a decline in growth rates on several consecutive days.
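Such an indicator is easy to compute from a series of daily case numbers. The sketch below is one possible reading of "several consecutive days": it checks whether the daily growth rate has fallen on each of the last three days. The threshold of three days, the function names and the example series are assumptions made for illustration.

```python
def daily_growth_rates(cases: list[float]) -> list[float]:
    """Day-over-day growth rates of cumulative case numbers."""
    return [cases[i] / cases[i - 1] - 1.0 for i in range(1, len(cases))]


def growth_is_slowing(cases: list[float], consecutive_days: int = 3) -> bool:
    """True if the growth rate declined on each of the last `consecutive_days` days."""
    rates = daily_growth_rates(cases)
    if len(rates) < consecutive_days + 1:
        return False  # not enough data to compare
    recent = rates[-(consecutive_days + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical series: cases still rise every day, but more and more slowly.
slowing = [1000, 1230, 1500, 1800, 2100, 2400]
accelerating = [100, 110, 125, 150, 190]
print(growth_is_slowing(slowing))
print(growth_is_slowing(accelerating))
```

Note that the absolute case numbers rise in both example series; only the growth rates distinguish them.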

At the moment, however, the principle must be to be guided by the model calculations outlined above. Despite the plethora of factors that prevent a reliable forecast of the future spread, simulation studies with various thoroughly realistic scenarios show very clearly that the German health care system would collapse completely within a few days if reproduction rates were not rapidly reduced to a value of 1 by consistently avoiding social contacts. At the same time, one should not reassess in a panic every day whether the measures are effective or not. The effect of these measures will not become apparent for at least one to two weeks.

The delay in the spread of the virus can then hopefully give the health system time to build up the necessary capacity to treat serious cases and – with a somewhat longer perspective – to conduct research into drugs and a possible vaccine. Most importantly, it would allow time to build the capacity to test a large number of people quickly and repeatedly. This would open up the possibility of a gradual return to a reasonably normal life, thereby limiting to some extent the negative social and economic consequences of this pandemic.

Katharina Schüller (STAT-UP)
Prof. Dr. Thomas K. Bauer (RWI)
Prof. Dr. Dr. h. c. Christoph M. Schmidt (RWI)
Sabine Weiler (Press Office RWI)

About „Unstatistik des Monats“
Every month, psychologist Gerd Gigerenzer, statistician Walter Krämer, data analyst Katharina Schüller and econometrician Thomas K. Bauer question recently published figures and their interpretations. This press release has also been co-authored by Christoph M. Schmidt, president of RWI – Leibniz Institute for Economic Research.