One thought on “Time series forecasting and mortality rate”

  1. Forecasting without process-based models is a very tricky business. In the case of COVID-19, there are many models which fit the data in hand and even do hindcasting well, yet whose projections are all over the place and fall down as new data emerge from records of the pandemic. See https://www.youtube.com/watch?v=MZ957qhzcjI for some discussion.

    I have looked at some hopeful work in the literature and among technologically creative businesses which rely upon forecasting, e.g., Miller et al. (1 August 2018), Clinical Infectious Diseases, 67, with claims of possible relevance to COVID-19 at, for instance, https://content.kinsahealth.com/covid-detection-technical-approach and https://healthweather.us/. I had hoped that there might be a way to improve these beyond ARIMA-style forecasts. A deeper look is disappointing, though, because basic means of assessing forecasts, such as Brier scores, are not being used. Also, the study in question used correlation coefficients as its figures of merit rather than, say, direct quantitative and blinded prediction of cases.
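    For concreteness, the Brier score mentioned above is simply the mean squared difference between probabilistic forecasts and the binary outcomes that actually occurred; lower is better. A minimal sketch (the example numbers are illustrative, not from any study):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilistic forecasts (in [0, 1])
    and binary outcomes (0 or 1). 0 is perfect; 1 is worst possible."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Three forecast/outcome pairs: events predicted at 70%, 20%, and 90%
# probability, of which the first and third occurred.
score = brier_score([0.7, 0.2, 0.9], [1, 0, 1])
print(score)
```

    Unlike a correlation coefficient, this directly penalizes miscalibrated probabilities, which is why it (or a similar proper scoring rule) belongs in any serious evaluation of a probabilistic forecaster.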

    My assessment is based upon three thoughts:

    (1) Before charging ahead to hope there’s something that can help with the pandemic, it is important to do the fundamental work needed to make forecasting viable in the first place. This includes all the usual concerns about balance in sampling for training and calibration data, critical evaluation of performance using cross-validation and out-of-sample tests, and skepticism about non-mechanistic forecasting.
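    The out-of-sample discipline for time series is stricter than ordinary cross-validation: each test window must lie strictly after the data used for training. A minimal rolling-origin sketch (names and defaults are my own, for illustration):

```python
def rolling_origin_splits(n, min_train, horizon=1):
    """Yield (train_indices, test_indices) pairs for a series of length n.

    Each split trains on all observations before the forecast origin and
    tests on the next `horizon` observations, so the model is never
    evaluated on data that precedes what it was fit to."""
    for origin in range(min_train, n - horizon + 1):
        yield list(range(origin)), list(range(origin, origin + horizon))

# For a series of length 6 with at least 3 training points, the origin
# rolls forward one step at a time.
for train, test in rolling_origin_splits(6, min_train=3):
    print(train, "->", test)
```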

    (2) While complicated models can be penalized using devices like AIC or BIC, there are advantages to using simple models, like those described by Jewell in the lecture above, which relate directly to epidemic mechanisms. Moreover, we really don’t need better forecasting now. We know how bad it could be. We need better implementation of countermeasures.

    These all need to be pursued within a broad context of respect for our great ignorance. This is a novel virus, and experience with its genetic predecessor SARS is not going to help much, because their genomes overlap only about 75%. So expectations and projections of “herd immunity” and the virulence of other pathogens do not necessarily transfer. It is also incredibly difficult to do causal inference in a world with an exploding pandemic, where the data are jostled by agents trying to influence outcomes, and even suppress reporting, e.g., compare CDC reports with Johns Hopkins reports (at https://coronavirus.jhu.edu/map.html), or tallies of state unemployment filings with U.S. Department of Labor reports.
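    On the AIC point: for a least-squares fit with Gaussian errors, AIC reduces (up to a constant) to n·ln(RSS/n) + 2k, so a more complicated model must buy a real reduction in residual error to justify its extra parameters. A sketch with made-up numbers to show the penalty at work:

```python
import math

def aic_gaussian(rss, n, k):
    """AIC (up to an additive constant) for a least-squares fit with
    Gaussian errors: n * ln(RSS / n) + 2k, with k fitted parameters."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical fits to n = 50 points: the complicated model shaves the
# residual sum of squares only slightly, so its 2k penalty dominates.
n = 50
simple_aic = aic_gaussian(rss=12.0, n=n, k=3)
complex_aic = aic_gaussian(rss=11.5, n=n, k=9)
print(simple_aic < complex_aic)  # the simpler model is preferred here
```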

    (3) As difficult as it is to do, because of political issues and “shed problems”, decisions in this space need to incorporate the various losses incurred by being wrong. Doing that is nearly impossible using non-mechanistic “data mining” or conventional data science and ML approaches. For the present, we might put more weight on the advice of physicist Arthur Eddington: “It is … a good rule not to put overmuch confidence in … observational results that are put forward until they are confirmed by theory.”
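    Incorporating losses is just expected-loss minimization from decision theory. A toy sketch, with entirely hypothetical states, actions, and loss values, illustrating how asymmetric losses can favor intervention even when the bad outcome is the less probable one:

```python
def expected_loss(action, state_probs, loss):
    """Expected loss of `action`: sum over states of
    P(state) * loss[action][state]."""
    return sum(p * loss[action][s] for s, p in enumerate(state_probs))

# Hypothetical two-state world: state 0 = mild outbreak, state 1 = severe.
# Losses are invented for illustration; being wrong about a severe
# outbreak is far costlier than over-cautious intervention.
state_probs = [0.7, 0.3]
loss = {
    "relax":     [0.0, 100.0],   # cheap if mild, catastrophic if severe
    "intervene": [10.0, 20.0],   # moderate cost either way
}
best = min(loss, key=lambda a: expected_loss(a, state_probs, loss))
print(best)  # "intervene": 0.7*10 + 0.3*20 = 13 beats 0.3*100 = 30
```

    The point is that the decision depends on the loss structure, not just on the forecast probabilities, which is exactly what purely predictive “data mining” leaves out.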
