CoDatMo Liverpool & UNINOVE models (slow ODE implementation and Trapezoidal Solver)

You could just add an additional compartment to model this, couldn’t you?

I assume so, I’ll

I thought the compartment models needed to sum to a constant population? At least the ones I have looked at do. So the Tweet state is orthogonal to the SIR model or more complex versions of the same.
Thanks!

Is the goal of the model to predict deaths or to understand something about the dynamics? If tweets just modify the predicted number of deaths compared to the SIR expectation, then they might improve prediction, but they will also partially decouple the SIR model from the only data (deaths) that exist to inform it. In particular, this will not allow the tweets data to inform the likely numbers of infected individuals, and therefore the ODE parameters; quite the opposite.

I would have thought that something more like the Liverpool approach, which uses the number of infected to predict calls, was closer to a plausible data-generating process. The idea being that both (lagged) deaths and tweets are informative about the number of infected individuals in the population, and the causality runs in the direction of {number of infections} → {deaths, tweets}.
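
To make that concrete, here is a minimal Stan-style sketch of that data-generating process, where a single latent infections series drives both deaths and tweets through their own lag distributions and reporting rates. Everything here (treating infections as data rather than as ODE output, the negative-binomial observation models, the priors, and all names) is an illustrative assumption, not the CoDatMo code.

data {
  int<lower=1> T;                   // number of days
  array[T] int<lower=0> deaths;     // observed daily deaths
  array[T] int<lower=0> tweets;     // observed daily symptom tweets
  vector<lower=0>[T] infections;    // daily infections (fixed here for illustration; the ODE would supply these)
  int<lower=1> max_lag;
}
parameters {
  real<lower=0, upper=1> ifr;       // infection fatality ratio
  real<lower=0> tweet_rate;         // expected tweets per infection
  simplex[max_lag] death_lag;       // infection-to-death delay distribution
  simplex[max_lag] tweet_lag;       // infection-to-tweet delay distribution
  real<lower=0> phi_deaths;
  real<lower=0> phi_tweets;
}
model {
  vector[T] mu_deaths = rep_vector(1e-6, T);
  vector[T] mu_tweets = rep_vector(1e-6, T);
  for (t in 1:T) {
    for (l in 1:min(max_lag, T - t + 1)) {
      mu_deaths[t + l - 1] += ifr * infections[t] * death_lag[l];
      mu_tweets[t + l - 1] += tweet_rate * infections[t] * tweet_lag[l];
    }
  }
  ifr ~ beta(2, 200);
  tweet_rate ~ normal(0, 1);
  phi_deaths ~ exponential(1);
  phi_tweets ~ exponential(1);
  deaths ~ neg_binomial_2(mu_deaths, phi_deaths);
  tweets ~ neg_binomial_2(mu_tweets, phi_tweets);
}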

Yes, but you can extend the model however you like.

You could introduce a dummy compartment ExposedTwitter with the same inflow as the original Exposed compartment (the number of infected) but with an outflow to a Twitter compartment, whose members tweet about their symptoms and which has its own outflow into nothing.

You would then have to model the flows ExposedTwitter → Twitter → Nothing.
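
As a rough illustration, an extended SEIR right-hand side might look like the sketch below. The compartment ordering and the rate names (a_tw for leaving ExposedTwitter, lambda_tw for leaving Twitter) are made up for this example and are not taken from the CoDatMo code; the point is only that the Twitter branch copies the inflow into E and then drains "into nothing", so the S + E + I + R balance is untouched.

functions {
  vector seir_twitter_rhs(real t, vector y, real beta, real a, real gamma,
                          real a_tw, real lambda_tw, real N) {
    real S = y[1];
    real E = y[2];
    real I = y[3];
    real R = y[4];
    real E_tw = y[5];   // dummy "ExposedTwitter" compartment
    real Tw = y[6];     // people currently tweeting about their symptoms

    real infections = beta * I / N * S;      // same inflow feeds E and E_tw

    vector[6] dydt;
    dydt[1] = -infections;                   // S
    dydt[2] = infections - a * E;            // E
    dydt[3] = a * E - gamma * I;             // I
    dydt[4] = gamma * I;                     // R
    dydt[5] = infections - a_tw * E_tw;      // ExposedTwitter
    dydt[6] = a_tw * E_tw - lambda_tw * Tw;  // Twitter -> nothing
    return dydt;
  }
}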

The goal is to show “how social media monitoring with a pre-trained classifier can help epidemiological Bayesian models to better infer/predict”. We will demonstrate that with Brazilian tweets, COVID-19, and an SEIR-substate model.

Jacob, I agree, but I don’t know how right or wrong the infected count I = I_1 + I_2 in the SEITD model is, since infections are underreported and we also don’t test people randomly in Brazil. I would very much prefer to condition calls/symptoms on real data, so that we can compare and understand what is going on.

Yes, that sounds like a great approach. I would definitely need some help on what that would look like in the ODE system. But, kinda pulling the handbrake: what are your opinions on using infected data generated by the SEITD ODE system? We don’t have good data to do any predictive check on the infected I.

1 Like

Hm, depends on what you want to use it for. Infer the proportion infected? Sure, but for this the cumulative number of deaths should be sufficient. I don’t think you gain a lot of (or any?) extra information on the IFR, which is essentially all you need for this.

Or what do you want to use the infected data for?

I think what you can do is compare the death predictions a few weeks into the future. You should be able to use this to compare different models. And hopefully a model with extra info (Twitter / calls) performs better.
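
One way to operationalise that comparison is to hold out the last few weeks of deaths, fit each model on the rest, and score the held-out window. A standalone sketch of the scoring step is below (run with the fixed_param sampler); the names mu_deaths_ahead and phi_deaths are placeholders for whatever each model actually produces, and the negative-binomial form is an assumption.

data {
  int<lower=1> T_ahead;                        // length of the holdout window
  array[T_ahead] int<lower=0> deaths_holdout;  // deaths the model never saw
  vector<lower=0>[T_ahead] mu_deaths_ahead;    // a model's predicted mean daily deaths
  real<lower=0> phi_deaths;                    // that model's overdispersion
}
generated quantities {
  array[T_ahead] int deaths_pred;              // predictive draws for plotting
  real log_score = 0;                          // higher is better across models
  for (t in 1:T_ahead) {
    deaths_pred[t] = neg_binomial_2_rng(mu_deaths_ahead[t], phi_deaths);
    log_score += neg_binomial_2_lpmf(deaths_holdout[t] | mu_deaths_ahead[t], phi_deaths);
  }
}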

@remoore , we have some new insights.

We went through the Brazilian National Health System (SUS) data (DATASUS) and found data on hospitalizations due to Severe Acute Respiratory Illness (SARI). We only used data from 2020, and only the hospitalizations with a PCR-positive COVID test (~66% of the hospitalizations).

We took the difference in days between hospitalization and death or hospital discharge (what is called “alta” in Spanish and Portuguese). I only have the mean, which is 16.4 days (@andrelmfsantos, the PhD candidate who did this, will generate some figures and more summary stats).

Looking at SEITD vs. SEEIITTD, we have the following posterior summaries (using the same priors as Liverpool and Cambridge) for dT (the mean time for which individuals are terminally ill):

r$> seitd$summary(variables = "dT")                                                                                                                 
# A tibble: 1 x 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 dT        16.2   16.2 0.691 0.689  15.0  17.3  1.00    3628.    2971.

r$> seeiittd$summary(variables = "dT")                                                                                                              
# A tibble: 1 x 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 dT        16.4   16.4 0.706 0.733  15.3  17.5  1.00    1037.    3074.

I know that the difference is small, but given the huge amount of data, a 0.2-day difference in the mean estimate of dT is a big difference in inference. And the SEEIITTD estimate is more closely aligned with the real-world data (which the model has not seen; it only uses daily deaths).

I have not yet analyzed the full posterior or the density plots of the hospitalization/discharge time differences in DATASUS.

3 Likes

Just a minor correction: the translation of Síndrome respiratória aguda grave (SRAG) is, counterintuitively, Severe Acute Respiratory Illness (SARI), and that is what is reported in most Brazilian systems, including SIVEP-Gripe. This is mostly due to historical reasons, so as to not cause confusion with the already established and quite different SARS that emerged in Asia in the early 2000s.

Thanks! I always get corrected by epidemiologists. There is also something funky with fatality versus “letality”…

So today’s Stan meeting was awesome. Lucky and grateful to pick everybody’s mind about the CoDatMo Twitter model. So, here are my summaries:

  1. We are on the right track in conditioning Twitter symptom mentions on daily deaths (we have real data to back that up) rather than on daily infected (we only have estimates from the ODE system to back that up). We would use a latent lag, with data from COVID Severe Acute Respiratory Illness (SARI) hospitalizations and deaths as a prior.
  2. dT, which is the mean time for which individuals are terminally ill (in days), is a global parameter, but it actually varies daily/weekly, etc. We could estimate it using weekly dT pieces or even a Gaussian process (see the sketch after this list).
  3. Infections come in waves: first the younger, then the older, and then the elderly. I don’t know how to model that, but we can dismiss this and keep the assumption of random mixing and contact between S and I.
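
For item 2, here is a minimal Stan sketch of weekly dT pieces tied together by a random walk on the log scale. The prior scales, the week-to-day mapping, and all names are my own assumptions for illustration, not the CoDatMo parameterization.

data {
  int<lower=1> T;                      // number of days
  int<lower=1> W;                      // number of weeks covering the T days
  array[T] int<lower=1, upper=W> week; // week index of each day
}
parameters {
  real log_dT0;                        // log dT in week 1
  vector[W - 1] z;                     // weekly innovations (non-centered)
  real<lower=0> sigma_dT;              // how much dT drifts from week to week
}
transformed parameters {
  vector[W] log_dT_week;
  vector[T] dT_day;
  log_dT_week[1] = log_dT0;
  for (w in 2:W)
    log_dT_week[w] = log_dT_week[w - 1] + sigma_dT * z[w - 1];
  dT_day = exp(log_dT_week[week]);     // daily dT to feed into the ODE / observation model
}
model {
  log_dT0 ~ normal(log(16), 0.2);      // centered near the ~16-day empirical mean
  z ~ std_normal();
  sigma_dT ~ normal(0, 0.1);
  // ... the daily-deaths likelihood would use dT_day here ...
}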

Thanks to @Bob_Carpenter and @charlesm93 for all the insights!

2 Likes

@Funko_Unko

Following from this thread, I suggest we continue the discussion here.

I have revisited the “simplex” model you offered in that post, as it has potential. I’d like to get a better understanding of it, but after an online search I cannot find a reference to such an approach.
I fail to understand how the transition rate between S and E_1, \beta \frac{I_1 + I_2}{N} S, is handled. See the Liverpool model description.

I think I’m also missing the rationale behind the use of a simplex vector.
Thanks.

2 Likes

Great! I’ll see what I can do next week, as this will probably require me to write more than just a few short sentences.

2 Likes

Happy to help but @remoore, @rosato_11 and @alphillips are closer to the specifics of what we’ve done in Liverpool. Hopefully they can chip in.

4 Likes

Great, I thought you might know who knows more about the specifics :)

3 Likes

We now have a preprint of the write-up of the model here: https://arxiv.org/abs/2111.04498. The code used in the paper can be found here: https://github.com/codatmo/UniversityOfLiverpool_PaperSubmission. We will be happy to have further discussions relating to any specifics!

3 Likes