You could just add an additional compartment to model this, couldn’t you?
I assume so, I’ll give it a try.
I thought the compartment models needed to sum to a constant population? At least the ones I have looked at do. So the Tweet state is orthogonal to the SIR model or more complex versions of the same.
Thanks!

Here is where I think we should do something different: use the daily Twitter symptom mentions to help predict infected (or maybe even deaths).

I think an appropriate way to add tweets is to just use them to predict infected

I don’t think it is interesting to use tweets to predict infected: we don’t have any real data or estimate related to infected besides the I compartments in the ODE system. We should use tweets to predict deaths D, for which we have quite reliable data (actually the only real-world data we condition the model on are daily_deaths and daily_tweets_mentions).
Is the goal of the model to predict deaths or to understand something about the dynamics? If tweets just modify the predicted number of deaths compared to the SIR expectation, then they might improve prediction, but they will also partially decouple the SIR model from the only data (deaths) that exist to inform it. In particular, this will not allow the tweets data to inform the likely numbers of infected individuals, and therefore the ODE parameters; quite the opposite.
I would have thought that something more like the Liverpool approach of using the number of infected to predict calls was closer to a plausible data-generating process. The idea being that both (lagged) deaths and tweets are informative about the number of infected individuals in the population, and the causality runs in the direction of {number of infections} → {deaths, tweets}.
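To make that data-generating process concrete, here is a minimal sketch (not the Liverpool code; all rates, lags, and the discrete-time SIR backbone are illustrative assumptions): both observed series are generated downstream of the same latent infection curve, so conditioning on either one informs the shared I(t) and hence the ODE parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(days=100, N=1_000_000, beta=0.25, gamma=0.1,
             ifr=0.01, tweet_rate=0.05, death_lag=14, tweet_lag=2):
    """Discrete-time SIR; deaths and tweets are lagged functions of new infections."""
    S, I = N - 10.0, 10.0
    new_inf = np.zeros(days)
    for t in range(days):
        inf_t = beta * S * I / N          # new infections on day t
        new_inf[t] = inf_t
        S -= inf_t
        I += inf_t - gamma * I
    deaths = np.zeros(days)
    tweets = np.zeros(days)
    for t in range(days):
        # Both observations are noisy, lagged reflections of the same incidence.
        if t + death_lag < days:
            deaths[t + death_lag] = rng.poisson(ifr * new_inf[t])
        if t + tweet_lag < days:
            tweets[t + tweet_lag] = rng.poisson(tweet_rate * new_inf[t])
    return new_inf, deaths, tweets

new_inf, deaths, tweets = simulate()
```

With this structure the tweet series peaks well before the death series (its lag is shorter), which is exactly why it can act as a leading indicator for the latent infections.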

I thought the compartment models needed to sum to a constant population?
Yes, but you can extend the model however you like.
You could introduce a dummy compartment ExposedTwitter with the same inflow as the original exposed compartment (the number of newly infected) but with an outflow to a Twitter compartment whose members tweet about their symptoms and which has its own outflow into nothing. You would then have to model the flows ExposedTwitter → Twitter → Nothing.
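A minimal sketch of that dummy-compartment idea, assuming a plain SEIR backbone integrated with Euler steps (all parameter names and values are hypothetical): ExposedTwitter (ET) receives the same inflow as E, but its outflows go through Tw (“tweeting”) into a sink, so the branch never removes mass from the epidemiological compartments and S + E + I + R stays constant.

```python
def seir_with_twitter(days=250, N=1e6, beta=0.3, sigma=0.2, gamma=0.1,
                      rho=0.5, tau=0.5, dt=0.1):
    """SEIR plus a parallel ExposedTwitter -> Twitter -> sink branch."""
    S, E, I, R = N - 10.0, 0.0, 10.0, 0.0
    ET, Tw, sink = 0.0, 0.0, 0.0
    for _ in range(int(days / dt)):
        new_exposed = beta * S * I / N
        dS = -new_exposed
        dE = new_exposed - sigma * E
        dI = sigma * E - gamma * I
        dR = gamma * I
        # Dummy branch: same inflow as E, its own outflows, no feedback.
        dET = new_exposed - rho * ET
        dTw = rho * ET - tau * Tw
        dSink = tau * Tw
        S += dt * dS; E += dt * dE; I += dt * dI; R += dt * dR
        ET += dt * dET; Tw += dt * dTw; sink += dt * dSink
    return S, E, I, R, ET, Tw, sink, N

S, E, I, R, ET, Tw, sink, N = seir_with_twitter()
```

The Twitter compartment size over time is then what you would link to the observed daily tweet counts.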

Is the goal of the model to predict deaths or to understand something about the dynamics?
The goal is to show “how social media monitoring with a pre-trained classifier can help epidemiological Bayesian models to better infer/predict”. We will demonstrate that with Brazilian tweets, COVID-19, and an SEIR sub-state model.

I would have thought that something more like the Liverpool approach of using the number of infected to predict calls was closer to a plausible data-generating process.
…
and the causality runs in the direction of {number of infections} → {deaths, tweets}.
Jacob, I agree, but I don’t know how right or wrong the infected I = I_1 + I_2 in the SEITD model is, since infections are underreported and we also don’t randomly test people in Brazil. I would very much prefer to condition calls/symptoms on real data so that we can compare and understand what is going on.

You could introduce a dummy compartment ExposedTwitter with the same inflow as the original exposed compartment (the number of infected) but with an outflow to a Twitter compartment… You would then have to model the flows ExposedTwitter → Twitter → Nothing.
Yes, that sounds like a great approach. I would definitely need some help on what that would look like in the ODE system. But, kinda pulling the handbrake: what are your opinions on using infected data generated by the SEITD ODE system? We don’t have good data to do any posterior predictive check on the infected I.

What are your opinions on using Infected data generated by the SEITD ODE system?
Hm, depends on what you want to use it for. Infer the proportion infected? Sure, but for this the cumulative number of deaths should be sufficient. I don’t think you gain a lot of (or any?) extra information on the IFR, which is essentially all you need for this.
Or what do you want to use the infected data for?
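The point that cumulative deaths plus an IFR essentially pin down the proportion infected can be made concrete with back-of-the-envelope arithmetic (all numbers below are hypothetical, chosen only for illustration):

```python
# Illustrative arithmetic only: with an assumed infection fatality rate,
# cumulative deaths back out the implied number (and proportion) infected.
cumulative_deaths = 50_000        # hypothetical death count
population = 200_000_000          # roughly Brazil-sized, for illustration
ifr = 0.005                       # assumed IFR of 0.5%

implied_infected = cumulative_deaths / ifr
proportion_infected = implied_infected / population
print(implied_infected, proportion_infected)  # ~10 million infections, ~5% of the population
```

This is why the IFR is “essentially all you need” for the proportion infected: any extra data source only helps to the extent that it tightens the IFR or the death count.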

We don’t have good data to do any posterior predictive check on the infected I.
I think what you can do is compare the death predictions a few weeks into the future. You should be able to use this to compare different models. And hopefully a model with extra info (Twitter / calls) performs better.
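One simple way to operationalize that comparison, sketched below with made-up numbers (the observed series, both forecasts, and the choice of RMSE as the score are all placeholder assumptions): hold out the last few weeks of daily deaths and score each model’s mean forecast against them.

```python
import numpy as np

def rmse(forecast, observed):
    """Root-mean-square error of a point forecast against held-out observations."""
    forecast, observed = np.asarray(forecast, float), np.asarray(observed, float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))

observed_holdout = [120, 118, 115, 117, 110, 108, 105]   # hypothetical daily deaths
forecast_base    = [130, 131, 132, 133, 134, 135, 136]   # deaths-only model
forecast_tweets  = [122, 119, 117, 114, 112, 109, 107]   # model with tweet data

print(rmse(forecast_base, observed_holdout), rmse(forecast_tweets, observed_holdout))
```

For full posterior forecasts a proper scoring rule (e.g. the log predictive density on the held-out days) would be the more principled choice than RMSE on the posterior mean.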

I’m not too surprised that the time series of daily deaths is unaffected by the number of sub-states. My hunch is that you’ll see a difference between the two transmission models if you visualise the sizes of the exposed, infectious and terminally ill states.
@remoore , we have some new insights.
We went through the Brazilian National Health System (SUS) data (DATASUS) and found the data for hospitalizations due to Severe Acute Respiratory Illness (SARI). We only used data for 2020 and only the hospitalizations with a PCR-positive COVID test (~66% of the hospitalizations).
We took the difference in days between hospitalization and death or hospital discharge (called “alta” in both Spanish and Portuguese). I only have the mean, which is 16.4 days (@andrelmfsantos, the PhD candidate who did this, will generate some figures and more summary stats).
Comparing SEITD vs SEEIITTD, we have the following posterior density (using the same priors as Liverpool and Cambridge) for dT (the mean time for which individuals are terminally ill):
r$> seitd$summary(variables = "dT")
# A tibble: 1 x 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 dT        16.2   16.2 0.691 0.689  15.0  17.3  1.00    3628.    2971.

r$> seeiittd$summary(variables = "dT")
# A tibble: 1 x 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 dT        16.4   16.4 0.706 0.733  15.3  17.5  1.00    1037.    3074.
I know that the difference is small, but due to the huge amount of data, a 0.2 difference in the mean estimate of dT is a big difference in inference. And the SEEIITTD estimate is more aligned with the real-world data (which the model has not seen; it only uses daily deaths).
I have not yet analyzed the full posterior or the density plots of the hospitalization/discharge differences in DATASUS.

Severe Acute Respiratory Syndrome (SARS)
Just a minor correction: the translation of Síndrome respiratória aguda grave (SRAG) is, counterintuitively, Severe Acute Respiratory Illness (SARI), and that is what is reported in most Brazilian systems, including SIVEP-Gripe. This is mostly due to historical reasons, so as not to cause confusion with the already established and quite different SARS that emerged in Asia in the early 2000s.

Just a minor correction: the translation of Síndrome respiratória aguda grave (SRAG) is, counterintuitively, Severe Acute Respiratory Illness (SARI)
Thanks! I always get corrected by epidemiologists. There is also something funky with “fatality” versus “letality”…
So today’s Stan meeting was awesome. Lucky and grateful to pick everybody’s mind about the CoDatMo Twitter model. So, here are my summaries:
- We are on the right track to condition Twitter symptom mentions on daily deaths (we have real data to back that up) rather than on daily infected (we only have estimates from the ODE system to back that up). We would use a latent lag with data from COVID Severe Acute Respiratory Illness (SARI) hospitalizations and deaths as a prior.
- dT, the mean time for which individuals are terminally ill (in days), is a global parameter, but it actually varies daily/weekly etc. We could estimate it using weekly dT pieces or even a Gaussian process.
- The infections come in waves: first the younger, then the older, and then the elderly. I don’t know how to model that, but we can dismiss it and maintain the assumption of random mixing and contact between S and I.
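One way to read “weekly dT pieces”, sketched below (the lookup scheme and all values are my own illustrative assumptions, not the project’s implementation): keep one dT value per week instead of a single global parameter, and look up the current week’s value inside the ODE right-hand side.

```python
def dT_for_day(day, weekly_dT):
    """Piecewise-constant dT: days 0-6 use weekly_dT[0], days 7-13 use weekly_dT[1], ..."""
    week = min(day // 7, len(weekly_dT) - 1)  # clamp days past the last observed week
    return weekly_dT[week]

weekly_dT = [15.0, 15.5, 16.4, 17.0]  # hypothetical per-week posterior means
print(dT_for_day(0, weekly_dT), dT_for_day(13, weekly_dT), dT_for_day(100, weekly_dT))
```

A random-walk or Gaussian-process prior over the weekly values would then smooth them, instead of estimating each week independently.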
Thanks to @Bob_Carpenter and @charlesm93 for all the insights!
Following from this thread, I suggest we continue the discussion here.
I have revisited the “simplex” model you have offered in that post, as it has potential. I’d like to get a better understanding of it but, after an online search, I cannot find a reference to such an approach.
I fail to understand how the transition rate between S and E_1, \beta \frac{I_1 + I_2}{N} S, is handled. See the Liverpool model description.
I think I am also missing the rationale behind the use of a simplex vector.
Thanks.
Great! I’ll see what I can do next week, as this will probably require me to write more than just a few short sentences.
Happy to help but @remoore, @rosato_11 and @alphillips are closer to the specifics of what we’ve done in Liverpool. Hopefully they can chip in.
Great, I thought you might know who knows more about the specifics :)
We now have a preprint of the write up of the model here: https://arxiv.org/abs/2111.04498. The code used in the paper can be found here: https://github.com/codatmo/UniversityOfLiverpool_PaperSubmission. Will be happy to have further discussions relating to any specifics!