Hm, a quick PSA: I haven’t thought very much about the initial conditions, and they are probably not appropriate.
Also, I don’t see any predictive improvement using the Linear Chain Trick (S E1 E2 I1 I2 T1 T2 D) over a simple (S E I T D), at least on Brazil’s data.
Take a look at:
- deaths_trapezoidal_seitd.stan which is the same as deaths_trapezoidal.stan but without the sub-compartments.
- deaths_matrix_exp_seitd.stan which is the same as deaths_matrix_exp_seeiittd.stan but without the sub-compartments.
I am getting the same MAE for weekly predicted deaths vs real deaths:
# A tibble: 1 x 2
  MAE_median MAE_mean
       <dbl>    <dbl>
1       253.     255.
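(For reference, MAE here is the mean absolute error between predicted and observed weekly deaths, $\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\lvert \hat{d}_t - d_t \rvert$; the two columns presumably summarize the posterior-median and posterior-mean predictions.)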
See the pictures below:
It’s the same for all models?
Yeah, at least matrix_exp
and trapezoidal
I did also think about putting a prior on beta directly, but then thought better of it D:
Maybe there are just too many tunable degrees of freedom for any of the models? You could probably fit not only one elephant+trunk but a whole herd of them.
Ok, I am trying to explore the sub-compartmental Linear Chain Trick (LCT) models a little more. Specifically, I want to see if SEEIITTD is any better than SEITD from a totally naïve approach.
I’ve already run a weekly beta model and saw no difference: deaths_matrix_exp_seitd.stan vs deaths_matrix_exp_seeiittd.stan.
Now I’ve run a fixed global beta and also see no difference: deaths_fixed_beta_seitd.stan vs deaths_matrix_exp_seeiittd.stan.
Here is a picture:
I am tagging @s.maskell, @remoore and @alphillips who might be interested.
@storopoli, this is an aspect of the model that’s worth exploring. The choice of two sub-states for the exposed, infectious and terminally ill compartments was somewhat arbitrary in the initial version of the model. I’ve seen similar epidemiological models that use anywhere from 1 to 3 sub-states. It’s my impression that epidemiologists generally think that the Erlang distributed dwell time given by two or more sub-states better resembles reality.
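As a quick aside on why sub-states give Erlang dwell times: splitting a compartment into $n$ sequential sub-states, each with exit rate $n\gamma$, makes the total dwell time a sum of $n$ independent exponentials with the same overall mean:

$$
T = \sum_{k=1}^{n} X_k, \qquad X_k \sim \mathrm{Exponential}(n\gamma) \;\Longrightarrow\; T \sim \mathrm{Erlang}(n, n\gamma), \qquad \mathbb{E}[T] = 1/\gamma.
$$

With $n = 1$ the dwell time is exponential, with its mode at zero; larger $n$ concentrates it around $1/\gamma$, which is the sense in which two or more sub-states better resemble real incubation and infectious periods.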
I’m not too surprised that the time series of daily deaths is unaffected by the number of sub-states. My hunch is that you’ll see a difference between the two transmission models if you visualise the sizes of the exposed, infectious and terminally ill states.
Great, I’ll do that. I showed the LCT and Erlang distributions and the whole idea to an epidemiologist and she got all excited. We are exploring this using Liverpool’s model with Brazil’s data.
It makes sense. I’ll explore how the sub-states are populated using either a fixed beta or a varying weekly beta. Thanks!
So I’ve got another, slightly related question:
In principle, the Liverpool model could have been run daily from scratch. Was this done? Or were you able to exploit information from previous runs?
Maybe @s.maskell can tell me something about that? (Or tag someone that can).
Thanks!
@Funko_Unko: We’ve been running the model on each of 100 (ish) CCGs every day since mid-2020 (modulo server downtime etc). If we do that by re-fitting the model every day, we waste a lot of computation. @alphillips has therefore written a little adjunct to Stan that uses importance sampling to enable the samples for one day to be used for another. This saves us some computation and enables us to run more fits on a fixed computational resource. We could put that somewhere (e.g. on the CODATMO GitHub pages) if others would find it useful.
That would be wonderful, I’d like to check it out.
A CCG is some node on a computing cluster?
How much time gets saved? Does this approach then still require the occasional complete refit?
@Funko_Unko: sure we can share the code. @alphillips: can you pop what you’ve got on a new folder on CODATMO?
Sorry, a CCG is a region of the UK as perceived by people who do healthcare: Clinical commissioning group - Wikipedia. We run each CCG separately and use lots of cores, so saying that a CCG is a core is, by chance, a relatively accurate description!
Currently, we can reuse samples while the parameterisation is the same. Since we consider piecewise linear segments and the pieces are a week long, we can use the importance-sampling update for 6 days out of 7 and only do a full refit once a week. The importance sampling is near-instantaneous, so we get a 7x average speed-up. We could over-parameterise (i.e. include next week’s parameter in the fit for today), but we assumed (we’ve not checked) that the prior would be such that Neff would become terrible.
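A sketch of the identity this scheme relies on, assuming the day-$t$ posterior draws $\theta^{(s)}$ are reused as the proposal for day $t+1$ under an unchanged parameterisation: the importance weight for each draw reduces to the likelihood of the new data,

$$
w^{(s)} \;\propto\; \frac{p\left(\theta^{(s)} \mid y_{1:t+1}\right)}{p\left(\theta^{(s)} \mid y_{1:t}\right)} \;\propto\; p\left(y_{t+1} \mid \theta^{(s)}, y_{1:t}\right),
$$

because the prior and the old-data likelihood terms cancel in the ratio.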
Our longer term plan is to get away from fitting ODEs and use SDEs with proper streaming algorithms (see here: StanCon 2020. Talk 5: Simon Maskell. Towards an Interface for Streaming Stan - YouTube), but that’s not quite ready to use in anger yet.
@Funko_Unko, R package here: GitHub - codatmo/stanIncrementalImportanceSampling. It’s a basic R package to perform importance sampling to update fitted Stan models with new data, consisting of a single function, importance_sampling, that takes as input the original stanfit object, the original data as a list, and the original data plus the new data as a list. It then outputs the importance weights for each iteration.
You need to predict a week into the future, for example, so that the parameters for those days are there in the fitted model to be reweighted with the new data when it’s available.
That’s great, thanks both of you, I’ll have a look.
If there is interest I’d like to hook up some data generating process COVID simulations to this effort. What is the current model in play that I can run on a laptop?
I think recovering some known generating parameters could shed some light on the work. I am particularly interested in the integration of social media data into the model.
Here’s a page on what I have done; my connection to the early version of the Liverpool model is currently busted.
I am also interested in integrating social media data into the model. From my inspection, CoDatMo’s Liverpool NHS 111 calls models do the opposite: they predict calls from the infections in the model via calls_111_lagged_daily_infections. See lines 209-221 here.
I think we should do something different: use the daily Twitter symptom mentions to help predict infected (or maybe even deaths).
Glad to hear opinions and thoughts about this.
I have retreated to the simple SIR model (Model reproduction checklist); I find it useful just because I can return to the case study writeup (linked above) if I get confused.
I think an appropriate way to add tweets is to just use them to predict infected (this is with simulated data) alongside the ODE-based negative binomial contribution. The model and generated quantities blocks are below (running code is test.R at https://github.com/codatmo/dataGeneratingProcess1.git on the tweetSimpleSIR branch):
model {
  // priors
  beta ~ normal(2, 1);
  gamma ~ normal(0.4, 0.5);
  phi_inv ~ exponential(5);
  a_twitter ~ normal(0, 1);
  beta_twitter ~ normal(0, 0.5);
  sigma_twitter ~ normal(0, 1);
  if (compute_likelihood == 1) {
    for (i in 1:n_days) {
      if (run_SIR == 1 && run_twitter == 1) {
        real tweet_val = a_twitter + symptomaticTweets[i] * beta_twitter;
        cases[i] ~ neg_binomial_2(infected_daily_counts[i] + tweet_val, phi);
      } else if (run_SIR == 1) {
        cases[i] ~ neg_binomial_2(infected_daily_counts[i], phi);
      } else if (run_twitter == 1) {
        cases[i] ~ normal(a_twitter + symptomaticTweets[i] * beta_twitter,
                          sigma_twitter);
      }
    }
  }
}
generated quantities {
  real pred_cases[n_days];
  for (i in 1:n_days) {
    if (run_twitter == 1 && run_SIR == 1) {
      real tweet_val = a_twitter + symptomaticTweets[i] * beta_twitter;
      pred_cases[i] = neg_binomial_2_rng(infected_daily_counts[i] + tweet_val, phi);
    } else if (run_twitter == 1) {
      pred_cases[i] = normal_rng(a_twitter + symptomaticTweets[i] * beta_twitter,
                                 sigma_twitter);
    } else if (run_SIR == 1) {
      pred_cases[i] = neg_binomial_2_rng(infected_daily_counts[i], phi);
    }
  }
}
Am I doing this right? The relevant case is the `run_twitter == 1 && run_SIR == 1` if statements in the model and generated quantities.
Fits somewhat:
Thanks
Breck
@breckbaldwin, we need to add a latent lag also.
I don’t think it is interesting to use tweets to predict infected: we don’t have any real data or estimate related to infected besides the I compartments in the ODE system. We should use tweets to predict deaths D, for which we have quite reliable data (actually, the only real-world data we condition the model on are daily_deaths and daily_tweets_mentions).
Maybe something like:
parameters {
  ...
  real<lower=0> lag_tweets_deaths;
  ...
}
...
model {
  ...
  lag_tweets_deaths ~ lognormal(log(15), log(10)); // I don't know which prior to use
  ...
  for (i in 1:n_days) {
    deaths[i] ~ neg_binomial_2(deaths[i - lag_tweets_deaths], phi_deaths);
  }
}
I think that is my main idea; I don’t know how to specifically code this thing up, but that is what I was planning to do with the daily Twitter symptoms.
The sources of info we have are:
- tweets + date
- deaths + date
- no hospital admission info or testing info
So the simplest model is just lagged tweets predicting deaths. Obviously too simple, but I’ll try to code it up with simulation backing it.
@storopoli Won’t work; you can’t index with a real, which lag_tweets_deaths is.
We could do a floor or round. Somehow seems wrong.
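Here is a minimal sketch of one way around that, keeping the lag continuous: marginalize it into a normalized kernel over integer lags, so expected deaths become a weighted sum of past tweets rather than a single real-indexed value. Everything below is an assumption for illustration, not code from this thread: the names (max_lag, alpha_tweets, lag_mu, lag_sigma) are hypothetical and the discretized-lognormal kernel is just one choice.

```stan
data {
  int<lower=1> n_days;
  int<lower=0> deaths[n_days];      // observed daily deaths
  vector<lower=0>[n_days] tweets;   // daily symptomatic tweet mentions
  int<lower=1> max_lag;             // longest lag considered, e.g. 30 days
}
parameters {
  real<lower=0> alpha_tweets;       // scale from tweet mentions to deaths
  real lag_mu;                      // log-scale location of the lag kernel
  real<lower=0> lag_sigma;          // log-scale spread of the lag kernel
  real<lower=0> phi_inv;
}
transformed parameters {
  // Discretize a lognormal over integer lags 1..max_lag and normalize,
  // so the lag parameters stay continuous and HMC-friendly.
  vector[max_lag] lag_weights;
  for (l in 1:max_lag)
    lag_weights[l] = exp(lognormal_lpdf(l | lag_mu, lag_sigma));
  lag_weights = lag_weights / sum(lag_weights);
}
model {
  real phi = 1 / phi_inv;
  alpha_tweets ~ normal(0, 1);
  lag_mu ~ normal(log(15), 0.5);    // centered on a ~15-day tweet-to-death lag
  lag_sigma ~ normal(0, 0.5);
  phi_inv ~ exponential(5);
  for (i in (max_lag + 1):n_days) {
    // Expected deaths today: weighted sum of tweet mentions at past lags.
    real mu = 0;
    for (l in 1:max_lag)
      mu += lag_weights[l] * tweets[i - l];
    deaths[i] ~ neg_binomial_2(alpha_tweets * mu + 1e-8, phi); // epsilon guards mu == 0
  }
}
```

This sidesteps the real-valued index entirely, and it avoids the floor/round route, which would make the likelihood piecewise constant in the lag parameter and break HMC’s gradients; presumably that is why it felt wrong.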
Kicking over to slack for now.