Hierarchical Gompertz model

Ok the model now:

orm_mult <- bf(cum_cases ~ A * exp( -exp( -(k * (day - delay) ) ) ),
             A ~ 1 + (1 | Country.Region),
             k ~ 1 + (1 | Country.Region),
             delay ~ 1 + (1 | Country.Region),
             nl = TRUE)

mult_priors <- c(
  prior(lognormal(0, 1), nlpar = "A", lb=0),
  prior(lognormal(0, 1), nlpar = "k", lb=0),
  prior(lognormal(0, 1), nlpar = "delay", lb=0),
  prior(student_t(3, 0, 1), class = "sigma"), 
  prior(student_t(4, 0, 1), class = "sd", group = "Country.Region", nlpar = "A"),
  prior(student_t(4, 0, 1), class = "sd", group = "Country.Region", nlpar = "k"),
  prior(student_t(4, 0, 1), class = "sd", group = "Country.Region", nlpar = "delay"))

modmult <- brm(form_mult, data = case_counts,
           prior = mult_priors, seed = 1234,
           family = lognormal(link = "identity", link_sigma = "log"),
           chains = 4, cores=4, sample_prior = "no")

A prior predictive check (15 divergences occurred):

Running the model and evaluating the likelihood I got some warnings:

Warning messages:
1: There were 242 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: There were 2927 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
3: Examine the pairs() plot to diagnose sampling problems
 
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

Summary of results:

> summary(modmult)
 Family: lognormal 
  Links: mu = identity; sigma = identity 
Formula: cum_cases ~ A * exp(-exp(-(k * (day - delay)))) 
         A ~ 1 + (1 | Country.Region)
         k ~ 1 + (1 | Country.Region)
         delay ~ 1 + (1 | Country.Region)
   Data: case_counts (Number of observations: 256) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~Country.Region (Number of levels: 5) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(A_Intercept)         8.66      6.54     2.06    26.71 1.01      238      276
sd(k_Intercept)         0.09      0.05     0.03     0.22 1.01      436      375
sd(delay_Intercept)    31.32     11.33    15.24    61.19 1.00      349      359

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
A_Intercept        12.42      5.71     0.55    22.20 1.01      220      317
k_Intercept         0.09      0.04     0.04     0.18 1.01      467      508
delay_Intercept     4.09      6.42     0.17    25.06 1.01      487      278

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.58      0.03     0.53     0.64 1.00      735      686

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 242 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Posterior conditonal plot:

Ok so plot looks better (except for South Korea) but there are divergences 🤔. Anyone any thoughts about that?

You can try and decrease the step size by increasing adapt_delta, but I have no idea how to do that with brms.

2 Likes

Hey! I’m also no brms expert.

However, I think conceptually, you’d want to do something like this:

bf(
    cum_cases ~ logA - exp(-(exp(logk) * (day_norm - exp(logdelay)))),
    logA ~ 1 + (1 | ID | Country.Region),
    logk ~ 1 + (1 | ID | Country.Region),
    logdelay ~ 1 + (1 | ID | Country.Region),
    nl = TRUE
  )

In the model above there are 2 problems, I think:

  1. You’ve put a lognormal prior on the common mean of A, k, and delay. This ensures that the common mean is positive, however, the parameters could still be negative due to the random effect structure. To ensure positivity on the varying parameters, I think you have to wrap them in exp and define everything on the log scale.
  2. The lognormal “identity” link is essentially a log-link. So i think you do have to take the log of the mean formula if you’re using family = lognormal().
case_counts$Country.Region <- as.factor(case_counts$Country.Region)
case_counts$day_norm <- case_counts$day/max(case_counts$day)

form_mult2 <-
  bf(
    cum_cases ~ logA - exp(-(exp(logk) * (day_norm - exp(logdelay)))),
    logA ~ 1 + (1 | ID | Country.Region),
    logk ~ 1 + (1 | ID | Country.Region),
    logdelay ~ 1 + (1 | ID | Country.Region),
    nl = TRUE
  )
mult_priors2 <- c(
  prior(normal(0, 1), nlpar = "logA", lb = 0),
  prior(normal(0, 0.5), nlpar = "logk", lb = 0),
  prior(normal(0, 0.5), nlpar = "logdelay", lb = 0),
  prior(normal(0, 1), class = "sigma"),
  prior(
    normal(0, 2.5),
    class = "sd",
    group = "Country.Region",
    nlpar = "logA"
  ),
  prior(
    normal(0, 1),
    class = "sd",
    group = "Country.Region",
    nlpar = "logk"
  ),
  prior(
    normal(0, 1),
    class = "sd",
    group = "Country.Region",
    nlpar = "logdelay"
  )
)

modmult2 <- brm(
  form_mult2,
  data = case_counts,
  prior = mult_priors2,
  seed = 1234,
  family = lognormal(),
  chains = 4,
  cores = 4,
  sample_prior = "no",
  control = list(adapt_delta = 0.95, max_treedepth = 15)
)

conditions <- make_conditions(case_counts, "Country.Region")

plot(
  conditional_effects(
    modmult2, 
    conditions = conditions, 
    re_formula = NULL
    ),
  points = TRUE,
  facet_arg = list(scales = "free")
)

This runs with only a couple of divergences for me (around ~3 divergences; the model runs slooooooow, though). However, the fit seems not so different from what you have posted. The parameters are obviously fairly correlated – especially, logk and logdelay. I’m afraid the (unsatisfactory) answer here could be that the folk theorem of statistical computing comes into play here: the model is probably not so great… :/ But maybe I’m missing something…

2 Likes

Hi Max. Thanks for that. Funnily enough I realised overnight that the lognormal priors were a problem and I had changed them to normal priors and the model was behaving better, but I didn’t quite eliminate all the divergences.

About the formula changes 🤔 - interesting. I’ll have to think on it and have other stuff to do today but I’ll maybe try it tonight. Thanks all!

1 Like

I was thinking about this and trying a few different things including alternate parameterisations in the paper I’m following. I’ve come around to thinking you are correct here. I was actually wondering if there might be an identifiability problem here no matter what we do with distributions or parameterisations ? After all, the right hand side of the equation has 3 parameters to estimate but only one explanatory variable.

After some discussion on a similar thread ( Multilevel model using >16GB of RAM ) and remembering and re-reading this blog - particularly the sigmoid example: https://www.martinmodrak.cz/2018/05/14/identifying-non-identifiability/ I think that yes there is an identifiability problem in the model. There will always be some values of day - delay that will result in this term = 0 meaning for those values k can have any value and A can have any value.

Unless there is some way to constrain (day - delay) ! = 0 ???

May I ask, why you did not specify b in:

image

See its effect:

image

Attention b > 0 look in the comments in wikipedia.

1 Like

I’m working off this paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0178691#sec005

There are different parameterisations. I’ve been using equation 1 in the paper, however although I didnt’ post results here, I also tried equations 2 and 3 - with equation 3 being the same as the wikipedia equation you just posted. I had similar issues with all 3 versions. In the version that you posted the identifiability issue I was just talking about would happen if parameter c took the value of zero. In that case t has no influence in the model and both a and b are not effected by the data leading to the identifiability wackyness. You could define c > 0, except that there is no reason for c to be only positive. So how can you allow c to be positive or negative but not exactly zero ?

The probability that something is exactly 0 is 0. Some proof in measurement theory, I remember.
In reality you deal with floating point, but the probability to hit 0 is almost 0 too, except you are going to use a prior and force it to produce lots of 0. That’s then because of floating point precision.
You can write
if(c==0.0) reject("c is 0");
My 2 cents, I would ignore it. And add the above statement.

Why don’t you write a minimal Stan program by yourself without BRMS and then we can analyse
much more easily what’s going wrong.

1 Like

Except that in a hierarchical model the number of parameters that can hit zero or almost zero is greatly increased.

The problem is well described in martinmodraks blog above and I’ve seen the same patterns in pairs plots from my data. I’m happy this is the problem I just don’t know what to do about it.

I’m sorry, I never used BRMS, although having good knowledge of Stan. I originally came from WinBUGS, OpenBUGS and then JAGS to Stan, so it would cost me days to get firm with BRMS. So my suggestion is to write a simply pure Stan model and then we will see.

2 Likes

Yeah sorry I’m trying to balance alot right now, this is very much not the day job and I find it very difficult/slow to work with pure stan models directly.

I’ll try to make some time to do as you suggest

Just use a very simply model with one country. Later we will fill in the gaps, easily. If you stuck post the code, we will guide you through.

2 Likes

Ok made some time for that. BEHOLD!!! The weirdness below.

Files:
gomp_simple.stan (599 Bytes)
gomp_stan.R (891 Bytes)

The simplified model - I’m using the model you found on wiki with normal priors for the kinetics paramters and s student_t on sigma and a normal_lpdf

data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // dependent variable
  vector[N] X; // independent variable
}

parameters {
  real A;  // population-level effects
  real k;  // population-level effects
  real c;  // population-level effects
  real <lower=0> sigma;  // residual SD
}

model {

  vector[N] mu;
  //for (n in 1:N) {
    // compute non-linear predictor values
    mu = A * exp( - exp( -c * ( k * X)));
  //}
  // priors
  A ~ normal(0, 10000);
  k ~ normal(0, 20);
  c ~ normal(0, 20);
  sigma ~ student_t(3, 0, 1000);

  // likelihood
  Y ~ normal(mu, sigma);

}

Check out the pairs plot:

😱

I also did the 3d plot from Martin Madraks blog:

Anyway - as I was saying before, the plots reveal the problem is when c or k go to zero - the remaining parameters can take any value!

data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // dependent variable
  vector[N] X; // independent variable
}

parameters {
  real A;
  real<lower=0> k;
  real<lower=0> T;
  real <lower=0> sigma;  // residual SD
}

transformed parameters {
  vector[N] mu =  A * exp(-exp(-k * (X - T)));
}

model {
  // priors
  A ~ normal(0, 10000);
  k ~ normal(0, 20);
  T ~ normal(0, 20);
  sigma ~ student_t(3, 0, 1000);

  // likelihood
  Y ~ normal(mu, sigma);
}

R code:

library(tidyverse)
library(brms)
library(bayesplot)
library(rstan)

case_counts <- read_csv("corona_counts.csv")
case_counts$Country.Region <- as.factor(case_counts$Country.Region)

fname <- "gompertz_formula1"
rstan_options(auto_write = TRUE)
options(mc.cores = 4)
model_file <- paste(fname,".stan",sep="")
smod <- stan_model(file=model_file)


seldata <- case_counts %>% filter(Country.Region == "China")
N <- nrow(seldata)
Y <- seldata$cum_cases
X <- seldata$day

sdat <- list(
  N = N
, Y = Y
, X = X
)


stan_sample <- sampling(smod
, data = sdat
#, pars = pars
#, include = FALSE
, iter=1000
, chains=1
#, init =  init_ll
, seed = 1234
, control = list(
  adapt_delta = 0.80
, max_treedepth = 14
)
)

extr <- extract(stan_sample)
plot(X,apply(extr$mu,2,mean))

According to Wikipedia (https://en.wikipedia.org/wiki/Gompertz_function),
*b, c* are positive numbers
this means
k > 0 (growth rate) and T > 0 (delay). Although Stan fits the right model, even if you don’t
constrain these.

And if we plot:
china

Is this ok for you? Then we need to add the group levels effects. Also we need to address what we want to show and we assume. The growth rate should be the same or is the virus mutating does it alter the growth rate?

gompertz_formula1.stan (474 Bytes) gompertz_formula1.R (767 Bytes)

2 Likes

I feel we are repeating things here. My original model did have the parameters resticted >0. However, as @Max_Mantei pointed out these is no principled reason these variables cannot be <0. I’d wager the reason Wikipedia says they are positive is because frequentist models simply die if you dont’ restrict them to be +ve.

Leaving that aside - restricting T > 0 in your model does not avoid the problem because (X - T) can still be =0.

Did you make pairs plots for your results? I’d be curious to see that. Just because you get a reasonable looking fit doesn’t mean there isn’t a model - many of my attempts have had reasonable looking fits for a single country but the problem was lurking.

Ok, I’m going to stick my neck out here:

I find it very unlikely that the parameters can just be anything. The restrictions are needed in order for the curve to make sense.

That happens with probability zero, no?

2 Likes

I don’t see a principled reason that k cannot be lower than zero !?

No. X is a positive integer so if T is positive number if it can definitely happen that (X - t) =0 or very very close to 0.

pairsplot

I am refering now to:
A * exp(-exp(-k * (X - T)))

and not the wikipedia version, where restriction makes sense or not, I don’t won’t this discuss yet.

if k < 0 the curve is decreasing, but ok.
And the pairs plot shows the unconstrained parameters. Do you see any pathologies?

2 Likes

I didn’t say this! I’ve put the exp in the equation, so that the mean and random effects can both vary freely on the log scale. k and delay are still strictly positive in this specification. Sorry, if that was unclear.

3 Likes