How should I re-parameterize my model (or pick better priors)?

I collected cells from 28 mice that received 6 different treatments (3-5 mice per group).
The number of cells I collected is highly variable across treatments; one group only has a couple hundred cells per animal, while others have a few million.

I’m trying to build a multi-level model with a random effect for each animal.

So, something like:
$\text{Cells} \sim \text{Poisson}(\lambda)$
$\log(\lambda) = \mu_0 + bT[\text{Treatment}] + bM[\text{Mice}]$

The thing is, I cannot get the model to converge.

My model looks like

data{
    int Cells[28];
    int Treatment[28];
    int Mice[28];
}
parameters{
    real<lower=0> mu0;
    vector[28] bM;
    vector[6] bT;
}
model{
    vector[28] mu;
    bT ~ normal( 0 , 6 );
    bM ~ normal( 0 , 2 );
    mu0 ~ gamma( 10/1 , 1/1 );
    
    for ( i in 1:28 ) {
        mu[i] = mu0 + bM[Mice[i]] + bT[Treatment[i]];
        mu[i] = exp(mu[i]);
    }
    Cells ~ poisson( mu );
}

Data from dput

structure(list(Cells = c(6227606, 5844779, 1338474, 10365562, 
6349543, 9831545, 10968084, 15513791, 2564861, 6725653, 1609213, 
716699, 991468, 822666, 521778, 11940, 345215, 42238, 7612, 10746, 
448, 448, 149, 597, 149, 14477, 1492, 7910), Mice = 1:28, Treatment = c(1, 
1, 1, 1, 1, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 4, 4, 
4, 4, 4, 3, 3, 3)), row.names = c(NA, -28L), class = c("tbl_df", 
"tbl", "data.frame"))

I think my priors are justifiable.

We typically expect to collect 0.5-20 million cells per mouse, so I think mu0~gamma(10, 1) is reasonable.

I think I am having a hard time parameterizing bT. The distribution is clearly not normal: some treatments do nothing, while others kill almost every cell. I’ve tried bT ~ cauchy(0, 2) for heavier tails, but to no avail.

Looking for suggestions to help me (1) reparameterize my model completely and/or (2) pick better priors.

I am a biologist by training, so please also let me know if I’m doing anything that is obviously incorrect.

Many thanks in advance.

Run everything with

"
data{
  int Cells[28];
  int Treatment[28];
  int Mice[28];
}
parameters{
  real<lower=0> mu0;
  vector[28] bM;
  vector[6] bT;
}
model{
  vector[28] mu;
  bT ~ normal( 0 , 6 );
  bM ~ normal( 0 , 2 );
  mu0 ~ gamma( 10/1 , 1/1 );
  
  for ( i in 1:28 ) {
    mu[i] = mu0 + bM[Mice[i]] + bT[Treatment[i]];
    mu[i] = exp(mu[i]);
  }
  Cells ~ poisson( mu );
}" -> stan_code

structure(list(Cells = c(6227606, 5844779, 1338474, 10365562, 
  6349543, 9831545, 10968084, 15513791, 2564861, 6725653, 1609213, 
  716699, 991468, 822666, 521778, 11940, 345215, 42238, 7612, 10746, 
  448, 448, 149, 597, 149, 14477, 1492, 7910), Mice = 1:28, 
  Treatment = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 
                6, 6, 6, 6, 6, 4, 4, 4, 4, 4, 3, 3, 3)), 
  row.names = c(NA, -28L), class = c("tbl_df", "tbl", "data.frame")) -> dd

rstan::stan(model_code = stan_code, data = list(Cells = dd$Cells, Mice = dd$Mice, Treatment = dd$Treatment))

You may want to perform some prior predictive checks. This will tell you whether you’ve really coded your expectations correctly. For instance, if you have counts in the order of millions, an intercept with that gamma distribution doesn’t make sense to me. Running such a check can be done by removing the likelihood from the model (the Poisson part).
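For instance, a minimal sketch of that check: keep your priors exactly as they are, drop the Poisson likelihood, and compute the implied rates in generated quantities so they can be compared against the observed counts (the parameter name lambda_prior and the choice to report rates rather than Poisson draws are just illustrative choices, not the only way to do this).

data{
  int Cells[28];      // unused here; kept only so the same data list can be passed
  int Treatment[28];
  int Mice[28];
}
parameters{
  real<lower=0> mu0;
  vector[28] bM;
  vector[6] bT;
}
model{
  // same priors as in your model, but no Cells ~ poisson(...) statement
  bT ~ normal( 0 , 6 );
  bM ~ normal( 0 , 2 );
  mu0 ~ gamma( 10/1 , 1/1 );
}
generated quantities{
  // rates implied by the priors alone; for a quick check, comparing these
  // directly to the observed Cells is usually enough
  vector[28] lambda_prior;
  for ( i in 1:28 )
    lambda_prior[i] = exp( mu0 + bM[Mice[i]] + bT[Treatment[i]] );
}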

Check the documentation for the gamma_lpdf function to see whether you have correctly specified the parameter values in your model. In general, there are two different parameterisations of the gamma distribution (scale vs. rate) that are often mixed up.

1 Like

Hi @LucC, thank you so much for the insight.

Can you help me get a bit more specific here?

I’m wondering

  • Am I being too restrictive? (When I make the priors wider, it doesn’t help convergence.)
  • Is it unconventional to give a Poisson intercept a gamma prior?
  • Is there some kind of standard regularization method I should be using to make the model fit?

Re: Prior predictive check
When I check my prior with

n <- 1e5
# draw the log-rate from the priors: intercept + treatment effect + mouse effect
lambda <- rgamma(n, shape = 10, rate = 1) + rnorm(n, 0, 6) + rnorm(n, 0, 2)
# prior predictive counts
quantile(rpois(n, exp(lambda)), probs = seq(0, 1, len = 11))

(I hope that was the equivalent)

It prints

          0%          10%          20%          30%          40%          50%          60%          70%          80%          90%         100% 
0.000000e+00 2.000000e+00 5.300000e+01 4.930000e+02 3.313200e+03 2.031250e+04 1.207372e+05 8.313967e+05 8.224632e+06 1.950079e+08 2.322671e+18 

Which covers the range of my data, so what am I missing? I am aware that in Stan I should specify the rate $\beta$ rather than the scale.

Definitely unfamiliar with the field; appreciate the guidance a lot.

It looks as if you’re trying to predict 28 observations with 28 “mouse” effects and six treatment effects. With strong enough priors, you could estimate this. But I don’t think your model is a multilevel model in the technical sense, in that the group-level prior is not itself being estimated. Consequently, you are shrinking towards zero, but only by some pre-determined amount.

I suspect what you want instead is to define hyperpriors that allow the group effects to be fully shrunk towards zero.

parameters{
  real<lower=0> sigma_m;
  real<lower=0> sigma_t;
  ...
}
model{
  bT ~ normal( 0 , sigma_t );
  bM ~ normal( 0 , sigma_m );
  ...
}

That may not be sufficient to solve your problem, but it should help. You can then put priors on the hyperparameters to refine the amount of shrinkage.
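Putting that together, a minimal sketch of the full model could look something like this (the half-normal hyperpriors and the normal prior on the log-scale intercept are illustrative choices, not the only reasonable ones):

data{
  int Cells[28];
  int Treatment[28];
  int Mice[28];
}
parameters{
  real mu0;                      // intercept on the log scale
  vector[6] bT;
  vector[28] bM;
  real<lower=0> sigma_t;         // spread of the treatment effects, estimated from the data
  real<lower=0> sigma_m;         // spread of the mouse effects, estimated from the data
}
model{
  vector[28] log_mu;
  mu0 ~ normal( 13 , 3 );        // illustrative: exp(13) is roughly 0.4 million cells
  sigma_t ~ normal( 0 , 5 );     // half-normal because of the <lower=0> constraint
  sigma_m ~ normal( 0 , 2 );     // half-normal
  bT ~ normal( 0 , sigma_t );
  bM ~ normal( 0 , sigma_m );
  for ( i in 1:28 )
    log_mu[i] = mu0 + bM[Mice[i]] + bT[Treatment[i]];
  Cells ~ poisson_log( log_mu ); // poisson_log works on the log scale, so no explicit exp()
}

If this still produces divergences, the usual next step is a non-centered parameterization, i.e. sampling standard-normal raw effects and multiplying them by sigma_t and sigma_m.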

2 Likes

I think the problem might stem from the fact that you have 28 observations and estimate 35 parameters. With one observation per mouse, the mouse-specific effects are going to be hard to estimate. I would start with just estimating the treatment effects.

2 Likes

I think there’s an element of truth in all three of the answers so far from @LucC, @simonbrauer, and @stijn, but I think the full story here has one final twist.

If you remove the individual mouse-level effects (delete the bT[Treatment[i]] term), you still have problems in this model. Why is that? Well, the reason is that at that point the likelihood is completely determined by the six treatment means, but you still have seven parameters you’re estimating (an intercept plus six offsets).

If you delete the intercept parameter (and still work with the model that has no individual mouse effects), the problem goes away. Importantly (and this is @simonbrauer’s point), something else that often also works is to place a true hierarchical prior, with its variance estimated from the data, on the group effects. If we can shrink the variance in these effects to small enough values, then the mean will still be well identified. Unfortunately, it seems that in your case this doesn’t save you, probably because you have too few groups for the random-effect distribution to be well identified.
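For concreteness, a sketch of that intercept-free, treatment-only version (the normal(13, 5) prior is just an illustrative choice that roughly spans the observed range of counts on the log scale):

data{
  int Cells[28];
  int Treatment[28];
}
parameters{
  vector[6] bT;                  // one log-rate per treatment; no intercept, no mouse effects
}
model{
  vector[28] log_mu;
  bT ~ normal( 13 , 5 );         // illustrative: on the count scale, ±2 sd spans roughly exp(3) ≈ 20 to exp(23) ≈ 1e10
  for ( i in 1:28 )
    log_mu[i] = bT[Treatment[i]];
  Cells ~ poisson_log( log_mu );
}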

Finally, I’ll note that fitting an observation-level random effect is a valid way to deal with overdispersion in Poisson regression, but it often requires a very large number of observations to be tractable.

4 Likes

Fully agreed with what’s been said above. But I think @jsocolar meant to say bM[Mice[i]] when he wrote “delete the bT[Treatment[i]] term”.

2 Likes

yup, good catch!