Help specifying nested random effects structure

Hi there! I’d like some help figuring out how to specify my random-effects structure. I have 10 sites, each with 3 traps, so I’d like the random effect of each trap nested within each site. I’m just a little confused about how to nest them, and any help would be appreciated.
Below is what I’m currently working with. When I try to run it, it begins to run but never seems to make any sampling progress: it just gets stuck at 0% for all chains. Thanks!


data{
  int o[156];
  int RETrap[156];
  int RESite[156];
  int s[156];
  vector[156] d;
}
parameters{
  vector[30] b_0_rt; // trap level random effect
  real <lower=0> sigma_trap;
  vector[10] b_0_rs; // site level random effect
  real<lower=0> sigma_site;
  real <lower=0> b_1_e;
  real b_1_m;
  real<lower=0> phi; // negative binomial overdispersion
}
model{
  vector[156] m;
  vector[156] e;
  vector[156] lambda;
  b_1_m ~ normal( 0 , 2.5 );
  sigma_site ~ exponential(1);
  sigma_trap ~ exponential(1);
  b_0_rs ~ normal(0, sigma_site); // prior for site level random effect
  for (i in 1:156) {
    b_0_rt[RETrap[i]] ~ normal(b_0_rs[RESite[i]], sigma_trap);   // Trap's random effect is nested within site-level random effect
  }
  for ( i in 1:156 ) {
    lambda[i] = exp(b_1_m*s[i] + b_0_rt[RETrap[i]]);
  }
  b_1_e ~ lognormal(0.01,1);
  for ( i in 1:156 ) {
    e[i] = b_1_e * d[i] ;
    e[i] = inv_logit(e[i]);
  }
  for ( i in 1:156 ) {
    m[i] = e[i] * lambda[i];
  }
  o ~ neg_binomial_2( m, phi );
}

[edit: escaped Stan code]

Hi, @cbarbs, and welcome to the Stan forums.

What you are doing with the nesting is correct. You have to be careful about introducing too many intercepts. The trick is realizing that you can code the same nested model two ways: either use the higher-level effect as the location of the lower-level effect, or pull everything out into zero-centered effects plus a single intercept in the predictor. If you have lots of nested effects, you really need to do it the former way, which is what you’ve done.

The relation is that

\qquad \alpha_i \sim \text{normal}(0, 5), \beta_{i,j} \sim \text{normal}(\alpha_i, \sigma) with predictor \beta_{i, j}

is equivalent to

\qquad \alpha_i \sim \text{normal}(0, 5), \gamma_{i,j} \sim \text{normal}(0, \sigma) with predictor \alpha_i + \gamma_{i, j}.

The relation will yield \gamma_{i, j} = \beta_{i,j} - \alpha_i. The predictor is the same in both cases.
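In Stan code, the two codings look like this (a sketch using your variable names plus two hypothetical ones: `trap_site`, a length-30 map from each trap to its site, and `g_0_rt`, the zero-centered trap effect; only the relevant lines are shown):

```stan
// Coding 1: trap effect centered at its site's effect.
b_0_rs ~ normal(0, sigma_site);
b_0_rt ~ normal(b_0_rs[trap_site], sigma_trap);
// linear predictor for observation i uses b_0_rt[RETrap[i]]

// Coding 2: zero-centered trap effect, site effect added in the predictor.
b_0_rs ~ normal(0, sigma_site);
g_0_rt ~ normal(0, sigma_trap);
// linear predictor for observation i uses b_0_rs[RESite[i]] + g_0_rt[RETrap[i]]
```

Note that `trap_site` indexes over the 30 traps, so each trap’s prior is stated exactly once.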

There are a couple of issues with your model.

  1. You almost certainly are going to want a non-centered parameterization of the hierarchical model. That means reordering the declarations and adding a multiplier.
      real<lower=0> sigma_trap;
      vector<multiplier=sigma_trap>[30] b_0_rt;
      real<lower=0> sigma_site;
      vector<multiplier=sigma_site>[10] b_0_rs;
  2. Negative binomial is hard to fit because there are two ways to adjust for large values: increase the mean or increase the overdispersion. I would suggest trying to fit it with just a Poisson to begin with, then when you have that working, expand to negative binomial.
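For the debugging step, a minimal sketch of the likelihood swap (everything else in the model unchanged):

```stan
// Start with a Poisson likelihood and no overdispersion parameter:
o ~ poisson(m);

// Once that samples cleanly, switch back to the negative binomial
// (re-introducing phi in the parameters block):
// o ~ neg_binomial_2(m, phi);
```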

You can also vectorize all the loops in your model, which will save a bunch of lines of code and, in most cases, will be faster because autodiff is faster with vectorization. For example, rather than

for (i in 1:156) {
    b_0_rt[RETrap[i]] ~ normal(b_0_rs[RESite[i]], sigma_trap);   // Trap's random effect is nested within site-level random effect
  }

you can use

b_0_rt[RETrap] ~ normal(b_0_rs[RESite], sigma_trap);

This will give you a big speedup because Stan is clever enough to only compute log(sigma_trap) once, and that’s the most expensive calculation per observation.

Similarly, rather than looping to define lambda, you can vectorize it all, which makes it natural to move the declaration down to where the variable is defined. This should also be faster, but not to the extent of the previous change, as there are no non-linear operations saved, just a slightly more compact autodiff representation.

vector[156] lambda = exp(b_1_m * to_vector(s) + b_0_rt[RETrap]);

Similarly, you can define e and m in one-liners, too.

vector[156] e = inv_logit(b_1_e * d);
vector[156] m = e .* lambda;  // note the `.*` for elementwise multiplication

Thanks a lot for your reply, Bob! I will try vectorizing things to speed them up - that does appear to be the problem. Would you mind taking a look at what I have so far to see if it’s right? I had simplified the model earlier, and now there is one more variable in there, hraw, as well as a generated quantities section. I have vectorized things there, too, but I’m not sure if that was right. Thank you again!

data{
  int o[156];
  int RETrap[156];
  int RESite[156];
  int s[156];
  int hraw[156];
  vector[156] d;
}
parameters{
  real<lower=0> sigma_trap;
  vector<multiplier=sigma_trap>[30] b_0_rt;
  real<lower=0> sigma_site;
  vector<multiplier=sigma_site>[10] b_0_rs;
  real <lower=0> b_1_e;
  real b_1_m;
  real<lower=0> phi; // negative binomial overdispersion
}
model{
  b_1_m ~ normal( 0 , 2.5 );
  sigma_site ~ exponential(1);
  sigma_trap ~ exponential(1);
  b_0_rs ~ normal(0, sigma_site); // prior for site level random effect
  b_0_rt[RETrap] ~ normal(b_0_rs[RESite], sigma_trap);
  vector[156] lambda = to_vector(hraw) .* exp(b_1_m * to_vector(s) + b_0_rt[RETrap]);
  b_1_e ~ lognormal(0.01,1);
  vector[156] e = inv_logit(b_1_e * d);
  vector[156] m = e .* (lambda ./ to_vector(hraw));
  o ~ neg_binomial_2( m, phi );
}
generated quantities{
  vector[156] lambda = to_vector(hraw) .* exp(b_1_m * to_vector(s) + b_0_rt[RETrap]);
  vector[156] e = inv_logit(b_1_e * d);
  vector[156] m = e .* (lambda ./ to_vector(hraw));
  vector[156] log_lik = neg_binomial_2_lpmf( o | m, phi );
}

[edit: escaped Stan code]

By using “```stan” to open and “```” to close on their own lines, you can escape Stan code.

Yes, those are the right fixes to create the non-centered parameterization.

I would suggest moving the generated quantities lambda, e and m to transformed parameters, then reusing them in the model.
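As a sketch of that restructuring (keeping your variable names, and assuming `to_vector` conversions for the integer arrays `hraw` and `s` so they can enter vector arithmetic):

```stan
transformed parameters {
  vector[156] lambda = to_vector(hraw) .* exp(b_1_m * to_vector(s) + b_0_rt[RETrap]);
  vector[156] e = inv_logit(b_1_e * d);
  vector[156] m = e .* (lambda ./ to_vector(hraw));
}
model {
  // ... priors as before ...
  o ~ neg_binomial_2(m, phi);
}
```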

log_lik should be defined in a loop—as is, it’s going to calculate the total log likelihood and assign it to each entry of the vector (or it’ll fail to compile—I don’t know offhand how far we’ve extended assignment).
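With `m` and `phi` in scope, the per-observation version looks like:

```stan
generated quantities {
  vector[156] log_lik;
  for (i in 1:156) {
    log_lik[i] = neg_binomial_2_lpmf(o[i] | m[i], phi);
  }
}
```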

We have a neg_binomial_2_log that takes its parameter on the log scale. That way, you could define log_m instead of m and save some round tripping. You generally want to avoid exp() wherever possible as it’s very unstable (in double precision, arguments above roughly 710 overflow and below roughly -745 underflow).

vector[156] log_lambda = log(to_vector(hraw)) + b_1_m * to_vector(s) + b_0_rt[RETrap];
vector[156] log_m = log(e) + log_lambda - log(to_vector(hraw));
...
... neg_binomial_2_log_lpmf(o | log_m, phi);

You can also scale b_1_m to match its prior as

real<multiplier=2.5> b_1_m;

but that’s much less critical.

Why is it lognormal(0.01, 1) rather than lognormal(0, 1)? The first argument is the log mean, so it’s unconstrained and can even be negative.