Where did I go wrong with non-centered parameterization of a nonlinear hierarchical model?

I am building a nonlinear multilevel model fit to simulated data. My initial version had problems with divergent transitions, so I switched to non-centered parameterization, heavily leaning on this forum post for guidance: Centred vs. non-centred parametrisation with lognormal likelihood - #9 by bbbales2

The model now fails to initialize, and banging my head against this wall is no longer productive. Can anyone please take a look to see if there is an obvious place that I went wrong?

The model is meant to estimate the effects of harvester (q_a), date (q_d), and site (q_l) on catch. It takes the form of the catch equation:

C = E*q*N^\beta

The catchability term, q, is then broken down into additive random effects.

q_{total}=q_{mean}+q_{harvester}+q_{date}+q_{site}

I simulated the data in R with this code, where for each date (d), two randomly selected sites (l) were each harvested by three randomly selected harvesters (a).

library(dplyr)

set.seed(395)

catch.fun.err<-function(effort, q_mu, q_d, q_a, q_l, popDensity, beta, error){
  catch=(effort*(q_mu+q_d+q_a+q_l)*popDensity^beta)*error
  return(catch)
}

# 90 unique dates
D<-90
# 20 unique anglers
A<-20
# 10 unique lakes
L<-10

q_d<-rlnorm(D, -1, 0.5)
q_a<-rlnorm(A, -2, 0.2)
q_l<-rlnorm(L, -2, 0.01)

q_mu<-0.2

popDensity<-rlnorm(L, meanlog=2, sdlog=0.2)
beta<-0.25

# D, A, and L indexes and their q values
dates<-data.frame(D=seq(1:D), q_d=q_d)
anglers<-data.frame(A=seq(1:A), q_a=q_a)
lakes<-data.frame(L=seq(1:L), q_l=q_l, popDensity=popDensity)

# sample randomly with replacement from anglers and lakes for each date. then left_join approprate q values by A, D, and L selection

lakeSamples<-sample(lakes$L, 180, replace=TRUE)
anglerSamples<-sample(anglers$A, 540, replace=TRUE)


# 2 lakes visited each day, three anglers at each lake
fake.df.hier<-data.frame(D=rep(dates$D, each=6),
                         L=rep(lakeSamples, each=3),
                         A=anglerSamples)%>%
  dplyr::left_join(dates, by="D")%>%
  dplyr::left_join(anglers, by="A")%>%
  dplyr::left_join(lakes, by="L")%>%
  dplyr::mutate(beta=rep(0.25),
         q_mu=rep(0.2),
         effort=rlnorm(540, meanlog=1.25, sdlog=0.25),
         error=rlnorm(540, meanlog=0, sdlog=0.3),
         lmbCatch=floor(catch.fun.err(effort, q_mu, q_d, q_a, q_l, popDensity, beta, error)))

fake.data.hier<-list(N=nrow(fake.df.hier),
                     D=D,
                     L=L,
                     A=A,
                     DD=fake.df.hier$D,
                     LL=fake.df.hier$L,
                     AA=fake.df.hier$A,
                     effort=fake.df.hier$effort,
                     popDensity=fake.df.hier$popDensity,
                     lmbCatch=fake.df.hier$lmbCatch)

And this is the model that I fit to it:


data {
  // number of observations
  int<lower=1> N;
  // number of anglers
  int<lower=1> A;
  // number of unique dates
  int<lower=1> D;
  // number of lakes
  int<lower=1> L;
  
  // all observations of catch (response)
  array[N] int<lower=0> lmbCatch;
  
  // indexing observations by anglers, dates, and lakes 
  array[N] int<lower=1, upper=A> AA;
  array[N] int<lower=1, upper=D> DD;
  array[N] int<lower=1, upper=L> LL;

  // all observations of effort (covariate)
  array[N] real<lower=0> effort;
  // all estimates of population density (covariate)
  array[N] real<lower=0> popDensity;
  
}


// phi is inverse overdispersion param
parameters {
  // I want different estimates of q for different anglers, dates, and lakes, as well as an overall mean q
  // without these zero lower bounds, initialization fails

  array[A] real<lower=0> q_a_raw;
  array[D] real<lower=0> q_d_raw;
  array[L] real<lower=0> q_l_raw;
 
  real<lower=0> q_mu;

  real beta;
  real phi;
  
  // hyperparameters
  real mu_q_a;
  real<lower=0> sigma_q_a;
  
  real mu_q_d;
  real<lower=0> sigma_q_d;
  
  real mu_q_l;
  real<lower=0> sigma_q_l;
}

transformed parameters{

  array[N] real<lower=0> catchHat;
  array[N] real logCatchHat;
  
  array[A] real log_q_a;
  array[D] real log_q_d;
  array[L] real log_q_l;
  
  array[A] real<lower=0> q_a;
  array[D] real<lower=0> q_d;
  array[L] real<lower=0> q_l;
  
  for(i in 1:N){
    catchHat[i] = effort[i] .* (q_mu + q_a[AA[i]] + q_d[DD[i]] + q_l[LL[i]]) .* popDensity[i]^beta;
  }
  
  logCatchHat = log(catchHat);
  
  log_q_a = log(q_a);
  log_q_d = log(q_d);
  log_q_l = log(q_l);
  
  for(a in 1:A){
    log_q_a[A] = mu_q_a + sigma_q_a * q_a_raw[A];
  }
  for(d in 1:D){
    log_q_d[D] = mu_q_d + sigma_q_d * q_d_raw[D];
  }
  for(l in 1:L){
    log_q_l[L] = mu_q_l + sigma_q_l * q_l_raw[L];
  }

}

model {
  
  lmbCatch~neg_binomial_2_log(logCatchHat, phi);
  
  // prior on logCatchHat. Not sure if I need this
  logCatchHat~lognormal(0,1);
  
  q_a_raw~std_normal();
  q_d_raw~std_normal();
  q_l_raw~std_normal();
  
  // these need to be above zero
  q_mu~lognormal(0,0.5);
  
  // these need to be able to include negatives
  mu_q_a~normal(0,1);
  mu_q_d~normal(0,1);
  mu_q_l~normal(0,1);
  
  sigma_q_a~lognormal(0,1);
  sigma_q_d~lognormal(0,1);
  sigma_q_l~lognormal(0,1);
  

  phi~inv_gamma(0.4, 0.3);
  beta~lognormal(0,0.5);
  
  
}


You have two problematic pieces:

In a traditional glm, your linear predictor would be on the link scale (i.e. the log scale in this model) but you construct the linear predictor on the identity scale and then take the logarithm. This is problematic because this logarithm is undefined whenever your linear predictor is negative.

Also, in a traditional glm, you would not put an explicit prior statement directly over logCatchHat. Instead, your implied prior on logCatchHat would be defined by the data and the priors on the coefficents that go into computing the linear predictor.

Moreover, in choosing to include an extra contribution to the prior density here, you have selected a distribution whose support is strictly positive, even though I’m sure that you intend to allow logCatchHat to be negative as well.

Thus, you will fail to initialize not only when the linear predictor is less than zero (so you cannot take a logarithm), but also anytime the linear predictor’s logarithm is less than zero (because as written logCatchHat~lognormal(0,1); says that you expect log(log(CatchHat)) to be Gaussian.

1 Like

Thank you for your reply!

I have removed the inappropriate prior on logCatchHat, and I linearized the catch equation (for logCatchHat) to put it on the log scale. Initialization is still failing, and I wonder if I’ve made poor choices about prior distributions. (for sigma_a, d, and l, I’ve tried exponential and half-normal priors with no luck. I’ve seen student-t recommended, but I haven’t figured out how to implement it.)

data {
  // number of observations
  int<lower=1> N;
  // number of anglers
  int<lower=1> A;
  // number of unique dates
  int<lower=1> D;
  // number of lakes
  int<lower=1> L;
  
  // all observations of catch (response)
  array[N] int<lower=0> lmbCatch;
  
  // indexing observations by anglers, dates, and lakes 
  array[N] int<lower=1, upper=A> AA;
  array[N] int<lower=1, upper=D> DD;
  array[N] int<lower=1, upper=L> LL;

  // all observations of effort (covariate)
  array[N] real<lower=0> effort;
  // all estimates of population density (covariate)
  array[N] real<lower=0> popDensity;
  
}

transformed data{
  array[N] real log_effort;
  array[N] real log_popDensity;

  log_effort=log(effort);
  log_popDensity=log(popDensity);
}

parameters {
  // I want different estimates of q for different anglers, dates, and lakes, as well as an overall mean q
  // without these zero lower bounds, initialization fails

  array[A] real<lower=0> q_a_raw;
  array[D] real<lower=0> q_d_raw;
  array[L] real<lower=0> q_l_raw;
 
  real log_q_mu;

  real beta;
  real phi;
  
  // hyperparameters
  real mu_q_a;
  real<lower=0> sigma_q_a;
  
  real mu_q_d;
  real<lower=0> sigma_q_d;
  
  real mu_q_l;
  real<lower=0> sigma_q_l;
}

transformed parameters{

  array[N] real logCatchHat;
  
  array[A] real log_q_a;
  array[D] real log_q_d;
  array[L] real log_q_l;
    
  for(i in 1:N){
    logCatchHat[i]=log_effort[i] + log_q_mu + log_q_a[AA[i]] + log_q_d[DD[i]] + log_q_l[LL[i]] + beta*log_popDensity[i];
  }

  
  for(a in 1:A){
    log_q_a[A] = mu_q_a + sigma_q_a * q_a_raw[A];
  }
  for(d in 1:D){
    log_q_d[D] = mu_q_d + sigma_q_d * q_d_raw[D];
  }
  for(l in 1:L){
    log_q_l[L] = mu_q_l + sigma_q_l * q_l_raw[L];
  }

}

model {
  
  lmbCatch~neg_binomial_2_log(logCatchHat, phi);
  
  q_a_raw~std_normal();
  q_d_raw~std_normal();
  q_l_raw~std_normal();
  
  // these need to be above zero
  log_q_mu~normal(0,1);
  
  // these need to be able to include negatives
  mu_q_a~normal(0,1);
  mu_q_d~normal(0,1);
  mu_q_l~normal(0,1);
  
  sigma_q_a~lognormal(0,1);
  sigma_q_d~lognormal(0,1);
  sigma_q_l~lognormal(0,1);
  

  phi~inv_gamma(0.4, 0.3);
  beta~lognormal(0,0.5);
  
  
}


In the transformed parameters block, you are using transformed parameters like log_q_a before assigning them their values. These statements execute in order of appearance (not in the order of some inferred DAG, as in JAGS), so the order really matters. What’s happening here is that log_q_a and similar parameters are getting initialized as NA (or something like that) when they are declared. Then logCatchHat is getting computed as NA. Then log_q_a is getting assigned a meaningful value, but this value is never actually being used for anything.

Thanks, that’s good to know. I’ve corrected that error (and a few others) and continue to have problems with initialization. So far I have tried:

  • narrower priors, wider priors
  • providing reasonable initial values
  • target+= notation instead of sampling statements
  • a few different approaches (i.e. linearized vs nonlinear catch equation) to noncentered parameterization
  • removing lower bounds on q_a, q_d, and q_l, switching q_mu to normal distribution
  • all of these, but using centered parameterization. These models initialize, but continue to have problems with divergent transitions.

Thanks for helping a beginning Stan fan.


data {
  // number of observations
  int<lower=1> N;
  // number of anglers
  int<lower=1> A;
  // number of unique dates
  int<lower=1> D;
  // number of lakes
  int<lower=1> L;
  
  // all observations of catch (response)
  array[N] int<lower=0> lmbCatch;
  
  // indexing observations by anglers, dates, and lakes 
  array[N] int<lower=1, upper=A> AA;
  array[N] int<lower=1, upper=D> DD;
  array[N] int<lower=1, upper=L> LL;

  // all observations of effort (covariate)
  array[N] real log_effort;
  // all estimates of population density (covariate)
  array[N] real log_popDensity;
  
}


parameters {
  // I want different estimates of q for different anglers, dates, and lakes (partially pooled), as well as an overall mean q


  real log_q_mu;
  real<lower=0> beta;
  real<lower=0> phi;
  
  // hyperparameters
  real mu_q_a;
  real<lower=0> sigma_q_a;
  
  real mu_q_d;
  real<lower=0> sigma_q_d;
  
  real mu_q_l;
  real<lower=0> sigma_q_l;
}

transformed parameters{

  array[A] real q_a_raw;
  array[D] real q_d_raw;
  array[L] real q_l_raw;

  array[N] real logCatchHat;
  
  array[A] real log_q_a;
  array[D] real log_q_d;
  array[L] real log_q_l;
  

  for(a in 1:A){
    log_q_a[a] = mu_q_a + sigma_q_a * q_a_raw[a];
  }
  for(d in 1:D){
    log_q_d[d] = mu_q_d + sigma_q_d * q_d_raw[d];
  }
  for(l in 1:L){
    log_q_l[l] = mu_q_l + sigma_q_l * q_l_raw[l];
  }

for(i in 1:N){
  logCatchHat[i] = log_effort[i] * log_q_mu + log_q_a[AA[i]] + log_q_d[DD[i]] + log_q_l[LL[i]] + beta * log_popDensity[i];
}

}

model {
  
  target += neg_binomial_2_log_lpmf(lmbCatch | logCatchHat, phi);
  

  target += std_normal_lpdf(q_a_raw);
  target += std_normal_lpdf(q_d_raw);
  target += std_normal_lpdf(q_l_raw);
  
  target += normal_lpdf(log_q_mu | 0,11);


  target += normal_lpdf(mu_q_a | 0,1);
  target += normal_lpdf(mu_q_d | 0,1);
  target += normal_lpdf(mu_q_l | 0,1);
  

  target += exponential_lpdf(sigma_q_a | 5);
  target += exponential_lpdf(sigma_q_d | 5);
  target += exponential_lpdf(sigma_q_l | 5);
  

  // started out with gamma distribution on phi, tried switching to lognormal to better reflect simulated error
  // resulted in more divergences for some reason? inv_gamma was recommended for log param NB, so landed on that

  
  target += inv_gamma_lpdf(phi | 0.4, 0.3);

  target += lognormal_lpdf(beta | -1,0.5);

}

You need to declare these as parameters.

  array[A] real q_a_raw;
  array[D] real q_d_raw;
  array[L] real q_l_raw;

Since the previous version of the code, these somehow migrated from parameters to transformed parameters, where they never get initialized.

1 Like

That did it; thank you for your patience!

1 Like