Trouble with initializing multistate model

Problem initializing multistate model.


Dear Stan community,

I am trying to run a multistate model estimating vital rates. The dataset includes recapture and recovery data and we’re estimating survival, movement, recovery, and detection rates. We’re also including age, location of first capture (= cluster), and year as random/fixed covariates. This model works well in JAGS, but I run into an error that I cannot seem to fix while translating it to Stan.

This is the full model:

data {
  int<lower=2> n_occ;
  int<lower=1> n_ind;
  int<lower=1> CH[n_ind,n_occ];
  int<lower=1> AM[n_ind,n_occ];
  int<lower=1> n_age;
  int<lower=1> n_year;
  int<lower=1> n_cluster;
  int<lower=1> year[n_occ];
  int<lower=1> cluster[n_ind];
  int<lower=1> first[n_ind];
  int<lower=1> FR[n_ind];
  int<lower=1> ones[n_ind];
}
parameters {
  real<lower=0,upper=1> s_age[n_age];
  real<lower=0,upper=1> me_age[n_age];
  real<lower=0,upper=1> mi_age[n_age];
  real<lower=0,upper=1> p_age[n_age];
  real<lower=0,upper=1> r_age[n_age];
}
transformed parameters {
  // logit transformation
  real l_s_age[n_age];
  real l_p_age[n_age];
  real l_r_age[n_age];
  l_s_age <- logit(s_age);
  l_p_age <- logit(p_age);
  l_r_age <- logit(r_age);
  // derived parameters
  real<lower=0,upper=1> s_age_year[n_age,n_year];
  real<lower=0,upper=1> p_age_year[n_age,n_year];
  real<lower=0,upper=1> r_age_year[n_age,n_year];
  real<lower=0,upper=1> s_age_year_cluster[n_age,n_year,n_cluster];
  real<lower=0,upper=1> p_age_year_cluster[n_age,n_year,n_cluster];
  real<lower=0,upper=1> s_age_cluster[n_age,n_cluster];
  real<lower=0,upper=1> p_age_cluster[n_age,n_cluster];
  real l_s_age_year[n_age,n_year];
  real l_p_age_year[n_age,n_year];
  real l_r_age_year[n_age,n_year];
  real cluster_s[n_cluster];
  real cluster_p[n_cluster];
  for(a in 1:n_age) {
    for(y in 1:n_year) {
      s_age_year[a,y] <- inv_logit(l_s_age_year[a,y]);
      p_age_year[a,y] <- inv_logit(l_p_age_year[a,y]);
      r_age_year[a,y] <- inv_logit(l_r_age_year[a,y]);
      for(c in 1:n_cluster) {
        s_age_year_cluster[a,y,c] <- inv_logit(l_s_age_year[a,y] + cluster_s[c]);
        p_age_year_cluster[a,y,c] <- inv_logit(l_p_age_year[a,y] + cluster_s[c]);
      }
    }
    for(c in 1:n_cluster) {
      s_age_cluster[a,c] <- inv_logit(sum(l_s_age_year[a,1:n_year])/n_year + cluster_s[c]);
      p_age_cluster[a,c] <- inv_logit(sum(l_p_age_year[a,1:n_year])/n_year + cluster_p[c]);
    }
  }
  // linear predictors
  real<lower=0,upper=1> s[n_ind,n_occ-1];
  real<lower=0,upper=1> me[n_ind,n_occ-1];
  real<lower=0,upper=1> mi[n_ind,n_occ-1];
  real<lower=0,upper=1> p[n_ind,n_occ];
  real<lower=0,upper=1> r[n_ind,n_occ];
  for(i in 1:n_ind) {
    for(t in 1:(first[i]-1)) {
      s[i,t] <- 1;
      me[i,t] <- 1;
      mi[i,t] <- 1;
    }
    for(t in 1:first[i]) {
      p[i,t] <- 1;
      r[i,t] <- 1;
    }
    for(t in first[i]:(n_occ-1)) {
      s[i,t] <- inv_logit(l_s_age_year[AM[i,t],year[t]] + cluster_s[cluster[i]]);
      me[i,t] <- me_age[AM[i,t]];
      mi[i,t] <- mi_age[AM[i,t]];
    }
    for(t in (first[i]+1):n_occ) {
      p[i,t] <- inv_logit(l_p_age_year[AM[i,t],year[t]] + cluster_p[cluster[i]]);
      r[i,t] <- inv_logit(l_r_age_year[AM[i,t],year[t]]);
    }
  }
  // Define state-transition and observation matrices
  real<lower=0,upper=1> ps[5,n_ind,n_occ-1,5];
  real<lower=0,upper=1> po[5,n_ind,n_occ,5];
  for(i in 1:n_ind) {
    for(t in 1:(first[i]-1)) {
      for(ki in 1:5) {
        for(kj in 1:5) {
          ps[ki,i,t,kj] <- 1;
          po[ki,i,t,kj] <- 1;
        }
      }
    }
    for(t in first[i]:(n_occ-1)) {
      // Define probabilities of state S(t+1) given S(t)
      ps[1, i, t, 1] = s[i,t] * (1-me[i,t]);
      ps[1, i, t, 2] = s[i,t] * me[i,t];
      ps[1, i, t, 3] = (1-s[i,t]) * (1-me[i,t]) * r[i,t];
      ps[1, i, t, 4] = (1-s[i,t]) * me[i,t] * r[i,t];
      ps[1, i, t, 5] = (1-s[i,t]) * (1-r[i,t]);
      ps[2, i, t, 1] = s[i,t] * mi[i,t];
      ps[2, i, t, 2] = s[i,t] * (1-mi[i,t]);
      ps[2, i, t, 3] = (1-s[i,t]) * mi[i,t] * r[i,t];
      ps[2, i, t, 4] = (1-s[i,t]) * (1-mi[i,t]) * r[i,t];
      ps[2, i, t, 5] = (1-s[i,t]) * (1-r[i,t]);
      ps[3, i, t, 1] = 0;
      ps[3, i, t, 2] = 0;
      ps[3, i, t, 3] = 0;
      ps[3, i, t, 4] = 0;
      ps[3, i, t, 5] = 1;
      ps[4, i, t, 1] = 0;
      ps[4, i, t, 2] = 0;
      ps[4, i, t, 3] = 0;
      ps[4, i, t, 4] = 0;
      ps[4, i, t, 5] = 1;
      ps[5, i, t, 1] = 0;
      ps[5, i, t, 2] = 0;
      ps[5, i, t, 3] = 0;
      ps[5, i, t, 4] = 0;
      ps[5, i, t, 5] = 1;
      // Define probabilities of O(t) given S(t)
      po[1, i, t, 1] = p[i,t];
      po[1, i, t, 2] = 0;
      po[1, i, t, 3] = 0;
      po[1, i, t, 4] = 0;
      po[1, i, t, 5] = 1 - p[i,t];
      po[2, i, t, 1] = 0;
      po[2, i, t, 2] = p[i,t];
      po[2, i, t, 3] = 0;
      po[2, i, t, 4] = 0;
      po[2, i, t, 5] = 1 - p[i,t];
      po[3, i, t, 1] = 0;
      po[3, i, t, 2] = 0;
      po[3, i, t, 3] = 1;
      po[3, i, t, 4] = 0;
      po[3, i, t, 5] = 0;
      po[4, i, t, 1] = 0;
      po[4, i, t, 2] = 0;
      po[4, i, t, 3] = 0;
      po[4, i, t, 4] = 1;
      po[4, i, t, 5] = 0;
      po[5, i, t, 1] = 0;
      po[5, i, t, 2] = 0;
      po[5, i, t, 3] = 0;
      po[5, i, t, 4] = 0;
      po[5, i, t, 5] = 1;
    }
  }
  // Marginalisation
  real<lower=0,upper=1> zeta[n_ind,n_occ,5];
  for(i in 1:n_ind){
    for(t in 1:(first[i]-1)) {
      for(k in 1:5) {
        zeta[i,t,k] <- 0;
      }
    }
    zeta[i,first[i],1] <- 1;
    zeta[i,first[i],2] <- 0;
    zeta[i,first[i],3] <- 0;
    zeta[i,first[i],4] <- 0;
    zeta[i,first[i],5] <- 0;
    for(t in first[i]:(n_occ-1)) { 
      for(k in 1:5) {  
        zeta[i,(t+1),k] <- dot_product(zeta[i,t,], ps[,i,t,k]) * po[k,i,t,CH[i,(t+1)]];
      }
    }
  }
}
model {
  // Priors
  /// Year random effect
  real sigma_year_s[n_age];
  real sigma_year_p[n_age];
  real sigma_year_r[n_age];
  for(a in 1:n_age) {
    for(y in 1:n_year) {
      l_s_age_year[a,y] ~ normal(l_s_age[a], sigma_year_s[a]);
      l_p_age_year[a,y] ~ normal(l_p_age[a], sigma_year_p[a]);
      l_r_age_year[a,y] ~ normal(l_r_age[a], sigma_year_r[a]);
    }
    sigma_year_s[a] ~ uniform(0.01, 10);
    sigma_year_p[a] ~ uniform(0.01, 10);
    sigma_year_r[a] ~ uniform(0.01, 10);
  }
  /// Cluster random effect
  real sigma_cluster_s;
  real sigma_cluster_p;
  for(c in 1:n_cluster) {
    cluster_s[c] ~ normal(0, sigma_cluster_s);
    cluster_p[c] ~ normal(0, sigma_cluster_p);
  }
  sigma_cluster_s ~ uniform(0.01, 10);
  sigma_cluster_p ~ uniform(0.01, 10);
  // Likelihood 
  vector[n_ind] lik;
  for(i in 1:n_ind) {
    lik[i] <- sum(zeta[i,n_occ,]);
    ones[i] ~ binomial(FR[i], lik[i]);
  }
}

This is the rstan bit of code I use:

parameters.to.keep <- c("s_age", "s_age_year", "s_age_cluster", "s_age_year_cluster", 
                        "me_age", 
                        "mi_age", 
                        "p_age", "p_age_year", "p_age_cluster", "p_age_year_cluster",
                        "r_age", "r_age_year")
ni <- 100
nt <- 2
nb <- ni*0.5
nc <- 1
m <- stan(data = data_c, pars = parameters.to.keep, file = "model.stan",
          chains = nc, warmup = nb, iter = ni, thin = nt, cores = nc)

This is the error I receive:

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model1d282102245__namespace::log_prob: s_age_year[1][1] is nan, but must be greater than or equal to 0.000000 (in 'string', line 66, column 2 to column 49)
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

The parameter ‘s_age_year’ is the ‘survival per age per year’ (transformed parameters {}) and is simply the inverse logit of ‘l_s_age_year’ which is has its priors set in the ‘model {}’ block. I was thinking that I might be indexing some parameter wrong, or refer to the wrong occasion for some parameters, but I can’t find the mistake.

Any help is greatly appreciated!

Cheers,
Louis



I attached the dataset and stan file used for this model.
data_c.RData (10.0 KB)
model.stan (7.4 KB)

Versions:
rstan 2.32.6
R 4.3.3
Run on LENOVO ThinkPad, Windows 10 Pro

The program looks complicated - you may try to use print to check which variables were not properly initialized.

I would also not recommend to use artificial constraints for sigma such as uniform(0.01,10). You could log-transform the sigma, and put normal prior on log_sigma.

Dear @aakhmetz,

Thanks for your reply. since the model does not yet contains samples (as it failed to initialize) I cannot use print(m) to look at the output. Or is this not what you meant?
Also, as sigma is the standard deviation of the normal distribution and therefore cannot be < 0, I think I cannot use a log-transformws normal distribution as a prior. I can only use distributions that are constrained to > 0, such as uniform(0,1).

Cheers,
Louis