Divergent transitions in ICAR

I added a ICAR component to logistic regression type model
(https://mc-stan.org/users/documentation/case-studies/icar_stan.html). However, all the transitions are divergent. Do you have any idea how to solve this problem?

  functions{
    real icar_normal_lpdf( vector phi, int N, int[] node1, int[] node2){
      return -0.5 * dot_self(phi[node1] - phi[node2]) + normal_lpdf( sum(phi) | 0, 0.001 * N);
    }
  }
  data{
    int<lower=1> N;
    int SUC[N];
    int TRIAL[N];
    int AGE1[N];
    int AGE2[N];
    int AGE3[N];
    int AGE4[N];
    int AGE5[N];
    int AGE6[N];
    int AGE7[N];
    int AGE8[N];
    int AGE9[N];
    int AGE10[N];
    int AGE11[N];
    int AGE12[N];
    int AGE13[N];
    int AGE14[N];
    int AGE15[N];
    int AGE16[N];
    int AGE17[N];
    int AGE18[N];
    int AGE19[N];
    int AGE20[N];
    int AGE21[N];
    int AGE22[N];
    int AGE23[N];
    int AGE24[N];
    int AGE25[N];
    int AGE26[N];
    int AGE27[N];
    int AGE28[N];
    int AGE29[N];
    int AGE30[N];
    int AGE31[N];
    int AGE32[N];
    int AGE33[N];
    int AGE34[N];
    int MALE[N];
    int COMM_TYPE_F[N];
    int<lower=0> N_age;
    int<lower=0> N_edges;
    int<lower=1, upper=N_age> node1[N_edges];  // node1[i] adjacent to node2[i]
    int<lower=1, upper=N_age> node2[N_edges];  // and node1[i] < node2[i]
  }
  parameters{
    real a;
    real male;
    vector[N_age] phi; // age coefficients
    real fishing;
    real<lower=0> dispersion;
    real sigma;
  }
  transformed parameters{
    vector[N] p_ni;
    for ( i in 1:N ) {
        p_ni[i] = inv_logit(a + male * MALE[i] + sigma * phi[1] * AGE1[i] + sigma * phi[2] * AGE2[i] + sigma * phi[3] * AGE3[i] + sigma * phi[4] * AGE4[i] + 
        sigma * phi[5] * AGE5[i] + sigma * phi[6] * AGE6[i] + sigma * phi[7] * AGE7[i] + sigma * phi[8] * AGE8[i] + sigma * phi[9] * AGE9[i] + sigma * phi[10] * AGE10[i] + 
        sigma * phi[11] * AGE11[i] + sigma * phi[12] * AGE12[i] + sigma * phi[13] * AGE13[i] + sigma * phi[14] * AGE14[i] + sigma * phi[15] * AGE15[i] + sigma * phi[16] * AGE16[i] +
        sigma * phi[17] * AGE17[i] + sigma * phi[18] * AGE18[i] + sigma * phi[19] * AGE19[i] + sigma * phi[20] * AGE20[i] + sigma * phi[21] * AGE21[i] + sigma * phi[22] * AGE22[i] +
        sigma * phi[23] * AGE23[i] + sigma * phi[24] * AGE24[i] + sigma * phi[25] * AGE25[i] + sigma * phi[26] * AGE26[i] + sigma * phi[27] * AGE27[i] + sigma * phi[28] * AGE28[i] + 
        sigma * phi[29] * AGE29[i] + sigma * phi[30] * AGE30[i] + sigma * phi[31] * AGE31[i] + sigma * phi[32] * AGE32[i] + sigma * phi[33] * AGE33[i] + sigma * phi[34] * AGE34[i] + 
        fishing * COMM_TYPE_F[i] );
    }
  }
  model{
    SUC ~ beta_binomial( TRIAL , p_ni*dispersion , (1-p_ni)*dispersion );
    dispersion ~ exponential( 1 );
    fishing ~ normal( 0 , 10 );
    phi ~ icar_normal_lpdf(N_age,node1, node2);
    sigma ~ cauchy(0, 1);
    male ~ normal( 0 , 10 );
    a ~ normal( 0 , 100 );
  }
  generated quantities{
    int SUC_predict[N];
    real log_lik[N];
    for(i in 1:N){
      SUC_predict[i] = beta_binomial_rng(TRIAL[i], p_ni[i] * dispersion,
       (1-p_ni[i])*dispersion );
      log_lik[i] = beta_binomial_lpmf(SUC[i] | TRIAL[i], p_ni[i] * dispersion,
       (1-p_ni[i])*dispersion );
    }
  }

what is the data? how have you constructed the lists node1 and node2? the neighborhood graph needs to be fully connected, i.e., no islands.

FWIW, you should also be able to code up the vectors of counts per age group as a 2-dimensional integer array, e.g.:

int AGE[N_age, N];

what are you encoding in the AGE arrays? is just one AGE[i] == 1 and the rest 0 for all i?
what kind of smoothing do you want?

the prior:

dispersion ~ exponential( 1 );

is very strong - most of the mass is near 0. do you mean dispersion or perhaps inverse?

1 Like

Thank you for your help!

The data we have is the number of sampled people (SUC) and the number of population (TRIAL) in groups stratified by one-year age group, gender and locations. AGE1,…,34 are indicators of people aged 15-16, …, 48-49.

I defined
$N_age
[1] 34

$N_edges
[1] 33

$node1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33

$node2
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

So there are no islands.

Thank you! AGE1, …,AGE34 could be written more neatly in AGE[N_age, N].

As I hope to model sampling probability for each group and there might be limited number of sampled people in each group, I want to borrow information from neighbouring age group.

I mean dispersion parameter for exponential( 1 ). What is a better choice for the prior usually?

so you actually just have 2 ordered arrays of length 34 - SUC == count sampled and TRIAL == population - you don’t need these indicators AGE1…AGE34 at all.

and your chance of success given the population is really the sampling rate per age group, no?
to me, this reduces to the possion_log model from the case study

 SUC ~ poisson_log(TRIAL_log + a + male * MALE + fishing * COMM_TYPE_F + phi * sigma);

(you need to take the log of the TRIALS - I think the case study covers the math behind this move)

I have 2 arrays of length 634 actually, as the group is stratified by a lot of characteristics. So I may still need indicators to show the associated age group.

Thank you. I think the Poisson model makes sense. But if we stick to beta-binomial or binomial type model, why the transitions are divergent? Are there any solutions for the divergence?

does the beta-binomial model converge without the ICAR component?

1 Like

you would still only need a single array of indices 1…34 that tells you what age cohort the success/trial pair comes from, no?

Ahhh That’s a problem. The model without the ICAR does not converge. Does it cause by the prior of dispersion parameter? Do you know the ‘default’ choice for the prior of dispersion parameter?

Yes I think so! Thank you for pointing this out.

see the stan wiki page: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations

perhaps start with a weakly informative prior -

  • Weakly informative prior, very weak: normal(0, 10);
  • Generic weakly informative prior: normal(0, 1);