Hierarchical Binomial Model cannot initialise some chains when N_SUBJ > x

Hi there,
I have been trying to fit a hierarchical binomial model on the lottery/surebet choices by rats given the lottery offer. I started with a simple non-hierachical version and synthetic data, it works just fine. I then progressed to a species-level model. Interestingly, this model works well when N_SUBJ < 8, meaning all the 4 chains with default parameters run and give no divergences. However, when I set N_SUBJ >= 8 in the synthetic dataset, only 1 out of 4 chains would run and the others gave this typical error:

Chain 4:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 4:   Stan can't start sampling from this initial value.
Chain 4: Initialization between (-2, 2) failed after 100 attempts.

And the remaining one chain that did initialise would often end with 100% divergences.
When I ramp up N_SUBJ to be a large value, say 20, then none of the chains would initialise…
Note that previously I have completed a bernoulli_logit version of the same model and it works perfectly fine on all levels, but I want to speed up computation by using binoimal if possible due to large data size.
Here’s the model, any help is appreciated!

// hierarchical model fitting one species as beta-rho risky agents with binomial
functions {
  real choiceprob(real rho, real beta, real lott_value, real lott_prob, real sb_value, real rew_multi) {
    real y;
    real u1;	// Lottery utility
    real u2;	// Surebet utility
    
    u1 = lott_prob * (rew_multi * lott_value) ^ rho;
    u2 = (rew_multi * sb_value) ^ rho;
    y = 1 / (1 + exp(-beta * (u1 - u2))); 
    return y;
  }
}
data {
  int<lower=0> T; // Number of trials types 
  int<lower=0> K; // number of subjects in each species
  int individual[T]; // vector of subjid indexes
  vector[T] lott_value; // Lottery value for that choice
  vector[T] lott_prob; // Lottery probabilities for that choice
  vector[T] sb_value; // Surebet values for that choice
  vector[T] total_rew_multi; // total reward multiplier = base_reward * rew_multi
  int n_chose_lott[T]; // number chose lottery for this trial type
  int n_trials[T]; // total number of trials for this trial type
}
parameters {
  real log_rho_s; // species-level rho
  real log_beta_s; // species-level beta
  real<lower=0> log_sigma_rho; // standard deviation for species rho
  real<lower=0> log_sigma_beta; // standard deviation for species beta
  vector[K] log_rho_i; // individual-level rho
  vector[K] log_beta_i; // individual-level beta
}
transformed parameters{
  real rho_s;
  real beta_s;
  vector[K] rho_i;
  vector[K] beta_i;
  rho_s = exp(log_rho_s);
  beta_s = exp(log_beta_s);
  rho_i = exp(log_rho_i);
  beta_i = exp(log_beta_i);
}
model {
  // set weak priors 
  log_rho_s ~ std_normal(); 
  log_beta_s ~ std_normal(); 
  
  // draw individual parameters from the species distribution
  log_rho_i ~ normal(log_rho_s, log_sigma_rho);
  log_beta_i ~ normal(log_beta_s, log_sigma_beta);
  
  // fit the actual model to each trial
  for(t in 1:T){
    n_chose_lott[t] ~ binomial(n_trials[t], choiceprob(rho_i[individual[t]], beta_i[individual[t]], lott_value[t], lott_prob[t], sb_value[t], total_rew_multi[t]));
  }
}

And some example rows of synthetic data file, note that each individual / subject only has 6 rows of data here (6 different lottery offers).

        individual   lott_value lott_prob sb_value total_rew_multi n_chose_lott n_trials
1               1          0       0.5        3               1           34      327
2               1          2       0.5        3               1           65      327
3               1          4       0.5        3               1          135      332
4               1          8       0.5        3               1          192      302
5               1         16       0.5        3               1          344      352
6               1         32       0.5        3               1          359      359
7               2          0       0.5        3               1           70      342
8               2          2       0.5        3               1           83      307
9               2          4       0.5        3               1          129      345
10              2          8       0.5        3               1          192      360
11              2         16       0.5        3               1          262      336
12              2         32       0.5        3               1          293      309
13              3          0       0.5        3               1           83      341

Some things I have tried but did not work:

  1. By using the shinystan interface I found out that when the chains have high divergences, the stepsize is often a ridiculously small number compared to what stepsize is when the model runs fine (~0.35). So I added stepsize = 0.35 in setting, did not help.
  2. Increasing adapt_delta even to 0.99 did not help at all.
  3. Increasing both adapt_delta and set stepsize did not help.

Something even more interesting:

  1. N_SUBJ = 8 :only 1 chain runs
  2. N_SUBJ = 9: 2 chains run
  3. N_SUBJ = 10: 0 chains run
  4. N_SUBJ = 11: 1 chain run
  5. N_SUBJ > 11: 0 chains run

When you have initialization problems the first thing to try is to set init="0".

Looks like lott_value can be high and it could be squared (or more) when calculating the choice probability. If beta is not very small the probability can easily overflow and round to 1.
The calculation is more stable if you use binomial_logit instead of binomial.

real choiceprob_logit(real rho, real beta, real lott_value, real lott_prob, real sb_value, real rew_multi) {
  real y;
  real u1;	// Lottery utility
  real u2;	// Surebet utility
  
  u1 = lott_prob * (rew_multi * lott_value) ^ rho;
  u2 = (rew_multi * sb_value) ^ rho;
 return beta * (u1 - u2))); 
}
...
for(t in 1:T){
  n_chose_lott[t] ~ binomial_logit(n_trials[t],
                                   choiceprob_logit(rho_i[individual[t]], beta_i[individual[t]],
                                                    lott_value[t], lott_prob[t], sb_value[t],
                                                    total_rew_multi[t]));
}