Binomial_lpmf: Probability parameter[1] is 1, but must be in the interval [0, 1]

Hi Stan community. I’m new to the programme and I’m trying to fit a simple one parameter ODE model to some data. My likelihood function runs into a problem when the model is run in R, as follows:

Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: binomial_lpmf: Probability parameter[1] is 1, but must be in the interval [0, 1] (in ‘modelda5f5ce0d601_stan_mod’ at line 126)

Below is my stan model code.

functions {
  real[] si(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
    //parameters to fit
    real lambda0 = theta[1];
    int agrps = x_i[1];  //no. age groups
    int K = x_i[2];  //no. state variables
    real dydt[(agrps*K)];  //define derivative length
    
    for(i in 1:agrps){
      //S
      dydt[i] = - lambda0 * y[i];
      //I
      dydt[agrps+i] = lambda0 * y[i];
      //Im
      dydt[2*agrps+i] = lambda0 * y[i];
    }
    return(dydt);
  }
}

data {
  //for model calculations
  int K; //no. state variables
  int agrps;  //no. age groups
  vector[agrps] age_prop;  //proportion of population in each age group
  int tot_pop;  //total population size
  
  //data to fit
  int<lower=0>cases[agrps];  //no. positive cases
  int<lower=0>n[agrps];  //denominator for each age group
  vector<lower=0>[2] p_lambda0;
  
  //simulation
  real t0;  //starting time
  int t;  //no. years to run model
  real ts[t];  //time bins
  int inference;  //simulate w/o data (inference==0) or with data (inference==1)
  int doprint;
}

transformed data {
  real x_r[0];   // da, r, propfert, d...
  int x_i[2] = {agrps, K};   //no. age groups, no. state variables
}

//parameters accepted by the model [that we want values for]
parameters {
  real<lower=0>lambda0;
}

transformed parameters {
  real theta[1];
  // change of format for integrate_ode_rk45
  real init[agrps*K];  // initial values
  real y[t,agrps*K];   // raw ODE outputs
  vector[agrps] comp_S[t];
  vector[agrps] comp_I[t];
  vector[agrps] comp_Im[t];
  vector[agrps] comp_pI[t];
  
  theta[1] = lambda0;
  for(i in 1:agrps){
    init[i] = age_prop[i];  //proportion in S0
    init[agrps+i] = 0;      //proportion in I0
    init[2*agrps+i] = 0;    //proportion in Im0
  }
  
  //run solver
  y = integrate_ode_rk45(
    si,      //model fucntion
    init,    //initial values
    t0,      //initial time
    ts,      //evaluation dates
    theta,   //parameters
    x_r,     //real data
    x_i,     //integer data
    1.0E-10, //tolerances 
    1.0E-10, //tolerances 
    1.0E3);  //maximum steps
    
    //extract and format ODE results 
    for(i in 1:t) {
      comp_S[i] = (to_vector(y[i,1:agrps])) * tot_pop;  //total no. (age-stratified) in S
      comp_I[i] = (to_vector(y[i,(agrps+1):(2*agrps)])) * tot_pop;  //total no. (age-stratified) in I
      comp_Im[i] = (to_vector(y[i,(2*agrps+1):(3*agrps)])) * tot_pop;   //total no. (age-stratified) in Im
    }
    
    //compute seroprevalence
    for(i in 1:t) {
      for(j in 1:agrps){
        comp_pI[i,j] = (comp_I[i,j] + comp_Im[i,j]) / (comp_I[i,j] + comp_Im[i,j] + comp_S[i,j]);
      }
    }
}

model {
  lambda0 ~ lognormal(0, 2);
  
  //debug
  if(doprint==1) {
    print("lambda0: ", lambda0);
    print("comp_S: ", comp_S);
    print("comp_I: ", comp_I);
    print("comp_Im: ", comp_Im);
    print("comp_pI: ", comp_pI);
  }
  
  // likelihood 
  if (inference==1) {
    for(i in 1:agrps) { 
    target += binomial_lpmf(cases[i] | n[i], comp_pI[i]);
    }
  }
}

If anyone has advice on how I can avoid getting this error, that’d be really appreciated.

Best wishes
Greg

Yeah, these edge cases can be annoying. We fixed something like this earlier in the summer: https://github.com/stan-dev/math/issues/1986

That fix is available in cmdstanr: https://mc-stan.org/cmdstanr/reference/index.html , though I don’t know if that is what is happening here.

It sounds like for some i, an element of comp_pI[i] is 1. The first question to dig into is why that is. It might be correct, or it might be numeric overflow (should actually be 0.999999999), etc.

target += binomial_lpmf(cases[i] | n[i], comp_pI[i])

cases[i] is a single number, but comp_pI[i] is a row vector. I don’t know the model, but this sort of thing is often a bug. The effect here is that this is a vectorized likelihood term and so will increment target nrow(comp_pI[i]) times.

1 Like

Thanks for your response, Ben. I’ve since discovered a reason for why comp_pI is calculated to be ~1.

Printing the model output shows that the first line of the for loop that calculates comp_S isn’t working as it should. Each of the comp_S, comp_I and comp_Im, vectors are multiplied by a large number (‘tot_pop’) to give total no. of simulated individuals in each state. While the model print-out shows the for loop is working as it should for comp_I and comp_I, it returns very small values for comp_S.

Hence in calculating comp_pI, the answer is very close to 1. Any ideas as to why this for loop isn’t correctly calculating comp_S? I can see no problems with the code here.

If the code looks right and the other things are working, I’d worry about the inputs (which are the output of the ODE solve). If you plot the outputs of the ODE solve do they look like what you expect?

That’s very helpful, thank you Ben. From that I’ve discovered that the model is probably working as expected. The for loop correctly calculates comp_S, but at the max time of the model, comp_S is small because all have moved to next 2 compartments.

A case of simplifying a model to get it working in a new programme actually hindering the development process.

I have a quick question about my likelihood function. I currently have it set up so that the length of my data vector matches the model output, which means that many elements of the data vector are 0. Will Stan accept vectors containing many zeros? I thought it would considering it wouldn’t contribute to the log likelihood function. Wondering if this would be a problem down the line.

Double check this. By my reading it is:

target += binomial_lpmf(cases[i] | n[i], comp_pI[i]);

cases[i] is a single number, n[i] is a single number, and comp_pI[i] is an agrps number. In the comp_pI case you’ll get out of bounds indexes if t < agrps. That really looks like a bug.

Unless I am misunderstanding how Stan works, I was under the impression that cases[i] is actually a series of integers (length agrps) , as is n[i]? This is how it is defined in R. It is also defined in Stan as:

int<lower=0>cases[agrps];
int<lower=0>n[agrps];

I have now changed the likelihood function to sample estimates from comp_pI at the max value of time (t)

target += binomial_lpmf(cases[i] | n[i], comp_pI[t,i]);

This runs and gives reasonably model output when printed but for fitting throws the error:

There were 5 divergent transitions after warmup

int<lower=0>cases[agrps];
int<lower=0>n[agrps];

cases and n are both length agrps arrays of integers. cases[i] and n[i] are the ith entry in each array (so they’re both integers)

target += binomial_lpmf(cases[i] | n[i], comp_pI[t,i]);

Yeah that looks reasonable.

With a few divergences, try bumping up adapt delta to like 0.95 and see if they go away. It means the sampler will use a smaller timestep in the internal integrator which will be slower but can make these divergences disappear.

Divergences mean there’s something a bit numerically difficult about the posterior. If they go away with a little adapt delta it’s not worth worrying about, but if they don’t disappear then there’s something going on that needs investigated.

cases and n are both length agrps arrays of integers. cases[i] and n[i] are the ith entry in each array (so they’re both integers)

OK thanks, as I thought. The likelihood function samples comp_pI at corresponding indexes too. So that looks all fine?

With a few divergences, try bumping up adapt delta to like 0.95 and see if they go away. It means the sampler will use a smaller timestep in the internal integrator which will be slower but can make these divergences disappear.

Yeah so I have already increased adapt delta but still getting divergences errors. Could this just be due to the fact that the model function & parameters don’t do a good job of estimating the data? This an over-simplified version of the full R model that I’m gradually translating into Stan.

Thanks again, Ben.

Yeah sometimes it can be that a more complicated model samples better than a simpler one. It would be a fair thing to try the more complicated model.

You could also try tightening up the prior on lambda0.

Thanks Ben. I went for this strategy: I’ve slowly introduced more complexity into the model and it’s working well, with an exception.

My x_r array needs to include real numbers of different lengths (some 1, others 400). I’ve had little success with just listing them as an array in the transformed data block, like x_r[n] = {variable1, variable2, variable3}. Is there a way Stan can accept an x_r array containing variables of different lengths?

There is the option of creating a different variable for each of the 400 elements of the array, e.g.:

Instead of real x[400] I could define real x1, real x2, real x3...real x400. But ideally wouldn’t want to do this. Is there a more elegant solution?

Cheers.

Good to hear!

Not yet. It’s something we want, but it’s not in the language yet.

The way to do this is if you have arrays of different length, concatenate them all into one big array, and then have another array of start indices. For instance:

real big_array[N]; // N total values
int starts[S + 1]; // Start index of S arrays (and the last index is N + 1)

big_array[starts[1]:(starts[2] - 1)] // first array
big_array[starts[s]:(starts[s + 1] - 1)] // sth array

More doc is here: https://mc-stan.org/docs/2_24/stan-users-guide/ragged-data-structs-section.html

Ah amazing, that’s just the trick. Just to clarify, is this all done within the data block?

And would this all be defined within the same block, e.g. within a for loop, and then big_array could be saved to x_r? Apologies if daft question.

Yeah usually you’d build the big_array and starts array outside in something like R or Python (where you are allowed to have ragged arrays – which is the name for arrays of arrays with different lengths) and pass it in through the data block.

Ooo, you’re using this in the ODE solver. We recently changed the interface on the ODE solver stuff to make it easier to pass things in (so you don’t have to pack everything into x_r and x_i any more).

There’s some info on that here: https://mc-stan.org/users/documentation/case-studies/convert_odes.html

That isn’t available in rstan yet, but it is available in cmdstanr: https://mc-stan.org/cmdstanr/index.html

That’s very useful to know, thank you. Unfortunately, in translating my model to the new ODE interface, I have encountered an issue:

The new ode_rk45_tol function isn’t recognising my ODE function, which is now defined as

functions{
  vector mod_si(
    real t, vector y, ...){

and called as

  y = ode_rk45_tol(
    mod_si,      //model fucntion
    init,    //vector initial values
    t0,      //real initial time
    ts,      //real[] times
    1.0E-10, //real rel_tol 
    1.0E-10, //real abs_tol 
    1.0E3, //int max_num_steps
    lambda0,  //pass parameters directly into model
    agrps, K, //pass int data directly into model
    r, da, mctr, d, propfert //pass real data directly into model
    );  

When I compile the model I get the following error message:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable “mod_si” does not exist.
error in ‘model6bc72ba6fdfb_stan_mod_simple’ at line 180, column 10

Huh strange. Can you post the model and I’ll try to get it to compile?

Thanks Ben. The error message is only thrown when I click ‘Check’ in R studio in the stan file. When I compile (using cmdstan_model function) & fit the model in my R script, I get reasonable looking output except some nan values. Specifically, while I should be getting vectors of length 400 returned and filled with values, I get 181 elements of nan, followed by 219 elements containing (reasonable)numbers. I’m unsure why this is happening. The full Stan model code is below. Thanks again!

functions{
  vector mod_si(
    real t, vector y,  //y is a vector in new ODE interface 
    real lambda0,  //pass parameters directly into model
    // real lambda1,
    //real gradient, real shape, 
    int agrps, int K, //pass int data directly into model
    // vector age, 
    real r, real da, real[] mctr, real[] d, real[] propfert //pass real data directly into model
    ) {
      
      //define variables to calculate within model 
      real deaths[agrps];
      vector[agrps] Na;
      real births_age[agrps];
      real births;
      real dprev[agrps];
      real seroconv1[agrps];
      real seroconv2[agrps];
      real seroconv3[agrps];
      real c1[agrps];
      real c2[agrps];
      real c3[agrps];
      real ct1[agrps];
      real ct2[agrps];
      real ct3[agrps];
      real matAb1[agrps];
      real matAb2[agrps];
      real matAb3[agrps];
      vector[agrps] pI;
      real matAbt;
      real ctt;
      
      //define derivative length
      vector[(agrps*K)] dydt;
      
      // define states
      vector[agrps] S = y[1:agrps];
      vector[agrps] I = y[(agrps+1):(2*agrps)];
      vector[agrps] Im = y[((2*agrps)+1):(3*agrps)];
      
      //define foi
      real foi[agrps];
      foi = rep_array(lambda0, agrps);
      // for(i in 1:agrps){
        //foi[i] = (lambda0 + lambda1*(age[i]^2) * (age[i] * exp(-gradient*age[i])))*shape;
      //   foi[i] = lambda0 + lambda1*(pow(age[i], 2));
      // }
      
      //total modelled population size
      for(i in 1:agrps){
        Na[i] = S[i] + I[i] + Im[i];
      }
      
      //age-specific total deaths
      for(i in 1:agrps){
        deaths[i] = d[i] * Na[i];
      }
      
      //age-specific total births
      for(i in 1:agrps){
        births_age[i] = deaths[i] * propfert[i];
      }
      
      //total births (all ages)
      births = sum(births_age);
      
      // conception distribution
      //move age back 3, 6 or 9 mo to calculate conception distribution for 3 trimesters
      //e.g. c3 = conceived ~9 months ago (more accurately, 7.5 months ago)
      for(i in 1:(agrps-3)){
        c1[i] = births_age[i+1];
        c2[i] = births_age[i+2];
        c3[i] = births_age[i+3];
      }
      
      //seroprevalence
      for(i in 1:agrps){
        pI[i] = (I[i] + Im[i])/Na[i];
      }
      
      //calculating seroconversions in pregnancy and cases of congenital disease
      for(i in 1:(agrps-3)){
        if(i==1){
          dprev[i] = 0;
          seroconv1[i] = 0;
          seroconv2[i] = 0;
          seroconv3[i] = 0;
          ct1[i] = 0;
          ct2[i] = 0;
          ct3[i] = 0;
          matAb1[i] = 0;
          matAb2[i] = 0;
          matAb3[i] = 0;
          
        } else {
          dprev[i] = pI[i]-pI[i-1];                //change in prevalence (must be positive)
          seroconv1[i] = dprev[i]*c1[i];           //pregnant women seroconverting in trimester 1
          seroconv2[i] = dprev[i]*c2[i];           //pregnant women seroconverting in trimester 2
          seroconv3[i] = dprev[i]*c3[i];           //pregnant women seroconverting in trimester 3
          ct1[i+3] = seroconv1[i]*mctr[1];         //likelihood of transmission trimester 1
          ct2[i+2] = seroconv2[i]*mctr[2];         //likelihood of transmission trimester 2
          ct3[i+1] = seroconv3[i]*mctr[3];         //likelihood of transmission trimester 3
          matAb1[i+3] = seroconv1[i]*(1-mctr[1]);  //maternal Ab trimester 1
          matAb2[i+2] = seroconv2[i]*(1-mctr[2]);  //maternal Ab trimester 2
          matAb3[i+1] = seroconv3[i]*(1-mctr[3]);  //maternal Ab trimester 3
        }
      }
      
      //total number of antibody positive and congenitally diseased births
      matAbt = sum(matAb1) + sum(matAb2) + sum(matAb3);
      ctt = sum(ct1) + sum(ct2) + sum(ct3);
      
      //model ODEs
      for(i in 1:agrps){
        if(i==1){
          //S
          dydt[i] = (births - matAbt - ctt) + r*Im[i] - foi[i]*S[i] - d[i]*S[i] - da*S[i];
          //I
          dydt[agrps+i] = ctt + foi[i]*(Na[i]-I[i]) - d[i]*I[i] - da*I[i];
          //Im
          dydt[2*agrps+i] = matAbt - (foi[i] + r + d[i] + da) * Im[i];
          
        } else if(i>1){
          //S
          dydt[i] = da*S[i-1] + r*Im[i] - foi[i]*S[i] - d[i]*S[i] - da*S[i];
          //I
          dydt[agrps+i] = da*I[i-1] + foi[i]*(Na[i]-I[i]) - d[i]*I[i] - da*I[i];
          //Im
          dydt[2*agrps+i] = da*Im[i-1] - (foi[i] + r + d[i] + da) * Im[i];
        } else{
          //S
          dydt[i] = da*S[i-1] + r*Im[i] - foi[i] * S[i] - d[i]*S[i];
          //I
          dydt[agrps+i] = da*I[i-1] + foi[i]*(Na[i]-I[i]) - d[i]*I[i];
          //Im
          dydt[2*agrps+i] = da*Im[i-1] - (foi[i] + r + d[i]) * Im[i];
        }
      }
      return(dydt);
    }
}

data {
  //for model calculations
  int K; //no. state variables
  int agrps;  //no. age groups
  vector[agrps] age_prop;  //proportion of population in each age group
  int tot_pop;  //total population size
  
  vector[agrps] age;
  real r;
  real da;
  real mctr[3];
  real propfert[agrps];
  real d[agrps];
  
  //data to fit
  int<lower=0>cases[agrps];  //no. positive cases
  int<lower=0>n[agrps];  //denominator for each age group

  //simulation
  real t0;  //starting time
  int t;  //no. years to run model
  real ts[t];  //time bins
  real rel_tol;
  real abs_tol;
  int max_num_steps;
  int inference;  //simulate w/o data (inference==0) or with data (inference==1)
  int doprint;
  
  //formatting ode results
  int data_agrps;
  int data_rows[data_agrps*K];
}

// transformed data {
//   real x_r[2+agrps*2] = {r_array};   //r, da, d, propfert
//   int x_i[2] = {agrps, K};   // agrps, K (no. state variables), S_r (no. real arrays), starts_r
// }

//parameters accepted by the model [that we want values for]
parameters {
  real<lower=0, upper=0.2>lambda0;
  real<lower=0, upper=0.2>lambda1;
  // real<lower=0, upper=0.2>gradient;
  // real<lower=0, upper=0.01>shape;
}

transformed parameters {
  // change of format for integrate_ode_rk45
  vector[agrps*K] init;  //initial values
  
  vector[agrps*K] y[t];   //raw ODE outputs
  vector[agrps] comp_S;
  vector[agrps] comp_I;
  vector[agrps] comp_Im;
  vector[agrps] comp_pI;
  
  for(i in 1:agrps){
    init[i] = age_prop[i];  //proportion in S0
    init[agrps+i] = 0;      //proportion in I0
    init[2*agrps+i] = 0;    //proportion in Im0
  }
  
  //run solver
  y = ode_rk45_tol(
    mod_si,  //model function
    init,    //vector initial values
    t0,      //real initial time
    ts,      //real[] times
    rel_tol,
    abs_tol,
    max_num_steps,
    lambda0,  //pass parameters directly into model
    // lambda1, gradient, shape, 
    // age, 
    agrps, K, //pass int data directly into model
    r, da, mctr, d, propfert //pass real data directly into model
    );  
    
    //extract and format ODE results
    comp_S  = (to_vector(y[t,1:agrps])) * tot_pop;  //total no. (age-stratified) in S
    comp_I  = (to_vector(y[t,(agrps+1):(2*agrps)])) * tot_pop;  //total no. (age-stratified) in I
    comp_Im = (to_vector(y[t,(2*agrps+1):(3*agrps)])) * tot_pop;   //total no. (age-stratified) in Im
    
    //extract particular age groups specified in the data
    // comp_S  = (to_vector(y[t,data_rows[1:16]])) * tot_pop;  //total no. (age-stratified) in S
    // comp_I  = (to_vector(y[t,data_rows[17:32]])) * tot_pop;  //total no. (age-stratified) in I
    // comp_Im = (to_vector(y[t,data_rows[33:48]])) * tot_pop;   //total no. (age-stratified) in Im
    
    //compute seroprevalence
    for(j in 1:agrps){
      comp_pI[j] = (comp_I[j] + comp_Im[j]) / (comp_S[j] + comp_I[j] + comp_Im[j]);
    }
}

model {
  lambda0 ~ lognormal(log(.1), .01);
  // lambda1 ~ lognormal(log(.1), .01);
  // gradient ~ lognormal(log(.1), .01);
  // shape ~ lognormal(log(.1), .01);
  
  //debug
  if(doprint==1) {
    print("lambda0: ", lambda0);
    print("comp_S: ", comp_S);
    print("comp_I: ", comp_I);
    print("comp_Im: ", comp_Im);
    print("comp_pI: ", comp_pI);
  }
  
  // likelihood 
  if (inference==1) {
    for(i in 1:agrps) { 
    target += binomial_lpmf(cases[i] | n[i], comp_pI[i]);
    }
  }
}

This is the rather confusing error message you get in Stan versions that don’t support the new ODE interface. You need to install CmdStanR because the latest RStan is a bit behind.

Which variable?

Stan saves variables in the parameters, transformed parameters, and generated quantities blocks. NaN parameters would be unusual, and you don’t have a generated quantities block, so I’m guessing this is something in transformed parameters.

The lingo with transformed parameters is that by default all the variables there are initialized to NaN, so seeing a bunch of NaNs on the output makes me think that one of the transformed parameters isn’t having anything written to it. But then looking at the code it seems to me like they should all be initialized. But I’m probably overlooking something so what variable is it that has NaNs?

Apologies, I wasn’t too clear in my last message. The variables with some NaNs are comp_S, comp_I, comp_Im and comp_pI, which I print in the transformed parameters block using this code:

  if(doprint==1) {
    print("lambda0: ", lambda0);
    print("comp_S: ", comp_S);
    print("comp_I: ", comp_I);
    print("comp_Im: ", comp_Im);
    print("comp_pI: ", comp_pI);
  }

I used this approach with RStan and didn’t get these NaNs. I would put this code in generated quantities, except that I need comp_pI for my likelihood function.