Help with Bernoulli sampling in model block

Hello,
I’m trying to implement a hierarchical mixture model in Stan, but I’m encountering difficulties with sampling a Bernoulli variable within the model block. Here’s a simplified version of my Stan code:

data {
  int<lower=0> Nsub;  // total sample size
  int<lower=0> Nage;  // number of age groups
  vector[Nsub] Zi;    // log antibody levels
  int<lower=0,upper=Nage-1> agegr[Nsub];  // age group for each subject (0-indexed)
  vector[Nage] age1;  // midpoint of each age group
}

parameters {
  real<lower=0> delta;
  real<lower=0> mu2_y; 
  real<lower=0> tau1_mu;
  real<lower=0> tau2_mu;
  real gamma0;
  real<lower=0> gamma1;
  real<lower=0> tau_gamma0;
  real<lower=0> tau_gamma1;
  vector<lower=0,upper=1>[Nsub] Yi;  // Yi is a parameter
}

transformed parameters {
  real mu1_y = mu2_y + delta;
  vector[Nage] theta;
  vector[Nage] p_i;
  
  for (i in 1:Nage) {
    theta[i] = gamma0 + gamma1 * log(age1[i]);
    p_i[i] = 1 - exp(theta[i]) / (1 + exp(theta[i]));
  }
}

model {
  // Priors
  mu2_y ~ uniform(0, mu1_y);
  delta ~ uniform(0, 7);
  tau1_mu ~ gamma(0.01, 0.01);
  tau2_mu ~ gamma(0.01, 0.01);
  gamma0 ~ normal(0, 1/sqrt(tau_gamma0));
  gamma1 ~ normal(0, 1/sqrt(tau_gamma1)) T[0,];
  tau_gamma0 ~ gamma(0.01, 0.01);
  tau_gamma1 ~ gamma(0.01, 0.01);

  // Likelihood
  for (i in 1:Nsub) {
    Yi[i] ~ bernoulli(p_i[agegr[i] + 1]);  // Yi is drawn from Bernoulli distribution
    real mu = mu1_y * Yi[i] + mu2_y * (1 - Yi[i]);  // Use Yi[i] instead of p_i
    real tau = tau1_mu * Yi[i] + tau2_mu * (1 - Yi[i]);  // Use Yi[i] here as well
    Zi[i] ~ normal(mu, 1/sqrt(tau));
  }
}

Additionally, this is the model I’m working on:
Z_i \sim N(Y_i\mu_1 + (1-Y_i)\mu_2, Y_i\sigma_1^2 + (1-Y_i)\sigma_2^2) ,
where
Y_i = \begin{cases} 1 & \text{with probability } \pi(a) \\ 0 & \text{with probability } 1-\pi(a) \end{cases}
(So each sub group will have their own probabability for the Bernoulli variable).
The priors are:
\begin{align*} \mu_1 &\sim U(0,\mu_2) \\ \mu_2 &= \mu_1 + \delta \\ \delta &\sim U(0,C) \\ \sigma_j^{-2} &\sim \text{gamma}(0.0001, 0.0001), \quad j = 1, 2 \\ \gamma_0 &\sim N(0,\tau_0^2) \\ \gamma_1 &\sim N(0,\tau_1^2)T(0,\infty) \\ \tau_0^{-2}, \tau_1^{-2} &\sim \text{gamma}(0.01, 0.01) \end{align*}.
And this is the error:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0
Semantic error in 'string', line 44, column 4 to column 40:
   -------------------------------------------------
    42:    // Likelihood
    43:    for (i in 1:Nsub) {
    44:      Y[i] ~ bernoulli(p_i[agegr[i] + 1]);
             ^
    45:      real mu = mu1_y * Y[i] + mu2_y * (1 - Y[i]);
    46:      real tau = tau1_mu * Y[i] + tau2_mu * (1 - Y[i]);
   -------------------------------------------------

Ill-typed arguments supplied to function 'bernoulli':
(real, real)
Available signatures:
(int, row_vector) => real
  The first argument must be int but got real
(int, vector) => real
  The first argument must be int but got real
(int, array[] real) => real
  The first argument must be int but got real
(int, real) => real
  The first argument must be int but got real
(array[] int, row_vector) => real
  The first argument must be array[] int but got real
(Additional signatures omitted)

Does anyone know how to solve this problem? Any idea is appreciated. Thank you!

The bernoulli distribution expects the random variable to be an integer. In your case

Is defined as a vector (and hence real datatypes). Stan also does not know how to deal with integer datatype for parameters, hence defining array[N_sub] int<lower=0, upper=1> Yi will also not work.

You might want to consider the marginalization of Y_i. In the case of discrete variables

P(Z_i) = \sum_{Y_i}{P(Z_i|Y_i)P(Y_i)}

Which in this case would result in a mixture model

P(Z_i) = \pi(\alpha) \mathcal{N}(Z_i|\mu_1,\sigma_1^2) + (1-\pi(\alpha)) \mathcal{N}(Z_i|\mu_2,\sigma_2^2)
1 Like

Hi, thank you for your answer. I just want to clarify that you suggest that I should change my model to be as followed?

Zi[i] ~ p_i[i] * normal(mu1_y, 1/sqrt(tau1_mu)) + (1-p_i[i])*normal(mu2_y, 1/sqrt(tau2_mu)) ;

Hi,
Again, thank you for your help. However, our model really needs to include the final \mu and final \tau, which is the following code:

    real mu = mu1_y * Yi[i] + mu2_y * (1 - Yi[i]);  // Use Yi[i] instead of p_i
    real tau = tau1_mu * Yi[i] + tau2_mu * (1 - Yi[i]);  // Use Yi[i] here as well

Do you have any idea on this? Thank you so much!

Essentially yes, but this is not valid stan syntax. You would have to use the log_mix function see Finite Mixtures.

Unfortunately, I do not know how to deal with this in stan. If you need these values for posterior predictions, you can use the mixture model to achieve this.