Help with Bernoulli sampling in model block

hxlsps1104 · July 4, 2024, 9:42am

Hello,
I’m trying to implement a hierarchical mixture model in Stan, but I’m encountering difficulties with sampling a Bernoulli variable within the model block. Here’s a simplified version of my Stan code:

data {
  int<lower=0> Nsub;  // total sample size
  int<lower=0> Nage;  // number of age groups
  vector[Nsub] Zi;    // log antibody levels
  int<lower=0,upper=Nage-1> agegr[Nsub];  // age group for each subject (0-indexed)
  vector[Nage] age1;  // midpoint of each age group
}

parameters {
  real<lower=0> delta;
  real<lower=0> mu2_y; 
  real<lower=0> tau1_mu;
  real<lower=0> tau2_mu;
  real gamma0;
  real<lower=0> gamma1;
  real<lower=0> tau_gamma0;
  real<lower=0> tau_gamma1;
  vector<lower=0,upper=1>[Nsub] Yi;  // Yi is a parameter
}

transformed parameters {
  real mu1_y = mu2_y + delta;
  vector[Nage] theta;
  vector[Nage] p_i;
  
  for (i in 1:Nage) {
    theta[i] = gamma0 + gamma1 * log(age1[i]);
    p_i[i] = 1 - exp(theta[i]) / (1 + exp(theta[i]));
  }
}

model {
  // Priors
  mu2_y ~ uniform(0, mu1_y);
  delta ~ uniform(0, 7);
  tau1_mu ~ gamma(0.01, 0.01);
  tau2_mu ~ gamma(0.01, 0.01);
  gamma0 ~ normal(0, 1/sqrt(tau_gamma0));
  gamma1 ~ normal(0, 1/sqrt(tau_gamma1)) T[0,];
  tau_gamma0 ~ gamma(0.01, 0.01);
  tau_gamma1 ~ gamma(0.01, 0.01);

  // Likelihood
  for (i in 1:Nsub) {
    Yi[i] ~ bernoulli(p_i[agegr[i] + 1]);  // Yi is drawn from Bernoulli distribution
    real mu = mu1_y * Yi[i] + mu2_y * (1 - Yi[i]);  // Use Yi[i] instead of p_i
    real tau = tau1_mu * Yi[i] + tau2_mu * (1 - Yi[i]);  // Use Yi[i] here as well
    Zi[i] ~ normal(mu, 1/sqrt(tau));
  }
}

Additionally, this is the model I’m working on:
Z_i \sim N(Y_i\mu_1 + (1-Y_i)\mu_2, Y_i\sigma_1^2 + (1-Y_i)\sigma_2^2) ,
where
Y_i = \begin{cases} 1 & \text{with probability } \pi(a) \\ 0 & \text{with probability } 1-\pi(a) \end{cases}
(So each sub group will have their own probabability for the Bernoulli variable).
The priors are:
\begin{align*} \mu_1 &\sim U(0,\mu_2) \\ \mu_2 &= \mu_1 + \delta \\ \delta &\sim U(0,C) \\ \sigma_j^{-2} &\sim \text{gamma}(0.0001, 0.0001), \quad j = 1, 2 \\ \gamma_0 &\sim N(0,\tau_0^2) \\ \gamma_1 &\sim N(0,\tau_1^2)T(0,\infty) \\ \tau_0^{-2}, \tau_1^{-2} &\sim \text{gamma}(0.01, 0.01) \end{align*}.
And this is the error:

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0
Semantic error in 'string', line 44, column 4 to column 40:
   -------------------------------------------------
    42:    // Likelihood
    43:    for (i in 1:Nsub) {
    44:      Y[i] ~ bernoulli(p_i[agegr[i] + 1]);
             ^
    45:      real mu = mu1_y * Y[i] + mu2_y * (1 - Y[i]);
    46:      real tau = tau1_mu * Y[i] + tau2_mu * (1 - Y[i]);
   -------------------------------------------------

Ill-typed arguments supplied to function 'bernoulli':
(real, real)
Available signatures:
(int, row_vector) => real
  The first argument must be int but got real
(int, vector) => real
  The first argument must be int but got real
(int, array[] real) => real
  The first argument must be int but got real
(int, real) => real
  The first argument must be int but got real
(array[] int, row_vector) => real
  The first argument must be array[] int but got real
(Additional signatures omitted)

Does anyone know how to solve this problem? Any idea is appreciated. Thank you!

Garren_Hermanus · July 4, 2024, 2:26pm

The bernoulli distribution expects the random variable to be an integer. In your case

Is defined as a vector (and hence real datatypes). Stan also does not know how to deal with integer datatype for parameters, hence defining array[N_sub] int<lower=0, upper=1> Yi will also not work.

You might want to consider the marginalization of Y_i. In the case of discrete variables

P(Z_i) = \sum_{Y_i}{P(Z_i|Y_i)P(Y_i)}

Which in this case would result in a mixture model

P(Z_i) = \pi(\alpha) \mathcal{N}(Z_i|\mu_1,\sigma_1^2) + (1-\pi(\alpha)) \mathcal{N}(Z_i|\mu_2,\sigma_2^2)

hxlsps1104 · July 5, 2024, 5:00am

Hi, thank you for your answer. I just want to clarify that you suggest that I should change my model to be as followed?

Zi[i] ~ p_i[i] * normal(mu1_y, 1/sqrt(tau1_mu)) + (1-p_i[i])*normal(mu2_y, 1/sqrt(tau2_mu)) ;

hxlsps1104 · July 5, 2024, 6:45am

Hi,
Again, thank you for your help. However, our model really needs to include the final \mu and final \tau, which is the following code:

    real mu = mu1_y * Yi[i] + mu2_y * (1 - Yi[i]);  // Use Yi[i] instead of p_i
    real tau = tau1_mu * Yi[i] + tau2_mu * (1 - Yi[i]);  // Use Yi[i] here as well

Do you have any idea on this? Thank you so much!

Garren_Hermanus · July 5, 2024, 8:16am

Essentially yes, but this is not valid stan syntax. You would have to use the log_mix function see Finite Mixtures.

Unfortunately, I do not know how to deal with this in stan. If you need these values for posterior predictions, you can use the mixture model to achieve this.

Topic		Replies	Views
Stochastic Block Model Modeling fitting-issues , specification	1	989	March 24, 2020
Sampling a discrete parameter from a bernoulli prior Modeling	1	643	March 14, 2020
Stochastic Blockmodel by Stan Modeling	1	496	May 13, 2020
How to build a simple hierarchical model with different samples in Stan/rstan? Modeling rstan , hierarchical-model	1	540	November 23, 2022
Help understanding code and generating estimates General	9	948	April 25, 2020

Help with Bernoulli sampling in model block

Related topics