Vector vs. Loop, Beta-Binomial

Stan Community,

I wrote this Stan code using loops. Where can I use a vector syntax to avoid some of these loops? Thanks for the instruction. Other style tips are welcomed, too.

For context, I have participants, and those repeat for each geography variable (max_geography is the number of unique geography values).

Participants are measured on a binomial outcome, so I’m interested in the success counts “s” out of “n_count” trails modeled by the probability of success theta.

I modeled a hierarchical beta distribution of participants’ thetas within each geography. I’m interested in the mean and standard deviation of each geography’s beta distribution.

data {
  int <lower = 0> N;
  int <lower = 0> max_geography;
  array[N] int <lower = 0> geo;
  array[N] int <lower = 0> n_count;
  array[N] int <lower = 0> s;
}

parameters {
  array[N] real <lower = 0, upper = 1> theta;
  array[max_geography] real <lower = 0> alpha;
  array[max_geography] real <lower = 0> beta;
}

transformed parameters {
  array[max_geography] real beta_mu;
  array[max_geography] real <lower = 0> beta_sigma;
  
  for(i in 1:max_geography){
    beta_mu[i] = alpha[i]/(alpha[i] + beta[i]);
    beta_sigma[i] = sqrt((alpha[i] * beta[i]) / ((alpha[i] + beta[i]) * (alpha[i] + beta[i]) * (1 + alpha[i] + beta[i])));
  }
}

model {
  for(i in 1:max_geography){
    alpha[i] ~ exponential(0.1);
    beta[i]  ~ exponential(0.1);
  }
  
  for(n in 1:N) {
    theta[n] ~ beta(2, 3);
  }
  
  for(n in 1:N) {
    s[n] ~ binomial(n_count[n], theta[n]);
  }
  
  for(n in 1:N){
    theta[n] ~ beta(alpha[geo[n]], beta[geo[n]]);
  }
}

generated quantities {
  array[max_geography] real theta_rep;
  
  for(i in 1:max_geography) {
    theta_rep[i] = beta_rng(alpha[i], beta[i]);
  }
}
data {
 int <lower = 0> N;
 int <lower = 0> max_geography;
 array[N] int <lower = 0> geo;
 array[N] int <lower = 0> n_count;
 array[N] int <lower = 0> s;
}

parameters {
 vector<lower =0, upper = 1>[N] real theta;
 vector<lower = 0>[max_geography] real alpha;
 vector<lower = 0>[max_geography] real beta;
}

transformed parameters {
 vector[max_geography] real beta_mu;
 vector<lower = 0>[max_geography] real beta_sigma;

 beta_mu = alpha./(alpha+beta);
 beta_sigma = sqrt((alpha.*beta)./((alpha+beta).*(alpha+beta).*(1+alpha+beta)));
}

model {
 alpha ~ exponential(0.1);
 beta ~ exponential(0.1);
 theta ~ beta(2, 3);

 for(n in 1:N) {
  s[n] ~ binomial(n_count[n],theta[n]);
 }
}

generated quantities {
 vector[max_geography] real theta_rep;

 theta_rep = beta_rng(alpha,beta);
}

I believe this should work. You had two definitions for how the theta parameters are determined, so I went with the first one. If you go with the second, I believe you would need to keep it as a for loop because geo is an integer array. In that case:

model {
 alpha ~ exponential(0.1);
 beta ~ exponential(0.1);

 for(n in 1:N) {
  theta[n] ~ beta(alpha[geo[n]],beta[geo[n]]);
  s[n] ~ binomial(n_count[n],theta[n]);
 }
}
1 Like