Hi,
Stan beginner here. I am getting noticeably different results when I specify my model block in two ways that I expected to be equivalent (version A and version B). The Stan and R code are below. Thanks in advance for the community's help.
Setup
Suppose that I want to fit a hierarchical model with K subgroups such that, for i = 1, 2, \ldots, K,
y_i \mid \theta_i \sim \textrm{Bin}(N_i, \theta_i),
and I have a prior on \phi_i = \textrm{logit}(\theta_i) such that
\phi_i \mid \mu, \sigma^2 \sim N(\mu, \sigma^2).
The hyperparameters \mu and \sigma^2 have hyperpriors
\mu \mid \sigma^2 \sim N(0, 1000\sigma^2) \quad \textrm{and} \quad \sigma^2 \sim \textrm{Inv-Gamma}(0.001, 0.001).
My goal is to estimate the \theta_i's.
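In other words (unless I am misunderstanding my own setup), the unnormalized posterior that both model block versions below should target is

p(\phi, \mu, \sigma^2 \mid y) \propto \textrm{Inv-Gamma}(\sigma^2 \mid 0.001, 0.001) \, N(\mu \mid 0, 1000\sigma^2) \prod_{i=1}^{K} N(\phi_i \mid \mu, \sigma^2) \, \textrm{Bin}(y_i \mid N_i, \theta_i),

with \theta_i = \textrm{logit}^{-1}(\phi_i).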
Stan data and parameter blocks
data {
  int<lower=0> N;                       // total number of observations
  int<lower=1> numarms;                 // number of distinct subgroups (K)
  vector<lower=0, upper=1>[N] outcome;  // vector of N 1s and 0s
  // arm membership for each participant;
  // each participant belongs to exactly one of the K subgroups
  vector<lower=0>[N] armsindicator;
}
transformed data {
  array[numarms] int responders;  // number of successes per subgroup
  array[numarms] int Nperarm;     // number of people per subgroup
  for (i in 1:numarms) {
    // temp1 flags individuals who are responders and in arm i;
    // temp2 flags individuals who are in arm i
    array[N] int temp1;
    array[N] int temp2;
    for (j in 1:N) {
      temp1[j] = (outcome[j] == 1 && armsindicator[j] == i);
      temp2[j] = (armsindicator[j] == i);
    }
    responders[i] = sum(temp1);
    Nperarm[i] = sum(temp2);
  }
}
parameters {
  vector[numarms] phi;
  real mu;
  real<lower=0> sigma2;
}
transformed parameters {
  vector<lower=0, upper=1>[numarms] theta;
  for (k in 1:numarms) {
    theta[k] = exp(phi[k]) / (1 + exp(phi[k]));  // inverse logit; same as inv_logit(phi[k])
  }
}
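As a sanity check, I believe the transformed data block just reproduces this R aggregation on the simulated data at the end of this post (this is my own cross-check, not part of the model):

# R equivalent of the transformed data block (cross-check only);
# vec.outcome.y and vec.arm.indicator come from the simulation code below
responders <- as.integer(tapply(vec.outcome.y, vec.arm.indicator, sum))
Nperarm <- as.integer(table(vec.arm.indicator))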
Model block version A
model {
  sigma2 ~ inv_gamma(0.001, 0.001);
  mu ~ normal(0, sqrt(sigma2 * 1000));
  for (n in 1:numarms) {
    phi[n] ~ normal(mu, sqrt(sigma2));
    responders[n] ~ binomial(Nperarm[n], theta[n]);
  }
}
Results for version A
2 chains: each with iter=(1000,1000); warmup=(0,0); thin=(1,1); 2000 iterations saved.
Warmup took (0.011, 0.025) seconds, 0.036 seconds total
Sampling took (0.033, 0.032) seconds, 0.065 seconds total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ -152 1.8e-01 2.7 -1.6e+02 -1.5e+02 -148 245 3769 1.0
accept_stat__ 0.83 1.5e-02 0.25 0.16 0.95 1.00 289 4453 1.0e+00
stepsize__ 0.19 3.0e-02 0.030 0.16 0.22 0.22 1.0 15 6.1e+13
treedepth__ 3.3 3.2e-02 0.82 2.0 3.0 4.0 667 10264 1.0e+00
n_leapfrog__ 15 3.6e-01 9.2 3.0 15 31 646 9939 1.0e+00
divergent__ 0.011 2.3e-03 0.10 0.00 0.00 0.00 2043 31425 1.0e+00
energy__ 154 1.9e-01 3.2 150 154 160 276 4241 1.0e+00
phi[1] -0.89 1.1e-02 0.17 -1.2e+00 -8.8e-01 -0.61 250 3850 1.0
phi[2] -0.87 9.2e-03 0.16 -1.1e+00 -8.6e-01 -0.60 318 4900 1.0
phi[3] -0.80 1.0e-02 0.17 -1.1e+00 -8.0e-01 -0.50 283 4359 1.0
mu -0.85 9.6e-03 0.17 -1.1e+00 -8.4e-01 -0.58 323 4971 1.0
sigma2 0.027 5.3e-03 0.14 7.3e-04 5.5e-03 0.10 665 10227 1.0
theta[1] 0.29 2.2e-03 0.035 2.3e-01 2.9e-01 0.35 249 3826 1.0
theta[2] 0.30 1.9e-03 0.034 2.4e-01 3.0e-01 0.35 312 4807 1.0
theta[3] 0.31 2.1e-03 0.036 2.6e-01 3.1e-01 0.38 291 4471 1.0
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
Model block version B
model {
  for (n in 1:numarms) {
    target += inv_gamma_lpdf(sigma2 | 0.001, 0.001) +
              normal_lpdf(mu | 0, sqrt(sigma2 * 1000)) +
              normal_lpdf(phi[n] | mu, sqrt(sigma2)) +
              binomial_lpmf(responders[n] | Nperarm[n], theta[n]);
  }
}
Results for version B
2 chains: each with iter=(1000,1000); warmup=(0,0); thin=(1,1); 2000 iterations saved.
Warmup took (0.015, 0.015) seconds, 0.030 seconds total
Sampling took (0.037, 0.039) seconds, 0.076 seconds total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ -1.7e+01 1.1e-01 2.1e+00 -2.1e+01 -1.7e+01 -1.5e+01 393 5172 1.0
accept_stat__ 0.90 4.5e-03 1.4e-01 0.59 0.95 1.0 977 12861 1.0e+00
stepsize__ 0.14 7.4e-03 7.4e-03 0.13 0.15 0.15 1.0 13 4.7e+13
treedepth__ 3.6 2.3e-02 9.3e-01 2.0 4.0 5.0 1688 22205 1.0e+00
n_leapfrog__ 20 3.3e-01 1.3e+01 3.0 15 47 1705 22434 1.0e+00
divergent__ 0.00 nan 0.0e+00 0.00 0.00 0.00 nan nan nan
energy__ 20 1.4e-01 2.7e+00 16 20 25 391 5142 1.0e+00
phi[1] -8.1e-01 6.8e-03 1.4e-01 -1.0e+00 -8.1e-01 -5.7e-01 446 5871 1.0
phi[2] -8.0e-01 6.8e-03 1.4e-01 -1.0e+00 -8.1e-01 -5.7e-01 444 5843 1.0
phi[3] -7.9e-01 6.7e-03 1.4e-01 -1.0e+00 -7.9e-01 -5.6e-01 443 5824 1.0
mu -8.0e-01 6.8e-03 1.4e-01 -1.0e+00 -8.0e-01 -5.7e-01 427 5619 1.0
sigma2 1.7e-03 6.8e-05 1.4e-03 5.7e-04 1.3e-03 3.9e-03 452 5950 1.0
theta[1] 3.1e-01 1.5e-03 3.1e-02 2.6e-01 3.1e-01 3.6e-01 443 5829 1.0
theta[2] 3.1e-01 1.5e-03 3.1e-02 2.6e-01 3.1e-01 3.6e-01 441 5804 1.0
theta[3] 3.1e-01 1.4e-03 3.0e-02 2.6e-01 3.1e-01 3.6e-01 439 5781 1.0
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
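In case it helps pin down the discrepancy, here is a literal R transcription of the two model blocks (my own hand-written densities, so please treat it as a sketch; plogis is base R's inverse logit, and ldinvgamma is a helper I wrote for the inverse-gamma log density). Evaluating lp_A and lp_B at the same (phi, mu, sigma2) should show whether the two blocks define the same target:

# helper: log density of Inv-Gamma(a, b) at x
ldinvgamma <- function(x, a, b) a * log(b) - lgamma(a) - (a + 1) * log(x) - b / x

# version A: hyperpriors outside the loop over arms
lp_A <- function(phi, mu, sigma2, responders, Nperarm) {
  theta <- plogis(phi)
  ldinvgamma(sigma2, 0.001, 0.001) +
    dnorm(mu, 0, sqrt(sigma2 * 1000), log = TRUE) +
    sum(dnorm(phi, mu, sqrt(sigma2), log = TRUE)) +
    sum(dbinom(responders, Nperarm, theta, log = TRUE))
}

# version B: everything inside the loop, mirroring the target += statement
lp_B <- function(phi, mu, sigma2, responders, Nperarm) {
  theta <- plogis(phi)
  lp <- 0
  for (n in seq_along(phi)) {
    lp <- lp + ldinvgamma(sigma2, 0.001, 0.001) +
      dnorm(mu, 0, sqrt(sigma2 * 1000), log = TRUE) +
      dnorm(phi[n], mu, sqrt(sigma2), log = TRUE) +
      dbinom(responders[n], Nperarm[n], theta[n], log = TRUE)
  }
  lp
}

# note: Stan's ~ statements drop normalizing constants, so these values
# won't match lp__ exactly; only the A-vs-B comparison is meaningful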
R code to simulate data
set.seed(1234)
narms <- 3
nptperarm <- 85
theta.null <- 0.35
# simulate Bernoulli outcomes under a common response rate theta.null
vec.outcome.y <- rbinom(narms * nptperarm, 1, theta.null)
# arm membership: 1, ..., 1, 2, ..., 2, 3, ..., 3 (nptperarm each)
vec.arm.indicator <- rep(seq_len(narms), each = nptperarm)
list.data.bhm <- list(N = length(vec.arm.indicator),
                      numarms = length(unique(vec.arm.indicator)),
                      outcome = vec.outcome.y,
                      armsindicator = vec.arm.indicator)
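For completeness, this is roughly how I compile and fit (a cmdstanr sketch; the file name bhm.stan is a placeholder, and I have not reproduced my exact sampler settings here):

library(cmdstanr)
mod <- cmdstan_model("bhm.stan")  # placeholder path to the Stan program above
fit <- mod$sample(data = list.data.bhm, chains = 2,
                  iter_warmup = 1000, iter_sampling = 1000, seed = 1234)
fit$cmdstan_summary()  # stansummary-style output like the tables above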