Getting different results when using target +=

Hi,

Beginner at Stan here. I am getting different results when I specify my model block in two different ways (version A and version B). Stan code and R code are presented below. Thanks in advance for the community’s help.


Setup

Suppose that I want to fit a hierarchical model with K subgroups such that for i = 1, 2, ... K,

y_i | \theta_i \sim \textrm{Bin}(N_i, \theta_i)

and I have a prior on \phi_i = \textrm{logit}(\theta_i) such that

\phi_i | \mu, \sigma^2 \sim N(\mu, \sigma^2)

The hyperparameters \mu and \sigma^2 have hyperpriors

\mu | \sigma^2 \sim N(0, 1000\sigma^2) and \sigma^2 \sim \textrm{Inv-Gamma}(0.001, 0.001)

My goal is to estimate the \theta_i's.
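
For reference, the joint posterior implied by the statements above is (up to a constant)

p(\phi, \mu, \sigma^2 | y) \propto \textrm{Inv-Gamma}(\sigma^2 | 0.001, 0.001) \, N(\mu | 0, 1000\sigma^2) \prod_{i=1}^{K} N(\phi_i | \mu, \sigma^2) \, \textrm{Bin}(y_i | N_i, \theta_i)

with \theta_i = \textrm{logit}^{-1}(\phi_i).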


Stan data and parameter blocks

data {
  
  int<lower=0> N; // total number of observations
  int<lower=1> numarms; // number of distinct subgroups
  vector<lower=0, upper=1>[N] outcome; // vector of N 1s and 0s

  // vector of values indicating membership in the K subgroups;
  // each participant belongs to exactly one subgroup
  vector<lower=0>[N] armsindicator; 

}

transformed data {
  
  array[numarms] int  responders; // number of successes per subgroup
  array[numarms] int  Nperarm; // number of people per subgroup
  
  for (i in 1:numarms) {
    
    // temp1[j] = 1 if individual j is a responder in arm i, 0 otherwise
    // temp2[j] = 1 if individual j is in arm i, 0 otherwise
    array[N] int temp1;
    array[N] int temp2;
    
    for (j in 1:N) {
      
      temp1[j] = (outcome[j]==1 && armsindicator[j]==i);
      temp2[j] = (armsindicator[j]==i);
    }
    
    responders[i] = sum(temp1);
    Nperarm[i] = sum(temp2);
    
  }
}
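
Aside: since outcome and armsindicator only ever hold whole-number codes, they could instead be declared as integer arrays, which keeps the equality comparisons above exact. A minimal sketch of that alternative data block, assuming the R vectors are passed as integers:

data {
  int<lower=0> N; // total number of observations
  int<lower=1> numarms; // number of distinct subgroups
  array[N] int<lower=0, upper=1> outcome; // 1 = responder, 0 otherwise
  array[N] int<lower=1, upper=numarms> armsindicator; // arm membership, one arm per participant
}

The transformed data block above then works unchanged.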

parameters {
  
  vector[numarms] phi;
  real mu;
  real<lower=0> sigma2;
  
}

transformed parameters {
  
  vector<lower=0, upper=1>[numarms] theta;
  
  for (k in 1:numarms) {
    theta[k] = exp(phi[k])/(1+exp(phi[k]));
  }
}
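
The loop above is just the inverse-logit transform; Stan's built-in inv_logit computes the same thing elementwise and is a little more numerically stable, e.g.

transformed parameters {
  // equivalent: elementwise inverse logit of phi
  vector<lower=0, upper=1>[numarms] theta = inv_logit(phi);
}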

Model block version A

model {
   
  sigma2 ~ inv_gamma(0.001, 0.001);
  mu ~ normal(0, sqrt(sigma2*1000));
  
  for (n in 1:numarms) {
    
    phi[n] ~ normal(mu, sqrt(sigma2));
    responders[n] ~ binomial(Nperarm[n], theta[n]);

  }
}
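
Both sampling statements in version A also vectorize over arms, so the loop is optional; a sketch of an equivalent, loop-free version A (same posterior, purely a style difference):

model {
  sigma2 ~ inv_gamma(0.001, 0.001);
  mu ~ normal(0, sqrt(sigma2 * 1000));
  phi ~ normal(mu, sqrt(sigma2));          // vectorized over arms
  responders ~ binomial(Nperarm, theta);   // vectorized over arms
}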

Results for version A

2 chains: each with iter=(1000,1000); warmup=(0,0); thin=(1,1); 2000 iterations saved.

Warmup took (0.011, 0.025) seconds, 0.036 seconds total
Sampling took (0.033, 0.032) seconds, 0.065 seconds total

                 Mean     MCSE  StdDev        5%       50%    95%  N_Eff  N_Eff/s    R_hat

lp__             -152  1.8e-01     2.7  -1.6e+02  -1.5e+02   -148    245     3769      1.0
accept_stat__    0.83  1.5e-02    0.25      0.16      0.95   1.00    289     4453  1.0e+00
stepsize__       0.19  3.0e-02   0.030      0.16      0.22   0.22    1.0       15  6.1e+13
treedepth__       3.3  3.2e-02    0.82       2.0       3.0    4.0    667    10264  1.0e+00
n_leapfrog__       15  3.6e-01     9.2       3.0        15     31    646     9939  1.0e+00
divergent__     0.011  2.3e-03    0.10      0.00      0.00   0.00   2043    31425  1.0e+00
energy__          154  1.9e-01     3.2       150       154    160    276     4241  1.0e+00

phi[1]          -0.89  1.1e-02    0.17  -1.2e+00  -8.8e-01  -0.61    250     3850      1.0
phi[2]          -0.87  9.2e-03    0.16  -1.1e+00  -8.6e-01  -0.60    318     4900      1.0
phi[3]          -0.80  1.0e-02    0.17  -1.1e+00  -8.0e-01  -0.50    283     4359      1.0
mu              -0.85  9.6e-03    0.17  -1.1e+00  -8.4e-01  -0.58    323     4971      1.0
sigma2          0.027  5.3e-03    0.14   7.3e-04   5.5e-03   0.10    665    10227      1.0
theta[1]         0.29  2.2e-03   0.035   2.3e-01   2.9e-01   0.35    249     3826      1.0
theta[2]         0.30  1.9e-03   0.034   2.4e-01   3.0e-01   0.35    312     4807      1.0
theta[3]         0.31  2.1e-03   0.036   2.6e-01   3.1e-01   0.38    291     4471      1.0

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at 
convergence, R_hat=1).

Model block version B

model {
  
  for (n in 1:numarms) {
    
    target += inv_gamma_lpdf(sigma2 | 0.001, 0.001) + 
      normal_lpdf(mu | 0, sqrt(sigma2*1000)) +
      normal_lpdf(phi[n] | mu, sqrt(sigma2)) +
      binomial_lpmf(responders[n] | Nperarm[n], theta[n]);
      
  }
}

Results for version B

2 chains: each with iter=(1000,1000); warmup=(0,0); thin=(1,1); 2000 iterations saved.

Warmup took (0.015, 0.015) seconds, 0.030 seconds total
Sampling took (0.037, 0.039) seconds, 0.076 seconds total

                    Mean     MCSE   StdDev        5%       50%       95%  N_Eff  N_Eff/s    R_hat

lp__            -1.7e+01  1.1e-01  2.1e+00  -2.1e+01  -1.7e+01  -1.5e+01    393     5172      1.0
accept_stat__       0.90  4.5e-03  1.4e-01      0.59      0.95       1.0    977    12861  1.0e+00
stepsize__          0.14  7.4e-03  7.4e-03      0.13      0.15      0.15    1.0       13  4.7e+13
treedepth__          3.6  2.3e-02  9.3e-01       2.0       4.0       5.0   1688    22205  1.0e+00
n_leapfrog__          20  3.3e-01  1.3e+01       3.0        15        47   1705    22434  1.0e+00
divergent__         0.00      nan  0.0e+00      0.00      0.00      0.00    nan      nan      nan
energy__              20  1.4e-01  2.7e+00        16        20        25    391     5142  1.0e+00

phi[1]          -8.1e-01  6.8e-03  1.4e-01  -1.0e+00  -8.1e-01  -5.7e-01    446     5871      1.0
phi[2]          -8.0e-01  6.8e-03  1.4e-01  -1.0e+00  -8.1e-01  -5.7e-01    444     5843      1.0
phi[3]          -7.9e-01  6.7e-03  1.4e-01  -1.0e+00  -7.9e-01  -5.6e-01    443     5824      1.0
mu              -8.0e-01  6.8e-03  1.4e-01  -1.0e+00  -8.0e-01  -5.7e-01    427     5619      1.0
sigma2           1.7e-03  6.8e-05  1.4e-03   5.7e-04   1.3e-03   3.9e-03    452     5950      1.0
theta[1]         3.1e-01  1.5e-03  3.1e-02   2.6e-01   3.1e-01   3.6e-01    443     5829      1.0
theta[2]         3.1e-01  1.5e-03  3.1e-02   2.6e-01   3.1e-01   3.6e-01    441     5804      1.0
theta[3]         3.1e-01  1.4e-03  3.0e-02   2.6e-01   3.1e-01   3.6e-01    439     5781      1.0

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at 
convergence, R_hat=1).

R code to simulate data

set.seed(1234)

narms <- 3
nptperarm <- 85
theta.null <- 0.35

vec.outcome.y <- rbinom(narms * nptperarm, 1, theta.null)
vec.arm.indicator <- rep(seq_len(narms), each = nptperarm)

list.data.bhm <- list(N = length(vec.arm.indicator), 
                      numarms = length(unique(vec.arm.indicator)), 
                      outcome = vec.outcome.y, 
                      armsindicator = vec.arm.indicator)

You have the sigma2 and mu prior statements inside the loop in version B but not in version A, so version B adds those hyperprior terms once per arm instead of once.
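
Because those two terms sit inside the loop, the log density version B accumulates is

\log p = K \big[\log \textrm{Inv-Gamma}(\sigma^2 | 0.001, 0.001) + \log N(\mu | 0, 1000\sigma^2)\big] + \sum_{i=1}^{K} \big[\log N(\phi_i | \mu, \sigma^2) + \log \textrm{Bin}(y_i | N_i, \theta_i)\big]

i.e. the hyperpriors are raised to the K-th power (here K = numarms = 3), which is consistent with the much smaller sigma2 and the more tightly clustered phi[i] in version B's output.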


Ohhhhhh!

I’ve rerun it with the hyperpriors moved outside the loop (below). There are some slight numerical differences from version A’s results. Should I expect an exact replication, given the same seed?

model {
  
  target += inv_gamma_lpdf(sigma2 | 0.001, 0.001) +
    normal_lpdf(mu | 0, sqrt(sigma2*1000));
      
  for (i in 1:numarms) {
    target += 
      normal_lpdf(phi[i] | mu, sqrt(sigma2)) +
      binomial_lpmf(responders[i] | Nperarm[i], theta[i]);
      
  }
}

New results

2 chains: each with iter=(1000,1000); warmup=(0,0); thin=(1,1); 2000 iterations saved.

Warmup took (0.012, 0.012) seconds, 0.024 seconds total
Sampling took (0.028, 0.037) seconds, 0.065 seconds total

                 Mean     MCSE  StdDev        5%       50%    95%  N_Eff  N_Eff/s    R_hat

lp__              -14  1.9e-01     2.6  -1.9e+01  -1.4e+01    -11    190     2918      1.0
accept_stat__    0.81  1.0e-01    0.27      0.12      0.94   1.00    6.8      104  1.1e+00
stepsize__       0.17  5.7e-02   0.057      0.11      0.23   0.23    1.0       15  1.2e+14
treedepth__       3.4  4.2e-01    0.90       2.0       3.0    5.0    4.6       71  1.1e+00
n_leapfrog__       16  5.2e+00      11       3.0        15     31    4.5       69  1.1e+00
divergent__     0.018  4.3e-03    0.13      0.00      0.00   0.00    939    14447      nan
energy__           17  2.1e-01     3.1        12        16     22    212     3264  1.0e+00

phi[1]          -0.88  1.0e-02    0.16  -1.2e+00  -8.7e-01  -0.65    254     3903      1.0
phi[2]          -0.86  9.4e-03    0.16  -1.1e+00  -8.5e-01  -0.62    291     4475      1.0
phi[3]          -0.80  9.2e-03    0.17  -1.1e+00  -7.9e-01  -0.52    325     4997      1.0
mu              -0.85  9.4e-03    0.16  -1.1e+00  -8.4e-01  -0.62    288     4424      1.0
sigma2          0.025  4.8e-03    0.16   8.4e-04   4.2e-03  0.089   1045    16080      1.0
theta[1]         0.29  2.1e-03   0.033   2.4e-01   3.0e-01   0.34    255     3921      1.0
theta[2]         0.30  1.9e-03   0.033   2.4e-01   3.0e-01   0.35    293     4514      1.0
theta[3]         0.31  2.0e-03   0.036   2.6e-01   3.1e-01   0.37    330     5073      1.0

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at 
convergence, R_hat=1).

x ~ foo(...) is equivalent to target += foo_lupdf(x | ...), not foo_lpdf: the sampling statement drops additive constants that do not depend on the parameters. The two versions should be statistically the same, but if you want exact reproducibility between them under the same seed, I would use the unnormalized _lupdf/_lupmf versions in the target += form.
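
For example, these three statements all imply the same posterior contribution for phi[n]; the first two add identical unnormalized terms to target, while the third also adds constants that do not depend on the parameters (which shifts lp__ and can perturb the floating-point arithmetic enough to break bit-for-bit reproducibility):

  // three interchangeable ways to state phi[n]'s prior -- use exactly one
  phi[n] ~ normal(mu, sqrt(sigma2));                  // drops parameter-free constants
  target += normal_lupdf(phi[n] | mu, sqrt(sigma2));  // drops the same constants
  target += normal_lpdf(phi[n] | mu, sqrt(sigma2));   // keeps the normalizing constant

The same distinction applies to binomial_lpmf versus binomial_lupmf, where the dropped term is the binomial coefficient.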
