target += normal_lpdf(eta | 0, 1); vs. eta ~ normal(0, 1)

I am pretty much a Stan newbie, but I have found differences in output between versions of the schools model, depending on whether one specifies the model using (e.g.)

target += normal_lpdf(eta | 0, 1);
vs.
eta ~ normal(0,1)

The full code from the rstan::rstan vignette is:

data {
  int<lower=0> J;          // number of schools
  real y[J];               // estimated treatment effects
  real<lower=0> sigma[J];  // s.e. of effect estimates
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}
transformed parameters {
  vector[J] theta;
  theta = mu + tau * eta;
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}

Earlier versions were the same except for the final model block:

model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}

I invoked the two versions using the following stan() call, including a seed value to keep the starting points consistent:

fit <- stan("schools.stan", data = schools_data, seed = 1234)

The summary values for mu from the version using target += normal_lpdf() are:

   mean se_mean   sd  2.5%  25%  50%   75% 97.5% n_eff Rhat
mu 7.90    0.12 5.25 -2.35 4.53 7.94 11.23 18.85  1855    1

whereas using the "eta ~ normal(0, 1)" version, they are:

   mean se_mean   sd  2.5%  25%  50%   75% 97.5% n_eff Rhat
mu 7.79    0.08 4.82 -1.44 4.63 7.73 10.87 17.33  3342    1

These are obviously close, and essentially equivalent given the standard errors (se_mean). I understand from the discussion that the difference is due to the inclusion or omission of a normalizing constant. The discussion I have read suggested, however, that one form might be better for model comparison, and that there was a minor difference in efficiency between the two versions.
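
If I follow the algebra, the two forms differ only by additive terms that do not involve the parameters. For the standard normal prior on each element of eta:

\log p(\eta_j) = -\tfrac{1}{2}\eta_j^2 - \tfrac{1}{2}\log(2\pi)

target += normal_lpdf(eta | 0, 1) adds the whole expression, while eta ~ normal(0, 1) drops the (1/2) log(2*pi) term (and y ~ normal(theta, sigma) likewise drops the log(sigma[j]) and (1/2) log(2*pi) terms, since sigma is data). Because the dropped terms are constant in the parameters, both versions define the same posterior.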

Since I am just (re)learning Stan, I would prefer to learn one model form and use it consistently. I anticipate that I will be comparing models, and accuracy in those comparisons matters more to me than speed. But in general I'd like to understand better the differences between the two kinds of model statement and when it is better to use one or the other.

Thanks in advance to anyone who can clarify this issue for me.
Larry Hunsicker

See this section of the manual for more info: 7.4 Sampling statements | Stan Reference Manual



Thanks, andrjohns. I assume, then, that to compare models one would want the exact log probability values, so explicit incrementing (target += normal_lpdf(eta | 0, 1);) would be needed to get accurate model comparisons. The sampling statement (eta ~ normal(0, 1)) might be slightly faster, and is certainly more familiar to many statisticians, when model comparison is not anticipated.

Do I now understand things correctly?

It depends on what you mean by 'compare models'. If you're using things like LOO or Bayes factors then you need the additional constants, but if you're just interpreting the returned parameter values without any statistical comparison then the sampling statement is fine.

Yes. I was thinking of loo. So I'll use explicit incrementing whenever I intend to do model comparison. I'm not sure that this issue is prominently mentioned to folks who are used to using the sampling statement.

Along the way, it sounded to me as though the sampling statement was equivalent to

target += normal_lupdf(eta | 0, 1);

Looking further in the Stan Functions Reference, I confirmed this.

16.1.5 Sampling statement

y ~ std_normal()

Increment target log probability density with std_normal_lupdf(y).
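
So, for the schools model, the correspondence between the three forms should be (my reading of the reference, sketched for the eta prior):

eta ~ normal(0, 1);                  // drops constant terms
target += normal_lupdf(eta | 0, 1);  // equivalent: unnormalized density
target += normal_lpdf(eta | 0, 1);   // keeps the full normalized density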

But when I tried to run the model with normal_lupdf() in place of normal_lpdf(), the model compiler balked with the error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
error in 'model435452bf44f0_schools3' at line 19, column 24

17: }
18: model{
19:   target += normal_lupdf(eta | 0, 1);
                           ^
20:   target += normal_lupdf(y | theta, sigma);

PARSER EXPECTED: "("
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model 'schools3' due to the above error.

What am I missing here?

Another related question. While the parameter estimates for the two models are quite similar, the effective sample sizes are quite different (explicit incrementing: 1855; sampling statement: 3342). I guess that the reduced effective sample size for explicit incrementing relates to uncertainties in the normalizing constant. A difference of this magnitude in effective sample size would be quite important for powering a new trial. Or does it only relate to how many draws one needs to get "convergence"? How should I understand this difference?

That error is because the _lupdf extensions aren’t yet available in rstan (it’s a few versions behind).

The effective sample size is driven more by your model parameterisation than by your actual sample size. ESS essentially tells you how many independent draws would give you the same amount of posterior accuracy as the current draws. So if you have highly auto-correlated samples from the posterior, then they won’t be providing very much ‘new’ information compared to independent samples. Consequently, you’ll have a low effective sample size.

This means that the lower effective sample size for the ‘explicit incrementing’ indicates a higher level of autocorrelation between iterations, rather than anything to do with the (observed) sample size itself.
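
If you want to check this directly, the per-parameter ESS and Rhat can be read off the fit summary. A small R sketch using rstan's summary() method (fit here stands for whichever stanfit object you're inspecting):

fit_summary <- summary(fit)$summary    # matrix of per-parameter summaries
fit_summary["mu", c("n_eff", "Rhat")]  # effective sample size and Rhat for mu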

@avehtari does that all sound right? Is the difference in ESS between the models with and without normalising constants expected?


Thanks, andrjohns.
I'm puzzled by the difference in effective sample size, given that the only difference between the two models is the use of explicit incrementing vs. sampling statements. The parameterization is otherwise identical. Does adding the normalizing constant affect the autocorrelation of the draws?

I'll see whether Aki appends a comment.

You don't need to use normalized density statements in the model block for loo; you just need to ensure that you include the normalizing constants when you compute the pointwise log likelihood (usually in generated quantities). See Aki's discussion of writing Stan programs for use with the loo package for more.
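
For the schools model above, a minimal sketch of that pattern (log_lik is the variable name the loo package looks for by default):

generated quantities {
  vector[J] log_lik;
  for (j in 1:J)
    log_lik[j] = normal_lpdf(y[j] | theta[j], sigma[j]);  // full normalized density
}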

You do need the normalized statements if you want to work with Bayes factors, however.
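
That's because Bayes factor machinery consumes the normalized posterior density; e.g., the bridgesampling package estimates the marginal likelihood from the full target. A sketch, with fit_m1 and fit_m2 as hypothetical stanfit objects for two candidate models:

library(bridgesampling)
# both models must keep their normalizing constants (target += ..._lpdf)
bs1 <- bridge_sampler(fit_m1)
bs2 <- bridge_sampler(fit_m2)
bf(bs1, bs2)  # Bayes factor of model 1 over model 2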

This is very surprising to me. Can you provide a reproducible example? Can you confirm that the difference is consistent across runs, and that the runs use the same number of chains and the same number of iterations?
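
For instance, something along these lines, with matched settings (the two filenames are placeholders for the target += and sampling-statement versions):

fit_target   <- stan("schools_target.stan", data = schools_data,
                     chains = 4, iter = 2000, seed = 1234)
fit_sampling <- stan("schools_sampling.stan", data = schools_data,
                     chains = 4, iter = 2000, seed = 1234)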


BTW, you can install rstan version 2.26.2 (which uses Stan version 2.26.1 and so includes, e.g., the _lupdf functions) from https://mc-stan.org/r-packages/ with the command

install.packages("rstan", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

The update fixed my problems. Now the effective sample sizes from the sampling statements and from explicit increments with normal_lupdf are identical, and both are quite close to the one from explicit increments with normal_lpdf.

Thanks, Aki and Andrew, and thanks also to Jacob for the distinction between using loo and working with Bayes factors, and for the pointer to Aki's discussion of writing Stan programs for use with the loo package.

Larry Hunsicker
