# Differences between ~normal() and target += normal_lpdf()

#1

Hi - I am writing up some examples, and am encountering some strange behavior. I have the following model straight from the manual:

``````stan
data{
int N;
int K;
matrix[N, K] X;
vector[N] y;
}

parameters{
real alpha;
vector[K] beta;
real<lower = 0> sigma;
}

model{
y ~ normal(X * beta + alpha, sigma);
}
``````

This gives the following estimates:

``````
          mean se_mean   sd     5%    95% n_eff Rhat
alpha    40.92    0.10 2.84  36.38  45.59   841    1
beta[1]  -1.30    0.02 0.68  -2.40  -0.17   913    1
beta[2]   0.01    0.00 0.01  -0.01   0.03   791    1
beta[3]  -0.02    0.00 0.01  -0.04   0.00  1115    1
beta[4]  -3.88    0.03 1.05  -5.60  -2.08   924    1
sigma     2.62    0.01 0.38   2.08   3.31   847    1
lp__    -45.22    0.08 1.95 -48.83 -42.81   539    1

``````

Changing the likelihood line to:

``````stan
target += normal_lpdf(y | X * beta + alpha, sigma);
``````

results in these estimates:

``````
           mean se_mean   sd     5%    95% n_eff Rhat
alpha     33.78    0.11 2.97  28.55  38.40   668 1.01
betas[1]  -0.29    0.02 0.71  -1.42   0.90   980 1.00
betas[2]  -0.01    0.00 0.01  -0.04   0.01   764 1.00
betas[3]  -0.02    0.00 0.01  -0.04   0.00  1178 1.00
betas[4]  -1.87    0.04 1.05  -3.48  -0.05   834 1.00
sigma      2.91    0.02 0.48   2.24   3.81   709 1.00
lp__     -91.35    0.09 2.03 -95.39 -88.80   562 1.00
``````

The data are the mtcars dataset in R, with the model mpg ~ cyl + disp + hp + wt.

This happens in both rstan and cmdstan. Any help appreciated.

#2

Both result in the same density with respect to sampling. The difference is that `normal_lpdf` returns the correctly normalized density, so it is a tiny bit slower because the normalizing constants are computed each time. The `~ normal` notation drops the normalizing constants.

So within MC error your results should match (and they do as it looks like).
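The dropped-constant point can be sketched numerically. Below is a minimal Python illustration (not Stan's actual implementation; the helper names are just illustrative): `~ normal()` drops additive terms that do not depend on the parameters, here the per-observation `-0.5 * log(2 * pi)`, while `-log(sigma)` is kept because `sigma` is a parameter. The two log densities therefore differ only by a constant, which MCMC cannot distinguish.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Fully normalized normal log density, as target += normal_lpdf() adds."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def normal_unnormalized(y, mu, sigma):
    """Sketch of the ~ normal() contribution: the constant -0.5*log(2*pi) is
    dropped, but -log(sigma) stays since sigma is a parameter."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma)

y = 2.0
# The gap between the two versions is the same constant at any parameter
# values, so sampling (which only sees log-density differences) is unaffected.
gap1 = normal_lpdf(y, 0.0, 1.0) - normal_unnormalized(y, 0.0, 1.0)
gap2 = normal_lpdf(y, 5.0, 3.0) - normal_unnormalized(y, 5.0, 3.0)
print(abs(gap1 - gap2) < 1e-12)  # True: constant offset of -0.5*log(2*pi)
```

Only `lp__` should differ between the two parameterizations; the posterior draws themselves should agree up to MC error.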

#3

From the example above, if the MC standard error is the `se_mean` column, the two `alpha` estimates look significantly different: a difference of about 7 against an MC error of about 0.1?
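To make that concrete, here is a rough two-sample z-score for the difference in `alpha`, treating each `se_mean` as the Monte Carlo standard error of that estimate and assuming the two fits are independent (a back-of-envelope check, not a formal test):

```python
import math

# Posterior means and MC standard errors for alpha, taken from the two fits above
mean_tilde, se_tilde = 40.92, 0.10   # ~ normal() fit
mean_lpdf,  se_lpdf  = 33.78, 0.11   # target += normal_lpdf() fit

# Approximate z-score for the difference, assuming independent MC errors
z = (mean_tilde - mean_lpdf) / math.sqrt(se_tilde**2 + se_lpdf**2)
print(round(z, 1))  # roughly 48 MC standard errors apart
```

A gap of dozens of MC standard errors is far beyond what sampling noise alone would produce.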

#4

OK - I thought I was misunderstanding the differences between these two, but given your response this might be a bug.

If you look closely, the two sets of samples do not match. The mean for beta[4] in the first fit is outside the 90% interval of the second. The mean for beta[1] in the first is very close to the edge of the second's interval. With effective sample sizes around 800, there should not be this large a difference.

For reference, the ~normal() notation gives the same results as rstanarm::stan_glm().

My setup:

Linux 4.15.7
gcc 7.3.0
cmdstan 2.17.1
R 3.4.3
rstan 2.17.3

#5

@mespe Can I convince you to put a minimal reproducible example in the relevant repo (whatever interface you're running)?

#6

I would be happy to, though I observed it in multiple interfaces (cmdstan and rstan). Should it go to https://github.com/stan-dev/stan?

#7

Sort of doesn't matter, because it'll end up getting moved to either stan-dev/stan or stan-dev/math

Somebody just needs to run multiple chains, confirm that the data is passed in correctly, confirm that the two results are different, confirm that it's not a mixing problem in the model, check what C++ is generated, etc…

if there are actually problems with the normal log density we should hand out prizes for finding them :)

#8

Issue opened on GitHub for Stan. Just let me know if you need any additional information.

#9

Oh… you are right. I did not look that closely. Have you

• Run the above with different seeds?
• Rerun the thing with at least weakly informative priors?
• Can you check the sampler diagnostics (stepsize, masses)? If these terms don't mean much to you, don't worry about it.

I would suspect something odd is going on with respect to the warmup, and having no priors is not a good thing - but you are right that the results should not differ that much.

#10

I have tried with and without priors of varying strength, different starting seeds, different number of iterations. I pulled the priors off the example model to make it the bare minimum model necessary to see the issue.

On my machine, it is pretty consistent. That is why I thought I was not understanding the difference between these two fully.

I can put together a more comprehensive test case if that is helpful.

#11

I just ran this on a different machine and did not see any differences, but I still get those differences on my laptop.

I need to dig into what happened on this laptop to cause this. Any ideas where to start?

#12

That's good news!

You should wipe out all binaries, then check your compiler settings and rerun on your laptop.

#13

After a few hours, I think I figured it out. My kernel had been updated, but my laptop was not restarted.

A reboot later, both are now the same to within MCMC error. Sigh…

#14

This seems to be a common problem people are running into. Seems like the OS should suggest a restart after updating!