What is the difference between specifying X ~ N(mu, sigma2) and target += normal_lpdf(X | mu, sigma2)?

Dear all,

I found there are two ways of writing the inference within the “model” block. I am wondering: is there a difference between specifying X ~ N(mu, sigma2) and target += normal_lpdf(X | mu, sigma2)?

I tried both of them by coincidence but found the results to be different, even though I fixed the seed parameter to 1.

Can anyone explain the difference between these two approaches?

I am just a novice user, so there is surely a lot I do not know about the details of Stan. (I assume you meant to write a sampling statement with the normal function and not N.) One difference between the approaches is that y ~ normal(mu, sigma) drops additive constants.

Quoting from the function reference:

y ~ normal(mu, sigma)
Increment target log probability density with normal_lpdf(y | mu, sigma) dropping constant additive terms.
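
For what it's worth, here's a minimal sketch (not your model, just an illustrative one where y is data and mu and sigma are parameters) showing the two ways of writing the same statement:

data {
  real y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // sampling statement: adds normal_lpdf(y | mu, sigma) minus terms that are
  // constant with respect to the parameters (the -0.5 * log(2 * pi()) piece)
  y ~ normal(mu, sigma);
  // keeps every term, including that constant:
  // target += normal_lpdf(y | mu, sigma);
}

Both versions target the same posterior; only the value of target (and hence lp__) differs by a constant.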

How exactly are the results different when you run your model?


Hi FJCC, thanks for the reply. I guess that might indeed be the reason that I have different log likelihood values; here I am attaching the parameter results.

Yes, a difference in the log likelihood would not be a concern.

Yeah @FJCC is correct. The difference is that the so-called “sampling notation” y ~ normal(mu, sigma) drops constants. The target += syntax keeps the constants and is also a lot more flexible, allowing you to use distributions for which we don’t have a built-in function.
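
As a toy illustration of that flexibility (a hypothetical sketch, not one of the models discussed here), you can increment target with any expression, for example a log density written out by hand rather than a built-in _lpdf call:

parameters {
  real x;
}
model {
  // hand-written log kernel of a standard normal: -0.5 * x^2
  // (any expression could go here, e.g. a distribution Stan has no built-in for)
  target += -0.5 * square(x);
}

That samples x from a standard normal even though no built-in distribution function is used.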

If you’re referring to lp__, then just keep in mind that it also includes the (log) priors if you have any, so it isn’t always equivalent to the log-likelihood. Hope that helps.
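
To spell that out, here's a hypothetical single-observation sketch (the names y, mu, and log_lik are just for illustration): lp__ picks up both the prior and likelihood terms (minus dropped constants), while the log likelihood on its own can be saved in generated quantities:

data {
  real y;
}
parameters {
  real mu;
}
model {
  mu ~ normal(0, 1);  // prior: contributes to lp__ (minus its constant)
  y ~ normal(mu, 1);  // likelihood: also contributes to lp__ (minus its constant)
}
generated quantities {
  // the log likelihood by itself, constants included
  real log_lik = normal_lpdf(y | mu, 1);
}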

Thanks for the reply, Jonah. How about the inferred parameters, then? I see most of them are close to each other but not exactly the same. Is this something that can be explained by the normalization constant as well? When I ran the two inferences, I used the same prior, same initialization point, and same seed.


Good question. They should definitely give the same inferences for the parameters (within Monte Carlo error), but I also would have thought you’d get exactly the same values if, like you say, you also used the same inits and same seed. But I did a little experiment and I also found slight differences, so I think I’m also confused about something!

Maybe @bbbales2 or @rok_cesnovar knows what I’m missing here. In my example below, shouldn’t I get the exact same values of x or is that a faulty assumption?

# Fit two simple models, differing only in whether 
# sampling notation or target += is used

library("cmdstanr") 

file1 <- write_stan_tempfile(
  "
  parameters {
    real x;
  }
  model {
    x ~ normal(1, 1);
  }
  "
)

file2 <- write_stan_tempfile(
  "
  parameters {
    real x;
  }
  model {
    target += normal_lpdf(x | 1, 1);
  }
  "
)

mod1 <- cmdstan_model(file1)
mod2 <- cmdstan_model(file2)

# initialize to 0 and use same seed
fit1 <- mod1$sample(init = 0, seed = 123)
fit2 <- mod2$sample(init = 0, seed = 123)

I expect different values for lp__, but I also get slightly different results for x. They are certainly within Monte Carlo error, but it still surprised me:

> fit1$print("x", "mean", "sd")
 variable mean   sd
        x 1.01 0.99
> fit2$print("x", "mean", "sd")
 variable mean   sd
        x 1.00 1.01

I notice that the number of iterations in @LeonCaesa’s example is only 50, Rhat is almost always far above the recommended 1.01 (is that the latest recommendation?) upper limit, and n_eff is small. I would try running with iter = 2000, if that does not take too long.


Yeah, that’s true. I can rerun it with more iterations, but I’m not sure it will make a difference since I fixed the seeds etc., right?

Yeah, this is probably it. The large Rhat indicates that Stan isn’t mixing and the results are not to be relied upon (so it’s not too surprising you saw differences). You should run it until the Rhats are small before you do comparisons.

Yeah, < 1.01 is the latest.

@bbbales2 But shouldn’t using the same inits and seed still have given the same answer even if unreliable?

Yeah, I have the same question: shouldn’t the two inferences have the same unreliable results?


Yeah you’re right. Count me as surprised.

Well I wouldn’t say they should.

It seems like they would, but they don’t. So there’s probably a reason for this buried somewhere.

I’m still stumped on this. I don’t think there’s a bug necessarily, just probably a lack of understanding on our part. I’m going to open a separate topic to sort this out.

Turns out this is just due to the fact that floating-point rounding is affected by whether the constants are included, which definitely didn’t occur to me but makes total sense.

Thanks, that makes perfect sense. I think as the model gets more and more complicated, the difference will grow larger and larger and eventually lead to two sets of similar but not identical parameter estimates.
