What is the difference between specifying X ~ N(mu, sigma2) and target += normal_lpdf(X | mu, sigma2)?

Dear all,

I found there are two ways of writing the inference within the “model” block. I am wondering: is there a difference between specifying X ~ N(mu, sigma2) and target += normal_lpdf(X | mu, sigma2)?

I tried both of them by coincidence but found the results to be different, even though I fixed the seed parameter to 1.

Can anyone explain the difference between these two approaches?

I am just a novice user, so there is surely a lot I do not know about the details of Stan. (I assume you meant to write a sampling statement with the normal function and not N.) One difference between the approaches is that y ~ normal(mu, sigma) drops additive constants.

Quoting from the function reference:

y ~ normal(mu, sigma)
Increment target log probability density with normal_lpdf(y | mu, sigma) dropping constant additive terms.
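
For what it's worth, here's a minimal sketch (not your model, just an illustrative one where y is data and mu and sigma are parameters) showing the two ways of writing the same statement:

data {
  real y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // sampling statement: adds normal_lpdf(y | mu, sigma) minus terms that are
  // constant with respect to the parameters (the -0.5 * log(2 * pi()) piece)
  y ~ normal(mu, sigma);
  // keeps every term, including that constant:
  // target += normal_lpdf(y | mu, sigma);
}

Both versions target the same posterior; only the value of target (and hence lp__) differs by a constant.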

How exactly are the results different when you run your model?


Hi FJCC, thanks for the reply. I guess that might indeed be the reason that I have different log likelihood values; here I am attaching the parameter results.

Yes, a difference in the log likelihood would not be a concern.

Yeah @FJCC is correct. The difference is that the so-called “sampling notation” y ~ normal(mu, sigma) drops constants. The target += syntax keeps the constants and is also a lot more flexible, allowing you to use distributions for which we don’t have a built-in function.
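
As a toy illustration of that flexibility (a hypothetical sketch, not one of the models discussed here), you can increment target with any expression, for example a log density written out by hand rather than a built-in _lpdf call:

parameters {
  real x;
}
model {
  // hand-written log kernel of a standard normal: -0.5 * x^2
  // (any expression could go here, e.g. a distribution Stan has no built-in for)
  target += -0.5 * square(x);
}

That samples x from a standard normal even though no built-in distribution function is used.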

If you’re referring to lp__, then just keep in mind that it also includes the (log) priors if you have any, so it isn’t always equivalent to the log-likelihood. Hope that helps.
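
To spell that out, here's a hypothetical single-observation sketch (the names y, mu, and log_lik are just for illustration): lp__ picks up both the prior and likelihood terms (minus dropped constants), while the log likelihood on its own can be saved in generated quantities:

data {
  real y;
}
parameters {
  real mu;
}
model {
  mu ~ normal(0, 1);  // prior: contributes to lp__ (minus its constant)
  y ~ normal(mu, 1);  // likelihood: also contributes to lp__ (minus its constant)
}
generated quantities {
  // the log likelihood by itself, constants included
  real log_lik = normal_lpdf(y | mu, 1);
}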

Thanks for the reply, Jonah. How about the inferred parameters, then? I see most of them are close to each other but not exactly the same. Is this something that can be explained by the normalization constant as well? When I ran the two inferences, I used the same prior, same initialization point, and same seed.


Good question. They should definitely give the same inferences for the parameters (within Monte Carlo error), but I also would have thought you’d get exactly the same values if, like you say, you also used the same inits and same seed. But I did a little experiment and I also found slight differences, so I think I’m also confused about something!

Maybe @bbbales2 or @rok_cesnovar knows what I’m missing here. In my example below, shouldn’t I get the exact same values of x or is that a faulty assumption?

# Fit two simple models, differing only in whether 
# sampling notation or target += is used

library("cmdstanr") 

file1 <- write_stan_tempfile(
  "
  parameters {
    real x;
  }
  model {
    x ~ normal(1, 1);
  }
  "
)

file2 <- write_stan_tempfile(
  "
  parameters {
    real x;
  }
  model {
    target += normal_lpdf(x | 1, 1);
  }
  "
)

mod1 <- cmdstan_model(file1)
mod2 <- cmdstan_model(file2)

# initialize to 0 and use same seed
fit1 <- mod1$sample(init = 0, seed = 123)
fit2 <- mod2$sample(init = 0, seed = 123)

I expect different values for lp__, but I also get slightly different results for x. They are certainly within Monte Carlo error, but it still surprised me:

> fit1$print("x", "mean", "sd")
 variable mean   sd
        x 1.01 0.99
> fit2$print("x", "mean", "sd")
 variable mean   sd
        x 1.00 1.01

I notice that the number of iterations in @LeonCaesa’s example is only 50, Rhat is almost always far above the recommended 1.01 (is that the latest recommendation?) upper limit, and n_eff is small. I would try running with iter = 2000, if that does not take too long.


Yeah, that’s true. I can rerun it with more iterations, but I’m not sure it will make a difference since I fixed the seeds etc., right?

Yeah, this is probably it. The large Rhat indicates that Stan isn’t mixing and the results are not to be relied upon (so it’s not too surprising you saw differences). You should run it until the Rhats are small before you do comparisons.

Yeah, < 1.01 is the latest.

@bbbales2 But shouldn’t using the same inits and seed still have given the same answer even if unreliable?

Yeah, I have the same question: shouldn’t the two inferences have the same unreliable results?


Yeah you’re right. Count me as surprised.

Well I wouldn’t say they should.

It seems like they would, but they don’t. So there’s probably a reason for this buried somewhere.

I’m still stumped on this. I don’t think there’s a bug necessarily, just probably a lack of understanding on our part. I’m going to open a separate topic to sort this out.

Turns out this is just due to the fact that floating-point rounding is affected by whether the constants are included, which definitely didn’t occur to me but makes total sense.

Thanks, that makes perfect sense. I think as the model gets more and more complicated, the difference will grow larger and larger and eventually lead to two sets of similar but not identical parameter estimates.
