Model comparison - lp__



I have various runs of the same model (sometimes with one slightly changed parameter) and I want to find the model that fits my data best. I was told this can be judged from the lp__ value, which is printed in the model summary. Now my (stupid) question is: which value is better?
Let's say I got these values (I know they are really close, but nevertheless): which model would be better?
I think I do not fully understand the meaning of the log posterior…

Help would be much appreciated!


You are going to have to understand that to make good use of Stan, regardless of its (in)applicability to model comparison. In short, it is the kernel of the posterior density in log-units, possibly ignoring some or all constants. It is not useful for model comparison. Most people around here prefer to do comparisons involving (functions of) the log-likelihood, which excludes the contributions from the priors and the Jacobians for the transformations from the constrained space to the unconstrained space. See this paper:
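Roughly, lp__ decomposes as log-likelihood plus log-prior plus Jacobian adjustments, and only the first piece is what likelihood-based comparison uses. A minimal Python sketch with made-up data, a hypothetical normal model, and assumed priors (everything here is illustrative, not Stan's internals):

```python
import math

def normal_lpdf(x, mu, sigma):
    # log density of Normal(mu, sigma) evaluated at x
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

# toy data and one hypothetical parameter draw
y = [1.2, 0.8, 1.5]
mu, log_sigma = 1.0, 0.0      # log_sigma lives on the unconstrained scale
sigma = math.exp(log_sigma)   # sigma > 0 on the constrained scale

# pointwise log-likelihood: the quantity used for likelihood-based comparison
log_lik = sum(normal_lpdf(yi, mu, sigma) for yi in y)

# lp__ additionally contains the prior terms and the Jacobian of the
# log transform (d sigma / d log_sigma = sigma, so log|J| = log_sigma)
log_prior = normal_lpdf(mu, 0, 5) + normal_lpdf(sigma, 0, 5)  # assumed priors
lp = log_lik + log_prior + log_sigma

print(log_lik, lp)  # the two quantities generally differ
```

Because the prior and Jacobian terms differ across parameterizations, two models with identical fit to the data can report different lp__ values, which is one reason it is not a comparison metric.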

Other people prefer to calculate the probability that each of several models is correct, conditional on at least one of them being (very close to) correct. See this paper for some cautions about that approach, but if you are going to use it then you need the posterior density including the constants. If you have that, then you can use it; see especially its tutorial.


The original question asked about comparing different runs on the same model, as well as different models.
If you are just comparing different runs, then surely the simple log probability is fine.


Thank you. And do you know whether a smaller or a larger value is better?


Logs are monotonic, so higher density is also higher log density. But what you’re looking for in sampling is to sample around the typical set, not around the mode.
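The typical-set point can be illustrated with a standard high-dimensional normal: the mode is at the origin, but draws concentrate at distance roughly sqrt(d) from it. A small Python sketch (simulated draw, not Stan output):

```python
import math
import random

random.seed(0)
d = 100  # number of dimensions

# one draw from a standard d-dimensional normal, whose mode is the origin
x = [random.gauss(0, 1) for _ in range(d)]

# its distance from the mode concentrates around sqrt(d) = 10,
# so typical draws are nowhere near the point of highest density
r = math.sqrt(sum(xi * xi for xi in x))
print(r)
```

So even within a single model, the draw with the highest lp__ is not a "best" draw in any useful sense; sampling is supposed to explore the typical set.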

Also, you probably don’t want to use lp__ as your yardstick as it drops constants and includes all the Jacobian adjustments to allow sampling to happen on the unconstrained scale.


Why would you compare different runs on the same model and the same data?
The original question explicitly mentions model comparison. You can’t do that with just lp__ values.


By “same model”, we mean the same log density. So there may be some misunderstanding in terminology here.



Oh that might be the case…
Actually, I am quite confused right now :D

Let's just say I have the same model code and I start several runs of it, each with randomly generated initial values, so each run gives a slightly different outcome.
Now I just want to know which run was the best. I read that this can be answered by looking at the lp__ value, but I am not sure which lp__ value is better: a more negative one?


You should just do that with multiple chains in a single run. Or if you run them in CmdStan, then you should combine the chains from the same model for posterior analysis.

You should not be running the same model repeatedly and picking the best run. If the runs do not all give the same answer, you have convergence problems.
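The usual way to check that multiple chains agree is a convergence diagnostic like R-hat. Here is a minimal Python sketch of the classic (non-split) Gelman–Rubin statistic on simulated chains; Stan's actual diagnostic is the rank-normalized split-R-hat, so this is only illustrative:

```python
import math
import random

def rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # between-chain variance
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    # mean within-chain variance
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)

random.seed(1)
# four chains exploring the same distribution: R-hat near 1
good = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# one chain stuck in a different region: R-hat well above 1
bad = good[:3] + [[random.gauss(5, 1) for _ in range(1000)]]

print(rhat(good), rhat(bad))
```

When the chains have converged to the same distribution, R-hat is close to 1; a chain off on its own pushes it well above 1, which is the "convergence problems" case rather than a "best run" to select.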