I have various runs of the same model (sometimes with one slightly changed parameter), and I want to compare them to find the model that fits my data best. I was told this can be seen from the lp__ value, which is printed in the model summary. Now my (stupid) question is: which value is better?
Let's say I got these values (I know, they are really close, but nevertheless); which model would be better?
-17169.14
-17122.16
I think I do not fully understand the meaning of the log posterior…

You are going to have to understand that to make good use of Stan, regardless of its (in)applicability to model comparison. In short, it is the kernel of the posterior density in log-units, possibly ignoring some or all constants. It is not useful for model comparison. Most people around here prefer to do comparisons involving (functions of) the log-likelihood, which excludes the contributions from the priors and the Jacobians for the transformations from the constrained space to the unconstrained space. See this paper:

Other people prefer to calculate the probability that each of several models is correct, conditional on at least one of them being (very close to) correct. See this paper

for some cautions about that approach, but if you are going to use it then you need the posterior density including the constants. If you have that, then you can use

The original question asked about comparing different runs on the same model, as well as different models.
If you are just comparing different runs, then surely the simple log probability is fine.

Logs are monotonic, so higher density is also higher log density. But what you're looking for in sampling is to sample around the typical set, not around the mode.

Also, you probably don't want to use lp__ as your yardstick, as it drops constants and includes all the Jacobian adjustments that allow sampling to happen on the unconstrained scale.
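As a toy numerical sketch of that Jacobian adjustment (an illustration, not Stan's internals; the Exponential(1) prior is an assumption chosen for simplicity): for a constrained parameter sigma > 0, Stan samples y = log(sigma) and adds the log-Jacobian log|d sigma/d y| = y to the target.

```python
import numpy as np

# Toy sketch of the Jacobian adjustment for a parameter sigma > 0
# sampled on the unconstrained scale y = log(sigma).  The adjusted
# log target is log p(sigma) + log|d sigma/d y| = log p(sigma) + y.
# Illustrative prior (an assumption): sigma ~ Exponential(1).
y = np.linspace(-10.0, 5.0, 20001)   # grid of unconstrained values
sigma = np.exp(y)
log_p_sigma = -sigma                 # Exponential(1) log density (rate 1)

adjusted = log_p_sigma + y           # density in y WITH the Jacobian term
unadjusted = log_p_sigma             # naive density WITHOUT the Jacobian

# With the Jacobian the density in y integrates to ~1; without it, it does not.
dy = y[1] - y[0]
with_jac = np.exp(adjusted).sum() * dy
without_jac = np.exp(unadjusted).sum() * dy
assert abs(with_jac - 1.0) < 1e-3
assert abs(without_jac - 1.0) > 1.0
```

Those Jacobian terms are part of lp__, which is one more reason its value is not directly comparable across models with different parameterizations.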

Why would you compare different runs on the same model and the same data?
The original question explicitly mentions model comparison. You can't do that with just lp__ values.

By "same model", we mean the same log density. So there may be some misunderstanding in terminology here.

Oh, that might be the case…
Actually, I am quite confused right now :D

Let's just say I have the same model code, and I start several runs with this same model code but with randomly generated starting parameters, so there is always a slightly different outcome across runs.
Now I just want to know which run was the best. I read that this can be answered by looking at the lp__ value, but I am not sure which lp__ value is the better one: a more negative one?

You should just do that with multiple chains in a single run. Or if you do it in CmdStan, then you should be combining the chains running the same model for posterior analysis.

You should not be looking for equal runs of the same model and finding the best. If theyâ€™re not all the same, you have convergence problems.

@Bob_Carpenter Let's say I want to do multiple runs with different prior values, for example gamma(2,17), gamma(17,2), gamma(3,17), etc., to see the difference in the results.

How can I do that with multiple chains in a single run? Can you please illustrate with an example if possible? I am trying to learn how to do this.
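The sampling call itself depends on your interface (e.g. a `chains=8` argument to CmdStanPy's `sample()` or RStan's `stan()`). As a hedged sketch of what "combining the chains for posterior analysis" means afterwards, using simulated draws in place of real Stan output:

```python
import numpy as np

# Hypothetical draws of one parameter from 8 chains of the SAME model
# and run, shaped (n_chains, n_draws).  Real draws would come from your
# interface (e.g. fit.stan_variable(...) in CmdStanPy).
rng = np.random.default_rng(0)
draws = rng.normal(loc=1.0, scale=0.5, size=(8, 1000))

# For posterior analysis, pool the chains rather than picking a "best" one:
pooled = draws.reshape(-1)
posterior_mean = pooled.mean()
ci = np.percentile(pooled, [2.5, 97.5])  # central 95% interval
assert pooled.shape == (8000,)
assert ci[0] < posterior_mean < ci[1]
```

Pooling only makes sense once the chains have converged to the same distribution; if they haven't, that's a diagnostic problem, not a selection problem.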

Got it: the kernel of the posterior basically omits the normalization constant that does not depend on the parameters. I just did not know the term for this is "kernel".
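That relationship can be sketched numerically. A toy Normal example (made-up numbers, not Stan output), showing that the kernel and the normalized log density differ only by an additive constant and therefore rank points identically:

```python
import numpy as np

# Normalized log density of Normal(mu, sigma) vs. its kernel
# (the part lp__-style quantities keep; constants may be dropped).
mu, sigma = 2.0, 1.5
x = np.array([0.0, 1.0, 2.0, 3.0])

log_kernel = -0.5 * ((x - mu) / sigma) ** 2                     # kernel only
log_density = log_kernel - np.log(sigma * np.sqrt(2 * np.pi))   # normalized

# The two differ by the same constant at every x ...
diffs = log_density - log_kernel
assert np.allclose(diffs, diffs[0])

# ... so they induce the same ranking of points,
# but the dropped constant differs from model to model.
assert (np.argsort(log_kernel) == np.argsort(log_density)).all()
```

That dropped constant is exactly why lp__ values are comparable within one posterior but not across different models.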

Sorry to revive this topic, but I sort of have the same issue.
I am running a Bayesian ANN, and I am having the issue that I get multiple answers when running the same model on the same data. In other words, I get a very low MC error per chain, but the chains do not converge (i.e., high Rhats). I actually recently made a post about it;
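To make that symptom concrete, here is a toy sketch (simulated chains and a basic non-split R-hat; Stan's actual diagnostic is the split, rank-normalized version) of how chains stuck in different modes give a large R-hat even though each chain individually has low Monte Carlo error:

```python
import numpy as np

# Two chains stuck in different modes vs. two chains exploring the same region.
rng = np.random.default_rng(1)
stuck = np.stack([rng.normal(mode, 0.1, 1000) for mode in (-2.0, 2.0)])
mixed = rng.normal(0.0, 0.1, size=(2, 1000))

def rhat(chains):
    """Basic (non-split) potential scale reduction factor."""
    m, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (n - 1) / n * w + b / n
    return np.sqrt(var_plus / w)

assert rhat(stuck) > 1.5     # different modes: between-chain variance blows up
assert rhat(mixed) < 1.05    # chains agree: R-hat near 1
```

Low per-chain MC error with high R-hat is exactly this pattern: each chain is precise about its own mode, but the chains disagree about which mode.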

In that post I was forwarded to the following link;

In that link you can see under "Neural network: When ordering is not enough", it says;

This means that to identify the model we need to somehow choose one of those modes, but there is clearly not a "best" (much higher lp__) mode.

However, in my case, I actually can see a difference in lp__ between the "solutions". (Sorry if I mix up terminology here…)

If I run 8 chains, is there a way to "pick" the best chains automatically based on lp__, or are there better ways to do it?
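Mechanically, you could rank chains by their mean lp__ (an ad-hoc sketch with made-up numbers; as discussed above, this is not a principled comparison, and chains in different modes usually signal an identification problem rather than one "correct" chain):

```python
import numpy as np

# Hypothetical lp__ draws, shaped (n_chains, n_draws); real values would
# come from your interface's per-chain output.
lp = np.array([[-17169.1, -17168.9, -17169.3],
               [-17122.2, -17122.0, -17122.4],
               [-17150.0, -17149.8, -17150.2]])

chain_means = lp.mean(axis=1)
best_chain = int(np.argmax(chain_means))  # highest (least negative) mean lp__
assert best_chain == 1
```

Note that lp__ includes prior and Jacobian terms, so even within one model this ranks modes by posterior kernel height, not by predictive performance.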

(PS: my goal is to mix machine learning with structural time series, hence I use Stan for ANN.)