# Pystan log-probability confusing!

I am using pystan. Shockingly, I am finding a mismatch between log probability calculated by pystan and the same calculated by hand. Suppose I have 300 data points: x, y(x) and y error \sigma_y(x), all three vectors have length 300. There is no correlation whatsoever.
At any point log-probability should be:
ln_p=-0.5*N*\ln(2 \pi)-\sum_i^{N} \ln({\sigma_y}_i)-\chi^2/2, where N=300. I am finding that this value is not matching with the lp_ value given by the pystan fit! The mismatch is small but quite significant. I do not believe it is coming from truncation error. In pystan, I am using

target+=normal_lpdf(y[i]|F[i],yerror[i])


where y[i], F[i], yerror[i] represent data, model and error at the ith data point.

Where am I doing the mistake? Has anyone by hand calculated the log-probability and matched with the pystan value ever?

Edit: Because of the mismatch, the maximum lp_ does not correspond to minimum \chi^2 where the latter is calculated by hand at each sampled point.

Edit: Working example

import matplotlib.pyplot as plt
import pystan
import numpy as np

def function(x,a,b):
return a*x+b

a,b=5.0,1.0
Nd=300
x=np.linspace(0.5,5.5,Nd)
fx=function(x,a,b)
sigfx=np.full(Nd,1.0); #print(sigfx)
y=np.random.normal(fx,sigfx)

plt.errorbar(x,y, yerr=sigfx,c="r")
plt.plot(x,fx,c="b")

model = """
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
vector[N] yerror;
}
parameters {
real<lower=0.0, upper=20> A;
real <lower=0, upper=10> B;
//real sig;

}

/*model {
for(i in 1:N) {
real F;
F=A*x[i]+B;
target+=normal_lpdf(y[i]|F,yerror[i]);
}
}*/

model{
vector[N] F;
F=A*x+B;
target+=normal_lpdf(y|F,yerror);
//y~normal(F,yerror);
}
"""
# Put our data in a dictionary
data = {'N': len(x), 'x': x, 'y': y, 'yerror':sigfx}

# Compile the model
sm = pystan.StanModel(model_code=model)

# Train the model and generate samples
fit = sm.sampling(data=data, iter=2000, chains=4, warmup=400, thin=3, seed=101)
print(fit)

def get_fit_bf(fit):
ex=np.array(fit.extract(permuted=False));
ex_flat=np.concatenate(ex,axis=0)
idx=np.argmax(ex_flat[:,-1])

MLE_pars=ex_flat[idx,:]
return[MLE_pars]

def get_chisq(pars):
rec_fx=function(x,pars[0],pars[1])
chisq=np.sum((y-rec_fx)**2/sigfx**2)
return chisq

print("Finding best-fit from the lnp..")
MLE_pars= get_fit_bf(fit)
print(MLE_pars[0])
Am,Bm,lpm=MLE_pars[0]
chisq_pyst=get_chisq(MLE_pars[0])
print("Chi^2=",chisq_pyst)

lnp=-0.5*Nd*np.log(2.0*np.pi)-np.sum(np.log(sigfx))-chisq_pyst/2.0
print("lnp for these values should be: ",lnp)

1 Like

Can you post the entire model? Do the parameters have an upper/lower limit or something like that?

How do you call logprob?

Given a working example. Yes, I use flat/uniform prior on the parameters.

Could not understand. Anyway, I am getting the pystan logprob from the fit, the last column.

See how lp__ is calculated here: How exactly is lp__ being calculated

The important piece you are missing here is

So if you would not have constraints then lp__ would probably match exactly. But that will obviously not help you fit the model.

Yes, when I remove the prior they match! But, I don’t understand the outcome. When I am using uniform prior, all points within the prior ranges have equal weightage. So why the probability is not tracing the chi^2 (no correlation assumed)? Worse, the maximum of lp_ is not the minimum of \chi^2 among the points it sampled. How to address that?

Edit: I am very much interested in finding the maximum likelihood also (together with the posterior).

I suggest reading the entire thread here How exactly is lp__ being calculated and the chapter of the reference manual https://mc-stan.org/docs/2_24/reference-manual/variable-transforms-chapter.html

1 Like

Just confirm me one thing. Assume I have uniform priors. After the fit, the samples trace the lp_ of the stan or the log prob calculated like above ?

The lp__ samples that you get when fitting in any Stan interface will return the lp__ that Stan computes. That is, including the log absolute Jacobian determinant of the transform for constrained variables.

Still, where is the problem? I have a parameter A with a lower cutoff. Stan transforms it to another function, say B using log. Now the jacobian, if done properly, should take care of this transformation and the inverse transformations; Why it differs slightly from the analytical value (if the jacobian is calculated correctly)? This is like saying your mass changes if I measure in pounds instead of grams!

I am not sure if I can explain it better than other have in the previous thread and the chapters, sorry.

Maybe the code example will help see what happens to lp__ if you have a constrain.

real<lower=0.0, upper=20> A;


is compiled to C++ as

if (jacobian__) {
current_statement__ = 1;
A = stan::math::lub_constrain(A, 0.0, 20, lp__);
} else {
current_statement__ = 1;
A = stan::math::lub_constrain(A, 0.0, 20);
}


If jacobian__ is true, which in this case it is, that will call this function here:

See what increments lp__ in that case.

You can try calling log_prob in pystan directly (see https://pystan.readthedocs.io/en/latest/api.html#pystan.StanFit4model.log_prob) with adjust_transform true and false and observe what is going on.

1 Like

BTW: if I change to:
y~normal(F,yerror);
the lp_ values are completely different. Why? Is not it the same thing as target+=normal_lpdf(...) ?

I would also add that densities are weird and change under parameter transformations (see case studies at https://betanalpha.github.io/writing/ for some good intro into relevant theory). The lp reported by Stan for HMC/NUTS is with respect to unconstrained parameters, because that’s what is needed to make the algorithm work (and hence what you want to look at for diagnostics etc.), it unfortunately cannot at the same time be identical to what you compute by hand on constrained parameters. I understand this may be confusing, but I was assured the math checks out :-)

1 Like

No, the sampling statement (~) only computes up to a constant (proportional to the exact density f).

See Log Probability Increment vs.Sampling Statement in https://mc-stan.org/docs/2_18/reference-manual/sampling-statements-section.html

Mainly this paragraph:

The sampling statement drops all the terms in the log probability function that are constant, whereas the explicit call to normal_lpdf adds all of the terms in the definition of the log normal probability function, including all of the constant normalizing terms. Therefore, the explicit increment form can be used to recreate the exact log probability values for the model. Otherwise, the sampling statement form will be faster if any of the input expressions, y , mu , or sigma , involve only constants, data variables, and transformed data variables.

1 Like

Yes, I understand the probability density P(A) is not same as P(B) after we transform A -> B. But what I do not understand why stan gives me P(B) (if it gives this in the first place), not P(A) [of the original constrained parameter], if it has the jacobian calculated correctly. This is not nontrivial! Not doing this inevitably creates confusion.

P.S. I want to understand one single thing in the STAN/Pystan developments before I die.

Okay. Thanks. But it also has the same mismatch if we have constrained parameters.

If this is true that stan reports probability density with respect to unconstrained parameters, is there a way that I also store probability density w.r.t. the original constrained parameters at each point while sampling? This could be easier for me.

Because B is what Stan samples internally and log_prob uses to Stan’s internal representation. Note that log_prob takes unconstrained parameters so if you have constrained parameter values you have to fit.unconstrain_pars(...) them before calling fit.log_prob(...).

You can get \log P(A) with fit.log_prob(upars, adjust_transform=False)

3 Likes

The documentation is confusing, clashing with older github contents. Can you please elaborate how to call fit.unconstrain_pars(...) correctly? What would be the pars?