PyStan log-probability is confusing!

I am using PyStan. Shockingly, I am finding a mismatch between the log probability calculated by PyStan and the same quantity calculated by hand. Suppose I have 300 data points: x, y(x), and the y error \sigma_y(x); all three vectors have length 300. There is no correlation whatsoever.
At any point the log probability should be
\ln p = -\frac{N}{2}\ln(2\pi) - \sum_{i=1}^{N} \ln {\sigma_y}_i - \frac{\chi^2}{2}, where N=300.
I am finding that this value does not match the lp__ value given by the PyStan fit! The mismatch is small but quite significant, and I do not believe it comes from truncation error. In PyStan, I am using

target += normal_lpdf(y[i] | F[i], yerror[i]);

where y[i], F[i], and yerror[i] represent the data, model, and error at the i-th data point.

Where am I making the mistake? Has anyone ever calculated the log probability by hand and matched it against the PyStan value?

Edit: Because of the mismatch, the maximum lp__ does not correspond to the minimum \chi^2, where the latter is calculated by hand at each sampled point.

Edit: Working example

import matplotlib.pyplot as plt
import pystan
import numpy as np

def function(x,a,b):
    return a*x+b

a,b=5.0,1.0
Nd=300
x=np.linspace(0.5,5.5,Nd)
fx=function(x,a,b)
sigfx=np.full(Nd,1.0)  # constant error bars; print(sigfx) to inspect
y=np.random.normal(fx,sigfx)

plt.errorbar(x,y, yerr=sigfx,c="r")
plt.plot(x,fx,c="b")

model = """
data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
    vector[N] yerror;
}
parameters {
    real<lower=0.0, upper=20> A;
    real <lower=0, upper=10> B;
    //real sig;
    
}

/*model {
    for(i in 1:N) {
        real F;
        F=A*x[i]+B;
        target+=normal_lpdf(y[i]|F,yerror[i]);
    }
}*/

model{
        vector[N] F;
        F=A*x+B;
        target+=normal_lpdf(y|F,yerror);
        //y~normal(F,yerror);
}
"""
# Put our data in a dictionary
data = {'N': len(x), 'x': x, 'y': y, 'yerror':sigfx}

# Compile the model
sm = pystan.StanModel(model_code=model)

# Train the model and generate samples
fit = sm.sampling(data=data, iter=2000, chains=4, warmup=400, thin=3, seed=101)
print(fit)


def get_fit_bf(fit):
    # Flatten the (iterations, chains, parameters + lp__) array into one long chain.
    ex=np.array(fit.extract(permuted=False))
    ex_flat=np.concatenate(ex,axis=0)
    # The last column is lp__; pick the sample that maximizes it.
    idx=np.argmax(ex_flat[:,-1])

    MLE_pars=ex_flat[idx,:]
    return [MLE_pars]

def get_chisq(pars):
    rec_fx=function(x,pars[0],pars[1])
    chisq=np.sum((y-rec_fx)**2/sigfx**2)
    return chisq

print("Finding best-fit from the lnp..")
MLE_pars= get_fit_bf(fit)
print(MLE_pars[0])
Am,Bm,lpm=MLE_pars[0]
chisq_pyst=get_chisq(MLE_pars[0])
print("Chi^2=",chisq_pyst)

lnp=-0.5*Nd*np.log(2.0*np.pi)-np.sum(np.log(sigfx))-chisq_pyst/2.0
print("lnp for these values should be: ",lnp)

Can you post the entire model? Do the parameters have an upper/lower limit or something like that?

How do you call log_prob?

I have given a working example above. Yes, I use flat/uniform priors on the parameters.

I could not follow that. Anyway, I am getting the PyStan log probability from the fit, in the last column.

See how lp__ is calculated here: How exactly is lp__ being calculated - #4 by Bob_Carpenter

The important piece you are missing here is that lp__ also includes the log absolute Jacobian determinant of the transforms used for constrained parameters.

So if you did not have constraints, lp__ would probably match exactly. But that will obviously not help you fit the model.
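In other words (my own paraphrase; with flat priors the prior itself only adds a constant): for each constrained parameter \theta_k that Stan maps to an unconstrained u_k,

lp__ = \ln p(y|\theta) + \sum_k \ln \left| \frac{d\theta_k}{du_k} \right| + \text{const},

whereas the hand calculation above contains only the first term.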

Yes, when I remove the prior they match! But I don't understand the outcome. When I use a uniform prior, all points within the prior ranges have equal weight, so why does the probability not trace the \chi^2 (no correlation assumed)? Worse, the maximum of lp__ is not the minimum of \chi^2 among the sampled points. How do I address that?

Edit: I am also very much interested in finding the maximum likelihood (together with the posterior).

I suggest reading the entire thread here, How exactly is lp__ being calculated, and the variable-transforms chapter of the reference manual: https://mc-stan.org/docs/2_24/reference-manual/variable-transforms-chapter.html


Just confirm one thing for me. Assume I have uniform priors. After the fit, do the samples trace Stan's lp__ or the log probability calculated as above?

The lp__ samples that you get when fitting in any Stan interface are the lp__ that Stan computes; that is, including the log absolute Jacobian determinant of the transform for constrained variables.

Still, where is the problem? I have a parameter A with a lower cutoff. Stan transforms it into another variable, say B, using a log transform. Now the Jacobian, if done properly, should take care of this transformation and its inverse; why does the result differ slightly from the analytical value (if the Jacobian is calculated correctly)? This is like saying your mass changes if I measure it in pounds instead of grams!

I am not sure I can explain it better than others have in the previous thread and the chapters, sorry.

Maybe a code example will help show what happens to lp__ if you have a constraint.

real<lower=0.0, upper=20> A;

is compiled to C++ as

if (jacobian__) {
     current_statement__ = 1;
     A = stan::math::lub_constrain(A, 0.0, 20, lp__);
} else {
     current_statement__ = 1;
     A = stan::math::lub_constrain(A, 0.0, 20);
}

If jacobian__ is true, which in this case it is, that will call the lub_constrain function in the Stan Math library. See what increments lp__ in that case.
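As a rough Python sketch of what that transform does (my own re-derivation of the lower/upper-bound transform from the reference manual, not the actual Stan Math code):

import numpy as np

def lub_constrain(u, lb, ub):
    # Map an unconstrained value u into (lb, ub) with a scaled inverse logit,
    # and return the log absolute Jacobian that gets added to lp__.
    inv_logit = 1.0 / (1.0 + np.exp(-u))
    theta = lb + (ub - lb) * inv_logit
    # d(theta)/du = (ub - lb) * inv_logit * (1 - inv_logit)
    log_jac = np.log(ub - lb) + np.log(inv_logit) + np.log1p(-inv_logit)
    return theta, log_jac

# For the parameter A declared with <lower=0, upper=20> above:
A, log_jac = lub_constrain(0.3, 0.0, 20.0)  # 0.3 is an arbitrary unconstrained value
# lp__ at this point = (log density computed by hand at A) + log_jac + (the analogous term for B)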

You can try calling log_prob in PyStan directly (see https://pystan.readthedocs.io/en/latest/api.html#pystan.StanFit4model.log_prob) with adjust_transform set to True and to False, and observe what is going on.
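Something along these lines (a sketch against the PyStan 2 API, reusing the fit object and the best-fit values Am, Bm from the working example):

# Map the constrained values to the unconstrained space Stan samples in.
upars = fit.unconstrain_pars({'A': Am, 'B': Bm})

lp_with_jacobian = fit.log_prob(upars, adjust_transform=True)       # should track lp__
lp_without_jacobian = fit.log_prob(upars, adjust_transform=False)   # log density on the constrained scale

# The difference is the Jacobian adjustment coming from the bounds on A and B.
print(lp_with_jacobian - lp_without_jacobian)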


BTW: if I change to

y ~ normal(F, yerror);

the lp__ values are completely different. Why? Isn't it the same thing as target += normal_lpdf(...)?

I would also add that densities are weird and change under parameter transformations (see the case studies at https://betanalpha.github.io/writing/ for a good introduction to the relevant theory). The lp__ reported by Stan for HMC/NUTS is with respect to the unconstrained parameters, because that is what is needed to make the algorithm work (and hence what you want to look at for diagnostics etc.); it unfortunately cannot at the same time be identical to what you compute by hand on the constrained parameters. I understand this may be confusing, but I was assured the math checks out :-)


No, the sampling statement (~) only computes the log density up to an additive constant (i.e., something proportional to the exact density f).

See Log Probability Increment vs. Sampling Statement in the Stan Reference Manual.

Mainly this paragraph:

The sampling statement drops all the terms in the log probability function that are constant, whereas the explicit call to normal_lpdf adds all of the terms in the definition of the log normal probability function, including all of the constant normalizing terms. Therefore, the explicit increment form can be used to recreate the exact log probability values for the model. Otherwise, the sampling statement form will be faster if any of the input expressions, y, mu, or sigma, involve only constants, data variables, and transformed data variables.
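For the model in the working example, yerror is data, so the terms the sampling statement drops are pure constants; a quick sketch of their size (using the same Nd and sigfx as in the example):

import numpy as np

Nd = 300
sigfx = np.full(Nd, 1.0)  # the yerror data from the example
dropped = -0.5 * Nd * np.log(2.0 * np.pi) - np.sum(np.log(sigfx))
print(dropped)
# lp__ from "y ~ normal(F, yerror)" and from "target += normal_lpdf(y | F, yerror)"
# should differ by this constant; the Jacobian terms are the same in both versions.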


Yes, I understand that the probability density P(A) is not the same as P(B) after we transform A -> B. But what I do not understand is why Stan gives me P(B) (if that is indeed what it gives) and not P(A) [of the original constrained parameter], if it has the Jacobian calculated correctly. This is not trivial! Not doing this inevitably creates confusion.

P.S. I want to understand one single thing in the Stan/PyStan developments before I die.

Okay, thanks. But it also shows the same mismatch when we have constrained parameters.

If it is true that Stan reports the probability density with respect to the unconstrained parameters, is there a way for me to also store the probability density w.r.t. the original constrained parameters at each sampled point? That would be easier for me.

Because B is what Stan samples internally, and log_prob uses Stan's internal representation. Note that log_prob takes unconstrained parameters, so if you have constrained parameter values you have to fit.unconstrain_pars(...) them before calling fit.log_prob(...).

You can get \log P(A) with fit.log_prob(upars, adjust_transform=False).
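Putting it together for the working example (a sketch; pars here is a dict keyed by the parameter names on the constrained scale):

pars = {'A': Am, 'B': Bm}                  # constrained values, e.g. the best-fit point found earlier
upars = fit.unconstrain_pars(pars)         # map them to the unconstrained space
log_p_constrained = fit.log_prob(upars, adjust_transform=False)
print(log_p_constrained, lnp)              # should agree with the hand-calculated value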


The documentation is confusing and clashes with older GitHub content. Can you please elaborate on how to call fit.unconstrain_pars(...) correctly? What would pars be?