I am very new to Stan/PyStan. I fit a model in PyStan using an array of parameters. I can see the mean of the posterior of the parameters (and could compute the median as well). But I want to know the best-fit values of the parameters, i.e. the point where the probability is maximal (equivalently, where -log_prob is minimal). I am pretty sure there must be some easy function to obtain this best fit.
My MWE is given below:
import pystan
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
A0=0.5; A1=1.5; A2=-0.2; A3=-0.008; A4=0.00025
def func(xx):
    return A0 + A1*xx + A2*xx*xx  # + A3*(xx**3) + A4*(xx**4)
x=10*np.random.rand(100); x=np.sort(x)
fx=func(x)
yerror=np.random.rand(len(x))
y=np.random.normal(fx,scale=yerror)
np.random.seed(101)
model = """
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
  vector[N] yerror;
  int<lower=0> NP;
}
parameters {
  real a[NP];
}
model {
  for (i in 1:N) {
    target += normal_lpdf(y[i] | a[3] + a[1] * x[i] + a[2] * x[i] * x[i], yerror[i]);
  }
}
"""
NP=3
data = {'N': len(x), 'x': x, 'y': y,'yerror':yerror,'NP':NP}
sm = pystan.StanModel(model_code=model)
fit = sm.sampling(data=data, iter=2000, chains=4, warmup=400, thin=3, seed=101)
print(fit)
(If you want your code to be reproducible you need to call np.random.seed() before calling any other np.random functions.)
For maximum likelihood estimation, StanModel objects have an optimizing method. Instead of sampling, call sm.optimizing(data=data) and you get back a dictionary of point estimates.
Is there a reason you would want optimized values (a frequentist-style point estimate) rather than the full Bayesian result? What you get from sampling is draws from the posterior distribution, and you can see the posterior mean/median with print(fit) or obtain them numerically with fit.summary().
Okay, so everyone is suggesting optimizing() rather than sampling. But suppose I have already run sampling — is there a way to get the maximum likelihood estimate from the samples (results) themselves? Isn't it the point where lp__ is maximal?
I want to have both the 68% and 95% confidence levels as well as the best-fit value. Since the data set and the number of parameters are quite large, it would be a waste of resources if I had to run both sampling and optimizing.
Okay, but I really need the MLE along with the sampling. For example, I always want to check whether the MLE lies inside the 68% contour. After reading your comments, it seems there is no easy way of getting the MLE from sampling. I wanted to confirm whether I am correct here.
BTW, in some cases I am finding that sm.optimizing(data=data) does not give me any sensible result, while sampling works reasonably well. So I absolutely need the MLE from the sampling result. I wonder why it is not calculated automatically; this should be one of the most important numbers, at least in physics.
What do you mean by "not any sensible result"? The MLE is not guaranteed to exist, so it is possible that your model simply does not have one. If posterior sampling works fine, then the MLE should be irrelevant.
You could take the posterior draw with the highest lp__ as "the closest approximation" to the MLE, but any posterior draw would be just as representative.
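A minimal sketch of that idea in plain NumPy. This assumes you have already called fit.extract(), which in PyStan returns a dict with an 'lp__' array plus one array per parameter (first axis = draws); the toy dict below just mimics that shape.

```python
import numpy as np

def draw_with_max_lp(samples):
    """Return the parameter values of the posterior draw with the highest lp__.

    `samples` is a dict in the style of fit.extract(): an 'lp__' array plus
    one array per parameter, with the first axis indexing draws.
    """
    best = int(np.argmax(samples["lp__"]))  # index of the highest-lp__ draw
    return {name: vals[best] for name, vals in samples.items() if name != "lp__"}

# Toy stand-in for fit.extract(): 1000 draws of a 3-vector parameter 'a'
rng = np.random.default_rng(0)
toy = {"a": rng.normal(size=(1000, 3)), "lp__": rng.normal(size=1000)}
print(draw_with_max_lp(toy)["a"])
```

Keep in mind this only finds the best draw the sampler happened to produce, not the actual maximum of the posterior.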
Max lp__ among sampling draws can be very far from the max lp__ found by optimization. For example, take a 100-dimensional IID standard normal posterior. The MLE is the all-zeros vector. In a typical posterior draw half the components are positive and half negative, but only a few are near zero. Draws where all components are near zero are extremely rare.
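This effect is easy to check numerically with a toy simulation (plain NumPy, not Stan output): even the best of thousands of draws from a 100-dimensional standard normal stays far from the mode at zero.

```python
import numpy as np

rng = np.random.default_rng(42)
d, n_draws = 100, 4000                      # dimension, number of posterior draws
draws = rng.standard_normal((n_draws, d))   # IID standard normal "posterior"

# Log density up to a constant: -0.5 * ||theta||^2; the mode is the zero vector.
lp = -0.5 * np.sum(draws**2, axis=1)
best = draws[np.argmax(lp)]

# A typical draw has norm around sqrt(d) = 10; even the draw with the
# highest lp stays nowhere near the mode at 0.
print(np.linalg.norm(best))
```

This is the "typical set" phenomenon: in high dimensions essentially no posterior mass sits near the mode, so the best sampled draw is a poor stand-in for the optimum.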
I think there is some mix-up between concepts from frequentist and Bayesian statistics. In the frequentist world you would find the MLE and then calculate confidence intervals around it. But in the Bayesian world you do not try to find an optimal value of the likelihood; you try to sample from the joint posterior distribution (and those samples land in the typical set).
So after you get samples (which approximate the posterior distribution) you can calculate the mean/median/quantiles and their accuracy. The credible interval you are interested in can be calculated with a suitable method (e.g., quantiles).
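For example, central 68% and 95% credible intervals fall out of posterior quantiles with NumPy alone. Here `samples` is a fake draws-by-parameters array standing in for what you would pull out of fit.extract():

```python
import numpy as np

rng = np.random.default_rng(7)
# Fake posterior draws: 4000 draws of 3 parameters (stand-in for fit.extract())
samples = rng.normal(loc=[0.5, 1.5, -0.2], scale=0.1, size=(4000, 3))

# Central credible intervals from posterior quantiles
ci68 = np.percentile(samples, [16, 84], axis=0)     # 68% interval
ci95 = np.percentile(samples, [2.5, 97.5], axis=0)  # 95% interval
median = np.median(samples, axis=0)

for j in range(samples.shape[1]):
    print(f"a[{j+1}]: median={median[j]:.3f}, "
          f"68% CI=({ci68[0, j]:.3f}, {ci68[1, j]:.3f}), "
          f"95% CI=({ci95[0, j]:.3f}, {ci95[1, j]:.3f})")
```

Checking whether some point estimate lies inside the 68% interval is then a simple elementwise comparison against `ci68`.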
I am seeing that when there are many parameters, pystan.StanModel.optimizing does not travel through the parameter space well and hence does not give the correct MLE. On the other hand, sampling captures information from the whole parameter space, so the latter should be trusted more in this scenario.
I could not understand the 2nd part of your answer.
"but any posterior draw would be just as representative. Max lp__ among sampling draws can be very far from the max lp__ found by optimization. For example, take a 100-dimensional IID standard normal posterior. The MLE is the all-zeros vector. In a typical posterior draw half the components are positive and half negative, but only a few are near zero. Draws where all components are near zero are extremely rare."
Sorry for a beginner question: would it not make sense to do variational inference then, since it can be seen as an extension of the expectation-maximization algorithm from MAP estimation (and thus MLE estimation)? I am asking because I have never worked with variational inference (I have had no chance), but it is somewhere on my agenda.
If it does not travel the parameter space well, that is a problem with the optimizer rather than with the MLE itself. For example, if you changed the initial conditions, would that improve your outcome?
Most likely. What I found is that optimizing does not visit "all" points in a large parameter space, and sometimes reports the MLE at a point on the boundary. This should not be the case, but it is not surprising since I have no idea how optimizing works internally. It certainly does not do sampling. Things are really unclear here.
From a beginner's point of view, I think there is a huge confusion/misunderstanding between statistics people and many physicists. In physics, the point in parameter space where the probability (exp(-chi^2/2) for Gaussian noise) is maximal carries huge importance. We generally quote 68% confidence limits around that point. I am very surprised to see that Stan does not calculate this automatically from the sampling.
It is probably a good idea to read some textbooks (for example, you may check Richard McElreath's Statistical Rethinking; he also comments on that 98%, 57% CI-reporting culture in one of his chapters). The MLE comes from an optimization routine that searches for the point maximizing the likelihood; it is definitely not about sampling.