How to get the best-fit values of the parameters from a pystan.StanModel.sampling result?

I am very new to Stan/PyStan. I fit a model in PyStan using an array of parameters. I can see the posterior mean of the parameters (and I could calculate the median as well), but I want to know the best-fit values of the parameters, in other words the point where the probability is maximal (equivalently, where -log_prob is minimal). I am pretty sure there must be some easy function to obtain this best fit.
My MWE is given below:

import pystan
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

A0 = 0.5; A1 = 1.5; A2 = -0.2; A3 = -0.008; A4 = 0.00025
def func(xx):
    return A0 + A1*xx + A2*xx*xx  # + A3*(xx**3) + A4*(xx**4)

x = np.sort(10*np.random.rand(100))      # 100 random x values in [0, 10)
fx = func(x)
yerror = np.random.rand(len(x))          # per-point noise levels
y = np.random.normal(fx, scale=yerror)   # noisy observations

np.random.seed(101)

model = """
data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
    vector[N] yerror;
    int<lower=0> NP;
}
parameters {
    real a[NP];
}
model {
    for(i in 1:N){
        target += normal_lpdf(y[i] | a[3] + a[1] * x[i] + a[2] * x[i] * x[i], yerror[i]);
    }
}
"""


NP = 3
data = {'N': len(x), 'x': x, 'y': y, 'yerror': yerror, 'NP': NP}
sm = pystan.StanModel(model_code=model)
fit = sm.sampling(data=data, iter=2000, chains=4, warmup=400, thin=3, seed=101)
print(fit)

I want to get the best-fit value of a[3].

Thanks in advance.

(If you want your code to be reproducible, you need to call np.random.seed() before calling any other np.random functions.)

For maximum likelihood estimation, StanModel objects have an optimizing method. Instead of sampling, call sm.optimizing(data=data) and you get back a dictionary of point estimates.
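
With the model above, a minimal sketch (assuming the PyStan 2.x API; the default optimizer is L-BFGS):

point = sm.optimizing(data=data)   # runs an optimizer on the joint log density, returns a dict of point estimates
print(point['a'])                  # point estimates for a[1], a[2], a[3]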


Is there a reason you would want to use optimized values (frequentist) rather than full Bayesian inference? What you get is samples from the posterior distribution, and you can see the posterior mean/median with print(fit) or get them numerically with fit.summary().
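
For instance, to pull those numbers out programmatically (a sketch assuming the PyStan 2.x summary format, and reusing the pandas import from the MWE):

summary = fit.summary()                      # dict with 'summary', 'summary_rownames', 'summary_colnames'
df = pd.DataFrame(summary['summary'],
                  index=summary['summary_rownames'],
                  columns=summary['summary_colnames'])
print(df.loc['a[3]', ['mean', '50%', '2.5%', '97.5%']])   # posterior mean, median and 95% interval for a[3]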

See here for an example of a Bayesian workflow: https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

Yes, but priors should be flat for the MLE, shouldn’t they?

For the MLE, yes, but not for a penalized MLE (where the priors act as the penalty).
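
For example, adding a prior to the model block above (a hypothetical weakly informative prior, purely for illustration) turns what sm.optimizing returns into a penalized MLE, i.e. a MAP estimate, rather than a plain MLE:

model {
    a ~ normal(0, 10);   // hypothetical prior; acts as the penalty term for optimizing
    for(i in 1:N){
        target += normal_lpdf(y[i] | a[3] + a[1] * x[i] + a[2] * x[i] * x[i], yerror[i]);
    }
}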

Okay, so everyone is suggesting to use optimizing() rather than sampling. But suppose I have already run sampling; is there a way to get the maximum likelihood estimate from the samples (results) themselves? Isn’t it the point where lp__ is maximal (i.e. where -log_prob is minimal)?

I want both the 68% and 95% confidence levels as well as the best-fit value. Since the data set and the number of parameters are quite large, it would be a waste of resources to have to run both sampling and optimizing.

Thank you.

In my opinion you should use .sampling() and not try to find the MLE.

Okay, but I really need the MLE along with the sampling. For example, I always want to see whether the MLE is included in the 68% contour. After reading your comments, it seems there is no easy way of getting the MLE from sampling; I wanted to confirm whether I am correct about that.
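
A minimal sketch of that check, assuming the PyStan 2.x API (note these are per-parameter marginal 68% intervals, not a joint contour):

point = sm.optimizing(data=data)                  # point estimate, e.g. point['a']
draws = fit.extract(permuted=True)['a']           # posterior draws of a, shape (n_draws, NP)
lo, hi = np.percentile(draws, [16, 84], axis=0)   # marginal 68% credible limits per parameter
print((point['a'] >= lo) & (point['a'] <= hi))    # True where the point estimate falls inside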

BTW, in some cases I am finding that pystan.StanModel.optimizing(data=data) does not give me any sensible result, whereas the sampling works reasonably well. So I absolutely need the MLE from the sampling result. I wonder why it is not calculated automatically; it should be one of the most important numbers, at least in physics.

What do you mean by “not any sensible result”? The MLE is not guaranteed to exist, so it’s possible that your model simply doesn’t have one. If posterior sampling works fine, then the MLE should be irrelevant.

You could take the posterior draw with the highest lp__ as “the closest approximation” of the MLE, but any posterior draw would be just as representative.
The maximum lp__ among the sampling draws can be very far from the maximum lp__ found by optimization. For example, take a 100-dimensional IID standard normal posterior. The MLE is the all-zeros vector, yet in a typical posterior draw half the components are positive and half negative, and only a few are near zero. Draws where all components are near zero are extremely rare.
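
As a concrete sketch of the first option (assuming the PyStan 2.x extract API):

samples = fit.extract(permuted=True)   # dict of arrays, one entry per parameter plus 'lp__'
best = np.argmax(samples['lp__'])      # index of the draw with the highest log posterior
print(samples['a'][best])              # parameter values at that draw (at best a rough stand-in for the mode)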

I think there is some mix-up between concepts from frequentist statistics and Bayesian statistics. In the frequentist world you would be interested in the MLE and then calculate confidence intervals around it. In the Bayesian world you don’t try to find the optimal value of the likelihood; you sample from the joint posterior distribution (and those samples concentrate in the typical set).

So after you get the samples (which are an approximation of the posterior distribution) you can calculate the mean/median/quantiles and their accuracy. The credible interval you are interested in can be calculated with a suitable method (e.g. quantiles).
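
For example, central 68% and 95% intervals for a[3] can be read directly off the draws (a sketch assuming the PyStan 2.x extract API):

a3 = fit.extract(permuted=True)['a'][:, 2]   # draws of a[3] (third column; Python indexing is 0-based)
print(np.percentile(a3, [16, 84]))           # 68% central credible interval
print(np.percentile(a3, [2.5, 97.5]))        # 95% central credible interval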

See https://betanalpha.github.io/assets/case_studies/probability_theory.html


I am seeing that when there are many parameters, pystan.StanModel.optimizing does not explore the parameter space well and hence does not give the correct MLE. On the other hand, sampling captures information from the whole parameter space. Therefore, the latter should be trusted more in this scenario.

I could not understand the second part of your answer:

…but any posterior draw would be just as representative.
The maximum lp__ among the sampling draws can be very far from the maximum lp__ found by optimization. For example, take a 100-dimensional IID standard normal posterior. The MLE is the all-zeros vector, yet in a typical posterior draw half the components are positive and half negative, and only a few are near zero. Draws where all components are near zero are extremely rare.

Sorry for the beginner question: would it not make sense to do variational inference then, since it can be seen as an extension of the expectation-maximization algorithm used for MAP estimation (and hence MLE estimation)? I am asking because I have never worked with variational inference (I have had no chance to), but it is on my agenda.

Thank you for clarifying.

If the optimizer does not travel through the parameter space, that is a problem with the optimizer rather than with the MLE. For example, if you changed the initial conditions, would that improve your outcome?
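
For example, one could rerun the optimizer from several random initializations and keep the run with the best objective value (a sketch assuming the PyStan 2.x optimizing signature; with as_vector=False the result should contain 'par' and 'value' entries):

best = None
for s in range(1, 11):                                   # ten different random initializations
    opt = sm.optimizing(data=data, seed=s, init='random', as_vector=False)
    if best is None or opt['value'] > best['value']:     # 'value' is the log density at the optimum
        best = opt
print(best['par']['a'], best['value'])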

No and yes. It depends.

I would say (and this is my opinion):

- Always use dynamic HMC (NUTS); if it doesn’t work, find out why.
- Large data regime: use variational inference if NUTS with “optimal” code doesn’t work (too slow or some other reason).
- Very large data: use optimization and other diagnostic tools.

And it all depends on what you are trying to do and how complex your model is.

Edit: also, VB and optimization don’t look for the same thing.
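
If you want to try ADVI on the toy model above, a minimal sketch with the PyStan 2.x vb method (the structure of the returned dictionary varies between PyStan versions, so inspect its keys before relying on them):

vb_fit = sm.vb(data=data)   # ADVI approximation to the posterior
print(vb_fit.keys())        # inspect what this PyStan version actually returns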


Most likely. What I found is that optimizing does not visit ‘all’ points in a large parameter space, and sometimes reports the MLE at a point on the boundary. This should not be the case, but it is not surprising since I have no idea how optimizing works. It certainly does not do sampling. Things are really unclear here.

From a beginner’s point of view, I think there is a lot of confusion/misunderstanding between statisticians and many physicists. In physics, the point in parameter space where the probability (exp(-chi^2/2) for Gaussian noise) is maximal carries huge importance; we generally quote 68% confidence limits around that point. I am very surprised that Stan does not calculate this automatically from the sampling.

Probably it is a good idea to read some textbooks (for example, you may check Richard McElreath’s Statistical Rethinking; he also comments on the culture of reporting arbitrary CI percentages in one of his chapters). The MLE is found by an optimizer (such as R’s optim, or L-BFGS in Stan’s case), which searches for the maximum of the likelihood; it is definitely not about sampling.

Thanks a lot for the clarification; that was also somewhat my feeling about VB.
