Problems with posteriors of parameters in Bayesian hierarchical model (cmdstanpy)

Simon_Smith · August 7, 2023, 12:16pm

Hi there,

I’m having a problem getting the modes of my posterior probability distributions lining up to what I’d expect them to be, given the data I am learning from. I am using the package cmdstanpy.

I have a generated dataset shown in the x/y plot below:

Here, y is generated according to y = 2x + 1, with some additional Gaussian noise.
x is generated using normal distributions with ‘known’ means and variances. The screenshot of python code below describes how this is generated:

For clarity, wn_calc is equivalent to y, and stiff_calc is equivalent to x.

With my hierarchical model (written in Stan), I am trying to relearn the ‘known’ parameters of the distributions from the data generation process, given the observations, and prior distributions over these ‘known’ parameters. The code for the Stan model is as below.

data { 
    int<lower=0> N; // number of data
    vector[N] wn; // variates
    array[N] int I; // index
    int<lower=0> K; // number of domains
    vector[2] c; // coefficients

    vector[2] sigma0; // population noise
    vector[2] mu_s0; // population mean
    vector<lower=0>[2] sig_s0; // population variance
} 

parameters {
    real<lower=0> sigma; //noise
    vector[K] s_k;
    real mu_s;
    real<lower=0> sig_s;
} 

model {
    // priors/hyperpriors
    mu_s ~ gamma(mu_s0[1],mu_s0[2]);
    sig_s ~ gamma(sig_s0[1],sig_s0[2]);
    sigma ~ gamma(sigma0[1],sigma0[2]);

    for (k in 1:K) {
        s_k[k] ~ normal(mu_s, sig_s);
    }

    vector[N] mu; 
    for (n in 1:N) {
        mu[n] = c[1]*s_k[I[n]] + c[2];
    }
    
    wn ~ normal(mu, sigma); // likelihood
}

A plot of the learned parameters is shown below, to which I have added the expected values calculated directly from the generated data with vertical lines.

There are a number of problems with this:

the variance of the population mean seems too small given the data
the mode of the population standard deviation is much too high compared to the expected value
the noise is also too high

I feel like there must be a problem with my understanding/how I have formulated the Stan model, although I seem to get the same result in PyMC. I have also tried z-transforming the data but get the same result, as well as generating the data via Stan - again the same result.

Any ideas/advice would be greatly appreciated!

martinmodrak · August 22, 2023, 3:23pm

Hi,
there seems to be a mismatch between the Python code and the Stan code - in the Python code, you draw different noise for each of the K components while in the Stan code you have only one sigma parameter. This could explain at least some of the differences.

Hope that helps at least a bit
Martin

Topic		Replies	Views
Posterior distributions in hierarchical model problems Modeling fitting-issues , specification , cmdstanpy , hierarchical-model	6	443	February 5, 2024
Having trouble recovering known parameters from simulated data Modeling cmdstanpy	2	648	June 4, 2021
CmdStanPy Variational - How to get density parameters? Modeling techniques , fitting-issues , variational-bayes , covariance , cmdstanpy	10	64	November 19, 2024
Bayesian inference in Stan Modeling	40	5805	July 18, 2017
Bayesian hierarchical modeling of a two-stage process model (Planning stages) Modeling techniques	2	421	September 26, 2023

Problems with posteriors of parameters in Bayesian hierarchical model (cmdstanpy)

Related topics