Posterior distributions in hierarchical model problems

Hi community,

I have created a hierarchical / multi-level model using cmdstanpy and Stan, but I am seeing some behaviour in my posterior chains that doesn’t seem quite right. In the attached plot, the learned E_\sigma and V_\sigma parameters stay close to my priors rather than moving towards their expected values. Even when I increase the amount of input data, these posteriors still do not move closer to their expected values. Furthermore, I expect structure 6 (shown in black in the plots) to borrow some ‘statistical strength’ from the population, so I think its \mu_s should have some posterior density at the global expected value (E_\mu).

I sometimes get errors, depending on the random seed or the number of generated input data points. These tend to be low BFMI warnings, or “random variable is 0 or inf but must be positive finite” (for the Esig model parameter). I have given Esig a gamma prior, so it seems very unlikely that it should be zero.
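For context, this is how I surface those warnings after sampling, using cmdstanpy’s built-in diagnostics (a minimal sketch; fit is the CmdStanMCMC object returned by sample()):

# fit is assumed to be the result of CmdStanModel.sample()
print(fit.diagnose())  # reports the low E-BFMI warnings, divergences, R-hat issues
print(fit.summary())   # per-parameter means, ESS and R-hat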

Any tips on ideas to try would be greatly appreciated. I have tried a lot of tweaks to the priors but, to my great frustration, don’t seem to have improved the situation much. I appreciate the problem is somewhat complicated, so let me know if you have any questions.

In essence, I am using the natural frequency values (wn) in the input data, together with a polynomial model, to infer distributions over the soil spring stiffness (s) at both the local and global levels.
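The deterministic part of that mapping is just a fixed fifth-degree polynomial in s; in NumPy terms (with the coefficients c ordered highest power first, as in the Stan code below):

import numpy as np

def expected_wn(s, c):
    # c = [c1, ..., c6], so this reproduces c1*s^5 + c2*s^4 + ... + c5*s + c6
    return np.polyval(c, s)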

It is also worth noting that there is some posterior correlation between the \gamma and E_\sigma parameters: increasing either one has a similar effect on the natural frequency, which may be causing identifiability issues. I tried defining tighter priors to resolve this, but that hasn’t worked well so far.
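That correlation is easy to see in a pairs plot (a sketch using ArviZ; fit is the cmdstanpy result object):

import arviz as az

idata = az.from_cmdstanpy(posterior=fit)
# a strong ridge in this joint posterior points at the
# gamma / Esig identifiability issue described above
az.plot_pair(idata, var_names=["gamma", "Esig"], kind="kde")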

[plot: input data]

data { 
    int<lower=0> N; // number of observations
    vector[N] wn; // observed natural frequencies
    array[N] int I; // group index for each observation
    int<lower=0> K; // number of groups (domains)
    vector[6] c; // polynomial coefficients, highest power first

    // priors
    vector[2] gamma0;
    vector[2] Emu0; 
    vector<lower=0>[2] Esig0; 
    vector[2] Vmu0; 
    vector<lower=0>[2] Vsig0; 
} 

parameters {
    real<lower=0> gamma; // observation noise scale
    vector[N] dsk; // standard-normal offsets for s (non-centered)
    real<lower=0> Emu; // population mean of the group means Es
    real<lower=0> Esig; // population mean of the group scales Vs
    real<lower=0> Vmu; // population sd of the group means Es
    real<lower=0> Vsig; // population sd of the group scales Vs
    vector[K] Es; // group-level means of s
    vector<lower=0>[K] Vs; // group-level scales of s
} 

transformed parameters {
    vector[N] s; // stiffness for each observation
    for (n in 1:N) {
        s[n] = Vs[I[n]]*dsk[n] + Es[I[n]];
    }
}
model {

    Emu ~ normal(Emu0[1],Emu0[2]);
    Vmu ~ cauchy(Vmu0[1],Vmu0[2]);
    Esig ~ gamma(Esig0[1],Esig0[2]); // this param triggers the “0 or inf but must be positive finite” errors
    Vsig ~ gamma(Vsig0[1],Vsig0[2]); // one or two similar errors with this param too
    gamma ~ cauchy(gamma0[1],gamma0[2]);

    Es ~ normal(Emu, Vmu); // group means
    Vs ~ normal(Esig, Vsig); // group scales

    dsk ~ std_normal(); // non-centered parameterization of s

    vector[N] mu;
    for (n in 1:N) {
        mu[n] = c[1]*pow(s[n],5) + c[2]*pow(s[n],4) + c[3]*pow(s[n],3)
                + c[4]*pow(s[n],2) + c[5]*s[n] + c[6];
    }
        
    wn ~ normal(mu, gamma); // likelihood  
}

Is this an error? You have both “mu” values (i.e. Emu, Vmu) parameterizing Es and both “sig” values (i.e. Esig, Vsig) parameterizing Vs, but the “E” values (Emu, Esig) act as location parameters and the “V” values (Vmu, Vsig) as scale parameters in both normals. If this is a misspecification, which I suspect it is, you would run into problems with your posterior, particularly vis-à-vis the use of the gamma-distributed Esig as the central-tendency parameter of the Vs normal.

This is a bit hard to follow without the data. In this case the observed measurements are shown on the y-axis (the natural frequencies, denoted by wn in the code) and the goal is to estimate the soil spring stiffness shown on the x-axis (denoted by mu in the code)? And the mus are not modelled directly but through a known fifth-degree polynomial. (Where do the polynomial coefficients c come from?)

If I read the posterior plot for \mu_s correctly, the group mean \mu_6 for structure 6 is pulled towards the population mean E_\mu, which is estimated to be just above 2. That seems reasonable. I’d say the model has difficulty estimating the group variance \sigma_6 of structure 6, which is why the two black dots end up in the same location on the x-axis. Compared with the other data points, that seems unexpected.

Hi Corey, thanks for the reply. It’s actually correct in my example, although I appreciate the notation isn’t as clear as it could be. I’ve attached some updated plots to this comment to make it clearer, including an image of the graphical model.


[image: graphical model]

I’ve also slightly updated the way I calculate the “expected” values of E_\sigma and V_\sigma. Essentially, for the true s values I compute the variance of each group (k = 1:K), and then take the weighted mean and weighted standard deviation of these. I weight them so the group with only two observations (group 6) does not have so much influence. I hope that makes sense.
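In NumPy terms the calculation is roughly this (a sketch; group_var and n_obs are hypothetical names for the per-group variances and group sizes):

import numpy as np

w = n_obs / n_obs.sum()  # group 6, with two observations, gets little weight
E_sig_expected = np.average(group_var, weights=w)
V_sig_expected = np.sqrt(np.average((group_var - E_sig_expected)**2, weights=w))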

It’s also worth noting this dataset is synthetic - so I can change the number of groups/observations to help with debugging.

Hi desislava, your assumptions are correct. I supply the coefficients myself from a separate model, simplified to a polynomial because it is otherwise too expensive to run with HMC sampling.

The black dots ending up in the same location is by chance. The data is synthetic: I generate it from some “known” distributions, run the draws through the polynomial, and add noise, which gives the wn values. I am essentially trying to relearn these “known” distributions on s, given only the observations wn and some priors placed over the distributions for s.
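The generator looks roughly like this (a sketch; the distribution values are placeholders rather than my actual “known” values, and idx, c, noise_sd and K are assumed defined):

import numpy as np

rng = np.random.default_rng(0)
Es_true = rng.normal(2.0, 0.5, size=K)           # placeholder group means
Vs_true = np.abs(rng.normal(0.2, 0.05, size=K))  # placeholder group scales
s_true = rng.normal(Es_true[idx], Vs_true[idx])  # idx: 0-based group index per observation
wn = np.polyval(c, s_true) + rng.normal(0.0, noise_sd, size=len(idx))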

I think the main problems at the moment are with E_\sigma and V_\sigma.

Any further thoughts much appreciated!

I’ve also updated the Stan model to match, and for clarity:

data { 
    int<lower=0> N; // number of observations
    vector[N] wn; // observed natural frequencies
    array[N] int I; // group index for each observation
    int<lower=0> K; // number of groups (domains)
    vector[6] c; // polynomial coefficients, highest power first

    // priors
    vector[2] gamma0;
    vector[2] Emu0; 
    vector<lower=0>[2] Esig0; 
    vector[2] Vmu0; 
    vector<lower=0>[2] Vsig0; 
} 

parameters {
    real<lower=0> gamma; // observation noise scale
    vector[N] dsk; // standard-normal offsets for s (non-centered)
    real<lower=0> Emu; // population mean of the group means Es
    real<lower=0> Esig; // population mean of the group scales Vs
    real<lower=0> Vmu; // population sd of the group means Es
    real<lower=0> Vsig; // population sd of the group scales Vs
    vector[K] Es; // group-level means of s
    vector<lower=0>[K] Vs; // group-level scales of s
} 

transformed parameters {
    vector[N] s; // stiffness for each observation
    for (n in 1:N) {
        s[n] = Vs[I[n]]*dsk[n] + Es[I[n]];
    }
}
model {

    Emu ~ normal(Emu0[1],Emu0[2]);
    Vmu ~ cauchy(Vmu0[1],Vmu0[2]);
    Esig ~ gamma(Esig0[1],Esig0[2]); // this param triggers the “0 or inf but must be positive finite” errors
    Vsig ~ gamma(Vsig0[1],Vsig0[2]); // one or two similar errors with this param too
    gamma ~ cauchy(gamma0[1],gamma0[2]);

    Es ~ normal(Emu, Vmu); // group means
    Vs ~ normal(Esig, Vsig); // group scales

    dsk ~ std_normal(); // non-centered parameterization of s

    vector[N] Ewn;
    for (n in 1:N) {
        Ewn[n] = c[1]*pow(s[n],5) + c[2]*pow(s[n],4) + c[3]*pow(s[n],3)
                 + c[4]*pow(s[n],2) + c[5]*s[n] + c[6];
    }
        
    wn ~ normal(Ewn, gamma); // likelihood  
}
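For completeness, the cmdstanpy driver is essentially this (a minimal sketch; the file name is illustrative, and note that Stan expects 1-based indices in I):

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="hierarchical_wn.stan")  # illustrative file name
stan_data = {
    "N": len(wn), "wn": wn, "I": idx + 1,  # 0-based numpy index -> 1-based Stan
    "K": K, "c": c,
    "gamma0": gamma0, "Emu0": Emu0, "Esig0": Esig0,
    "Vmu0": Vmu0, "Vsig0": Vsig0,
}
fit = model.sample(data=stan_data, seed=1234, chains=4)
print(fit.diagnose())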