Ineffective sampling in linear regression when errors approach zero

Hi Martin, thanks so much for your reply, and for your advice about summing on the variance scale and adjusting the error distributions. You are probably right that the proportional errors should follow a lognormal distribution or something similar.
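
Just to check my understanding of the suggestion, here is a minimal sketch of what I think a lognormal (proportional) error model would look like for this problem. The names mu and sigma_rel are my own placeholders, and it assumes the predictions and observations are strictly positive:

model {
    vector[N] mu = x*beta + hfr*hfr_vec + induc*induc_vec;
    // location is log(mu), so mu is the median of y;
    // sigma_rel acts roughly as a relative (proportional) error scale
    y ~ lognormal(log(mu), sigma_rel);
}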

I made a stripped-down version of the model with only one normal error contribution, and no variable scales on the coefficients:

data {
    int<lower=0> N;                  // number of observations (first and last N/2 treated differently below)
    int<lower=0> K;                  // number of regression coefficients
    matrix[N, K] x;                  // design matrix
    vector[N] y;                     // response
    vector[N/2] freq;                // frequencies associated with the last N/2 observations
    int<lower=0> N_tilde;            // number of prediction points
    matrix[N_tilde, K] x_tilde;      // design matrix at the prediction points
    vector[N_tilde/2] freq_tilde;    // frequencies for the last N_tilde/2 prediction points
}
transformed data {
    // hfr enters only the first half of the observations;
    // induc enters the second half, scaled by the angular frequency 2*pi*freq
    vector[N] hfr_vec = append_row(rep_vector(1, N/2), rep_vector(0, N/2));
    vector[N] induc_vec = append_row(rep_vector(0, N/2), 2*pi()*freq);
    vector[N_tilde] hfr_vec_tilde = append_row(rep_vector(1, N_tilde/2), rep_vector(0, N_tilde/2));
    vector[N_tilde] induc_vec_tilde = append_row(rep_vector(0, N_tilde/2), 2*pi()*freq_tilde);
}
parameters {
    real<lower=0> hfr;               // offset on the first half of the observations
    real<lower=0> induc;             // coefficient on 2*pi*freq for the second half
    vector<lower=0>[K] beta;         // regression coefficients
    real<lower=0> sigma;             // single additive error scale
}
model {
    // priors
    hfr ~ normal(0, 1000);
    induc ~ std_normal();
    beta ~ normal(0, 5);
    sigma ~ std_normal();
    // likelihood: one normal error contribution
    y ~ normal(x*beta + hfr*hfr_vec + induc*induc_vec, sigma);
}
generated quantities {
    // noise-free predictions at the new points
    vector[N_tilde] y_tilde
        = x_tilde*beta + hfr*hfr_vec_tilde + induc*induc_vec_tilde;
}

I still run into the same problem when I run this model on noiseless simulated data. Below is a plot of the posterior mean coefficient values compared to the true coefficient values:

[Plot: posterior mean coefficient values vs. true coefficient values]

While the posterior mean coefficient values are generally distributed around the true values, many of them are far off. The trace plot shows that all four chains (correctly) drove sigma to essentially zero.

If I look at a pairplot of any two coefficients that are close to each other, I can see that each chain gets stuck in its own small local neighborhood and the chains do not mix:

[Pairplot of two nearby coefficients, with each chain confined to its own region]

However, if I simulate data with some artificial noise, the sampling is much more effective: the chains mix, and the posterior mean coefficient values land much closer to the true values. I presume this is because, when sigma is very small, the energy barriers between local modes become too large for the sampler to cross. I also wonder whether part of the problem is that neighboring columns of x tend to be highly correlated.
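
If that correlation is indeed part of the problem, one thing I'm considering is the QR reparameterization from the Stan user's guide, which samples coefficients on an orthogonalized design. Here is a rough sketch of how it might plug into the model above (only the changed blocks are shown, and my lower=0 constraint on beta would not carry over directly to theta, so this is just an illustration):

transformed data {
    // (hfr_vec, induc_vec, and their _tilde versions as in the model above)
    // thin QR decomposition of x, scaled as recommended in the Stan user's guide
    matrix[N, K] Q_ast = qr_thin_Q(x) * sqrt(N - 1);
    matrix[K, K] R_ast = qr_thin_R(x) / sqrt(N - 1);
    matrix[K, K] R_ast_inverse = inverse(R_ast);
}
parameters {
    vector[K] theta;                 // coefficients on the orthogonal basis Q_ast
    // (hfr, induc, sigma as in the model above)
}
model {
    // same priors on hfr, induc, sigma; a prior would now go on theta rather than beta
    y ~ normal(Q_ast*theta + hfr*hfr_vec + induc*induc_vec, sigma);
}
generated quantities {
    vector[K] beta = R_ast_inverse * theta;   // coefficients on the original x scale
}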