Posterior contraction smaller in data than posterior predictive

I ran my model on 100 simulated data vectors from the posterior predictive distribution to check for fitting biases. I didn’t see any biases, but noticed that the contraction (as defined here) of a specific parameter (out of 5) was a lot smaller for the real data than for the simulated (i.e. 3 out of the 100 samples had a smaller contraction). What could this mean? Is this something I should be concerned about?

Hi, sorry it took us long to get to you.

Generally those things are very hard to judge without context - what is the model? How do you simulate data? Where do the real data come from? Can you share the data?

It is however not uncommon that some datasets do not really inform some parameters of a model, so it is possible that this is the case. (a simple example would be if some predictor values are underrepresented in your real data compared to the simulated data).

Best of luck with your model!

No problem! Thanks for replying: model is here:

//Declaring all inputs to the model here 
data {
    int<lower=0> N; // number of SNe
    vector[N] muresiduals; // Data vector: SN mu's less distance modulus predicted with planck cosmology from redshift
    vector[N] mucorrections; // distance modulus with 2m++ correction - distance modulus without
    vector<lower=0>[N] muerr; // Photometric uncertainty in mu 
    vector[N] biasscale;  // biasScale_muCOV used to scale intrinsic scatter
    cov_matrix[N] pecvelcovmu; //Covariance matrix from pec vel in mag units
    cov_matrix[N] nonlinearmu; //Diagonal matrix for the nonlinear velocity dispersion
    matrix[N,N] systematics; //Fixed systematics (I've just passed zeros to this)
}
// Declare the parameters of the model, with appropriate bounds
parameters {
    real<lower=-.3,upper=.3> offset;
    real<lower=.01,upper=.3> intrins;
    
	    real<lower=10,upper=500> veldispersion_additional;

	    real<lower=.01,upper=10> velscaling;

	    real<lower=-1,upper=2> betarescale;

}
model {
    		//Define total covariance matrix
	    matrix[N,N] sigma = diag_matrix(biasscale*square(intrins)+square(muerr))+nonlinearmu*square(veldispersion_additional)+systematics+square(velscaling)*pecvelcovmu;
		//Muresiduals are gaussian distributed centered on offset with corrections, covariance sigma
	    muresiduals ~ multi_normal(betarescale*mucorrections+offset, sigma);
    		//Scale free prior on the covariance parameters: Square root of the determinant of the Fisher matrix of these parameters
	    real sigparams[3] = { intrins,veldispersion_additional,velscaling  };
	    matrix[N,N] invsigdesigns[3] = { mdivide_left_spd(sigma,diag_matrix(biasscale)),mdivide_left_spd(sigma,nonlinearmu),mdivide_left_spd(sigma,pecvelcovmu) };
	    matrix[3,3] fishermatrix;
	    for (i in 1:3){
	        for (j in 1:3){
		    fishermatrix[i,j]= sigparams[i]*sigparams[j] *  sum(invsigdesigns[i].*invsigdesigns[j]);
	        }
	    }
	    target+=(.5*log_determinant(fishermatrix));    
}

To try and provide a bit of context beyond what’s in those comments, the basic idea is that we have a data vector mu; this has already been through some processing, and so what I’m working with are residuals. There’s some additional uncertainty accounted for by intrins, and an effect from velocity (thus veldispersion and velscaling) with a bit more structure that I’ve already constructed a covariance matrix to model, then want to see how well we can constrain the overall scale of this effect. There’s also a set of corrections from an external data source intended to reduce the effect of correlations on the data set, which are then modulated by betarescale.

The parameter that’s showing less contraction than expected would be velscaling, so the one that’s controlling the scale of the covariance.

To simulate the data I just took samples from the posterior, constructed the covariance matrix, and sampled from that multivariate gaussian.

Also, to clarify, the posterior contraction is still very high in the real data, just lower than for the simulated! This scatter plot shows z-scores and variance in the parameters. The blue vertical lines show the variance of the parameter in the real data

semilogposteriorscatterplot (1)

First, I have to say the model is totally beyond my pay grade, I don’t really have any idea what is going on there. But:

That feels weird - usually data should be simulated by drawing from the prior distribution for all parameters and simulating based on those. I understand your model is not the standard kind I deal with, so maybe this doesn’t easily apply. But I wouldn’t be surprised if it is this simulation process that leads to larger contraction as you might be generating data that are “too nice” for the model. Also if there is significant contraction I wouldn’t worry that much whether it is larger or smaller than on simulated data.

A minor note about the model: We usually discourage using hard bound for priors, unless they correspond to some physical hard bound. If you just want to express your prior belief about plausible range of parameter values a soft bound usually results in faster sampling and less issues, e.g. if you believe offset should be between -0.3 and 0.3, you can have no hard constraint but instead put an offset ~ normal(0, 0.15) prior that would quite strongly push the parameter in this range. For velscaling it seems that lower=0 is a hard bound, but you can then use lognormal(log(5), 1.5) or gamma(3, 0.5) prior to constrain the upper tail. (the actual prior parameters are just my quick guess, you are obviously encouraged to find values that make sense for your application)

Does that make sense?