Prior predictive checks by removing the likelihood gives single values

I’m trying to perform a prior predictive check by simply taking my Stan model and commenting out the likelihood but somehow I only get one value for each parameter that remains the same throughout the iterations while I would like to explore the full specified prior distribution.

Here is a sample from the Stan code that allows to reproduce the errors :

data {
  int<lower=0> N;               //N observations total
  int<lower=1> P;               //N population-level effects  (intercept + beta)
  int<lower=0> J;               //N subj
  int<lower=1> n_u;             //N subj ranefs (ex. x * x | part = 4) if random effect on all effects, same argument as P can be passed
  int<lower=1,upper=J> subj[N]; //subj ID vector
  matrix[N,P] X;                //fixef design matrix
  matrix[N, n_u] Z_u;           //subj ranef design matrix, X if raneff = fixeff
  vector[N] y;                  // DV
  vector[2] p_intercept;        // prior on intercept
  vector[2] p_sd;               // prior on residual std
  vector[P-1] p_fmu;            // priors on fixef mu
  vector[P-1] p_fsigma;         // priors on fixef sigma
  vector[2] p_r;                // priors on ranef

parameters {
  real alpha;                   // intercept
  real<lower=0> sigma;          //residual std
  vector[P-1] beta;             //population-level effects coefs (w\o intercept)
  cholesky_factor_corr[n_u] L_u;//cholesky factor of subj ranef corr matrix
  vector<lower=0>[n_u] sigma_u; //subj ranef std INCLUDING INTERCEPT, hence beta[1] ~ sigma[2]
  vector[n_u] z_u[J];           //spherical subj ranef

model {
  //vector[N] mu;
  L_u ~ lkj_corr_cholesky(2.0);
  alpha ~ normal(p_intercept[1], p_intercept[2]);
  beta ~ normal(p_fmu, p_fsigma);
  sigma ~ normal(p_sd[1], p_sd[2]);
  for (j in 1:J)
    z_u[j] ~ normal(p_r[1],p_r[2]);

  //for (n in 1:N)
  //  mu[n] = alpha + X_beta[n] * beta + Z_u[n] * u[subj[n]];
  //y ~ normal(mu, sigma);

I then sample from it by using the pystan module and passing the real data :

fixeff_form = "1+Factor1+Factor2+Factor1:Factor2"#Fixed effects formula
raneff_form = fixeff_form #Random effects formula
fixeff = np.asarray(patsy.dmatrix(fixeff_form, LMEdata)) #FE design matrix
raneff = np.asarray(patsy.dmatrix(raneff_form, LMEdata)) #RE design matrix
prior_intercept = np.asarray([6.15,.3])#prior for intercept: mu and sigma
priors_mu = np.repeat(0, 8) #Priors on mu for FE
priors_sigma =  np.repeat(.4, 8) # priors on sigma for FE
priors_raneff = [0, .4] #Priors on RE
prior_sd = [0, .4] #priors on residual sigma

LME_data = dict(
    N = len(LMEdata),
    P = fixeff.shape[-1], #number of pop level effects
    J = len(LMEdata.participant.unique()),
    n_u = raneff.shape[-1],
    subj = LMEdata.participant,
    X = fixeff,
    Z_u = raneff,
    y = LMEdata.obs.get_values(),
    p_intercept = prior_intercept, p_sd = prior_sd, p_fmu = priors_mu, p_fsigma = priors_sigma, p_r = priors_raneff

fit = priors_LME.sampling(data=LME_data, iter=1000, chains=1,  warmup =0)

Behavior is the same whether I use the Fixed_param algorithm or not, when I include some warm-up I get some estimation but the distribution of these estimations is far more narrower than the given priors.

P.-S I know that this method is sub-optimal but using the generated quantities makes it complicated as there are also transformed parameters and other generated quantities in the full Stan model

1 Like

you definitely need to include warmup - to Stan your “prior-only” model is just another distribution to sample from and it needs warmup. Are all the diagnostitcs (n_eff, Rhat, divergences, treedepth) good for the “prior only” fit?

people often do this with a an int<lower=0,upper=1> prior_only in data and then having if(prior_only) around the likelihood part (you can also have optional params depend on prior_only, e.g. mu, see my blog post on the topic ). This way you don’t need to recompile for prior checks and just pass prior_only = 1 to the model.

Hope that helps!

1 Like

Hi Martin,

Thanks for your answer, I just tried to include warmups (up to 100000 with 5000 additional iterations) but the results are still off, see for example the trace for one of the parameters (which does not behave like this when I include the likelihood).

RT_fit = priors_LME.sampling(data=RT_LME_data, iter=105000, chains=1, 
                      warmup =100000)

Obviously divergences (in black in the plot), n_eff and Rhat are off the charts

Edit : However the distribution/trace of the MCMC chain does explore my prior (normal, mu=6.15, sd=0.4) but in a really weird way…

This may or may not be related to the issue at hand.

  1. In the code in the original post, did you delete the transformed parameters block when posting it here in the forum? It feels like some bits and pieces are missing from the code.
  2. sigma_u has no explicit prior. This may cause trouble with sampling and affect the other parameters as well.

Indeed, I initially tried to ease the reading of the code, anyway, adding a prior on sigma_u did fix the behavior !

Thanks a lot :)

1 Like

Just wondering if the link above is broken and should be this instead?

1 Like