Sampling from prior distributions of a hierarchical model

I am trying to sample from the prior distributions of a hierarchical model. I am able to do this for the hyperparameters but not for the parameters.

data{
  int N;
  int<lower = 1, upper=2> dm[N];
  real y[N];
}
parameters{
  real mu;
  real eta_s2;  
  real<lower=0> tau_s2;
  real<lower=0, upper=1> sigma2[2];
  real<lower=0> ptau;
}
model{
  int m;
  real sig2;
  for(i in 1:N){
    m = dm[i];
    sig2 = sigma2[m];
    target += normal_lpdf(y[i]|mu, sig2);
  }
  mu ~ normal(0,1);
  sigma2 ~ lognormal(eta_s2, tau_s2);
  eta_s2 ~ normal(0, 2);
  tau_s2 ~ cauchy(0,1);
  ptau ~ cauchy(0,1);
}
//generated quantities {
  //real eta_sim = normal_rng(0, 2);  
  //real<lower=0> tau_sim = cauchy_rng(0, 1);
  //real<lower=0, upper=1> sigma_sim = lognormal_rng(eta_sim, tau_sim);
//}

The data and parameter values I used in the model are:

eta_s2 = c(-3.3648077, 0.1199825)
tau_s2 = c(0.1103267, 0.4475340)
sigma1 = 0.03231332
sigma2 = 0.7080471
mu = 0
y = c(-0.0193603047,  0.0195620265,  0.0353037451,  0.0155547999, -0.0458207621, -0.0180215304,
      -0.0167241323,  0.0370329729,  0.0247856454, -0.0143030861,  0.0192524402,  0.0426936615,
       0.0333482304, -0.0181248670,  0.0060638257,  0.0293641760, -0.0333053618, -0.0113788365,
      -0.0196547950,  0.0195201185, -0.3131050855, -0.4545881193,  0.0311973717, -0.2925066598,
      -0.4785626458,  0.3772071422,  0.4736467624, -0.5973105308,  0.6208043347, -0.0656066962,
      -0.4639143741,  0.2282888338, -1.1815579061, -0.2702454827, -0.7529282319, -0.3903298828,
      -0.5810873580,  0.2378588292, -0.1556719877, -0.0006158167)
dm = rep(c(1,2),each = 20)
stan_list = list(N=40, y=y, dm=dm)
Failure_time_model = stan("priors.stan", data = stan_list, iter = 10000, seed = 1, chains = 1, warmup = 3000, cores = 1)

I have attached a small, reproducible example above. The parameter ptau is not included in the likelihood, and hence the “posterior” for ptau is simply the prior. I named the Stan file “priors.stan”. The prior for ptau is shown below.

stan_prior.pdf (4.4 KB)

I also tried sampling from the priors in a generated quantities block (commented out above). I had to comment it out because I was getting errors: Stan was drawing negative values of tau_sim, even though I had set a constraint.

A third option I tried is to provide only the parameters and model blocks, with no log-likelihood. For example,

parameters{
  real eta_s2;  
  real<lower=0> tau_s2;
}
model{
  eta_s2 ~ normal(0, 2);
  tau_s2 ~ cauchy(0,1);
}

The above code returns samples from the prior distributions. However, this trick does not work for priors that themselves have hyperparameters.

The model runs fine but I would like to see what the prior for sigma2 looks like.

After trying to extract prior samples from Stan I decided I could just simulate my own by doing what Stan is doing behind the scenes.

To sample from a truncated distribution, Stan uses the following method: if F(x) is the CDF of the unbounded variable, then the CDF of the variable truncated to the interval (a, b) is \dfrac{F(x) - F(a)}{F(b) - F(a)}.

For example, to sample from a Cauchy(0,1) distribution that is bounded below at zero, one can do the following in R:

x = seq(0, 400, length.out = 1000)
y = pcauchy(x, 0, 1)
# z is the cdf of a Cauchy(0,1) distribution that is bounded below at zero
z = (y - pcauchy(0, 0, 1)) / (1 - pcauchy(0, 0, 1))
z[1000] = 1
# Note I set z[1000] = 1, meaning Pr(X <= 400) = 1. In reality Pr(X <= 400) = 0.998,
# but the Cauchy distribution gives a small density to values around 10^5, so I have
# truncated the possible values for simplicity.

tau = vector("numeric", length = 7000)
pp = runif(7000)

for(i in 1:7000){
  tau[i] = approx(z,x,pp[i])$y
}

That is, z is the CDF of a Cauchy(0,1) distribution bounded below by zero, and the loop above simply draws from it by inverting that CDF. This is the method Stan uses. Plotting a histogram of tau, we see that these draws are the same as the prior samples obtained from Stan.

prior_r.pdf (4.3 KB)
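Equivalently, the truncated CDF can be inverted directly with R's built-in quantile function rather than through the interpolation grid (a small sketch reusing the pp vector from above; tau_direct is just an illustrative name):

# invert the truncated CDF directly: map uniform draws onto the upper half of the Cauchy CDF
tau_direct = qcauchy(pcauchy(0, 0, 1) + pp * (1 - pcauchy(0, 0, 1)), 0, 1)
hist(tau_direct)  # nearly identical to the histogram of tau, without the cutoff at 400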

Samples from the prior of eta_s2 are easy to obtain and are given by

eta_s2 = rnorm(7000,0,2)

Now, I need to obtain samples from the prior distribution of sigma2. In other words, I need to sample from a lognormal distribution with parameters eta_s2 and tau that is bounded below by 0 and above by 1. I tried to do this with the following R code:

x = seq(0, 1, length.out = 1000)

sigma2 = vector("numeric", length = 7000)
for(i in 1:7000){
  y = plnorm(x, eta_s2[i], tau[i])
  # z is the cdf of a lognormal distribution with parameters eta_s2[i] and tau[i],
  # truncated below by 0 and above by 1
  z = (y - plnorm(0, eta_s2[i], tau[i])) / (plnorm(1, eta_s2[i], tau[i]) - plnorm(0, eta_s2[i], tau[i]))
  sigma2[i] = approx(z, x, pp[i])$y
}

I receive the following error

Error in approx(z, x, pp[i]) : 
  need at least two non-NA values to interpolate

This is not an error in the R code itself, but a problem with some of the pairs (eta_s2[i], tau[i]). For example, the pair (eta_s2[59], tau[59]) = (4.530589, 0.01509039) gives a numerically zero probability of sampling sigma2 between 0 and 1. An empirical histogram (with 10000 samples) of \sigma_2 \sim \text{LogNormal}(4.530589, 0.01509039) is shown below; there is no density anywhere near (0, 1).

sig2_unbounded.pdf (4.4 KB)

So my question is: why does Stan not give a warning when I run “priors.stan”? I believe the code I used in R is how Stan truncates the bounded distributions (the samples of tau from Stan match the samples from R obtained with the inverse CDF method).

There are many pairs of (eta_s2, tau) that give no density between 0 and 1 and hence it is difficult to use the inverse CDF method to sample from the truncated lognormal distribution in R.
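As a rough check of how common these problematic pairs are, one can compute the probability mass that each pair places on (0, 1) (a sketch reusing the eta_s2 and tau draws generated above):

# mass each LogNormal(eta_s2[i], tau[i]) places on (0, 1);
# pairs with numerically zero mass break the inverse-CDF interpolation
mass01 = plnorm(1, eta_s2, tau) - plnorm(0, eta_s2, tau)
mean(mass01 == 0)  # fraction of pairs with no usable density in (0, 1)
summary(mass01)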


I think that prior sampling must happen in the generated quantities block. I am not 100% sure about this, but it is what I understood at first and I have been using it ever since (i.e. in the code for my poster for probprog).

In your case, to avoid getting negative values, I think you’ll also need to take the constraint into account in the sample generation process. These two pages should guide you on this particular topic: the sampling statements chapter (which covers truncated variables) and the truncated random number generation section of the user’s guide.

You are seeing weird behavior for two reasons: your prior is non-generative, and prior sampling in hierarchical models generally requires a non-centered parameterization.

When I say you have a non-generative prior, I mean that the lines of code that look like your prior do not intrinsically respect the constraint that you’ve placed on sigma2. Therefore, your real prior is not given by

  mu ~ normal(0,1);
  sigma2 ~ lognormal(eta_s2, tau_s2);
  eta_s2 ~ normal(0, 2);
  tau_s2 ~ cauchy(0,1);  // really a half-cauchy; this aspect of the nongenerative prior is easy to grasp
  ptau ~ cauchy(0,1);  // ditto

Instead, it is the weird subset of that space in which everything with sigma2[1] > 1 or sigma2[2] > 1 has been removed. Notice that with these chunks of the joint parameter space removed, it will not necessarily be the case that the marginal prior distributions for any of your parameters correspond to the sampling statements that you used to build the prior.

If we remove the upper bound constraint on sigma2, then we have a more easily intelligible generative prior. We can sample from this prior with the Stan program

parameters{
  real mu;
  real eta_s2;
  real<lower=0> tau_s2;
  real<lower=0> sigma2[2];
  real<lower=0> ptau;
}
model{
  mu ~ normal(0,1);
  sigma2 ~ lognormal(eta_s2, tau_s2);
  eta_s2 ~ normal(0, 2);
  tau_s2 ~ cauchy(0,1);
  ptau ~ cauchy(0,1);
}

However, this model produces a lot of divergences. We can switch to a non-centered parameterization of the same model, but we still see a couple of divergences. We can squash these if domain knowledge is consistent with marginal prior distributions that are slightly less heavy-tailed than Cauchy, yielding the prior model expressed in the following Stan program (non-centered parameterization), which samples fine:

parameters{
  real mu;
  real eta_s2;
  real<lower=0> tau_s2;
  real<offset = eta_s2, multiplier = tau_s2> log_sigma2[2];
  real<lower=0> ptau;
}
transformed parameters{
  real sigma2[2] = exp(log_sigma2);
}
model{
  mu ~ normal(0,1);
  sigma2 ~ lognormal(eta_s2, tau_s2);
  eta_s2 ~ normal(0, 2);
  tau_s2 ~ student_t(2, 0, 1);
  ptau ~ student_t(2, 0, 1);
  target += sum(log_sigma2);  //this is the Jacobian adjustment
}

So now let’s come back to your original model, including the constraint. When you try to simulate from this prior in R, you are drawing eta_s2 and tau_s2 from their apparent prior, which is different from the actual prior that you have implicitly placed on the joint distribution of eta_s2 and tau_s2 by implementing the upper-bound constraint on sigma2. When you simulate from the prior in Stan, Stan will try a bunch of inits until it finds a set of valid ones, and then will begin exploring the actual joint prior distribution that is implied by the weird interactions between your constraints and your sampling statements. Thus, it will never visit the regions of parameter space where eta_s2 and tau_s2 take values that make it nearly impossible to respect the upper-bound constraint on sigma2.
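To see this concretely in R, one rough way to draw from the actual constrained prior is plain rejection: simulate from the generative prior and keep only the draws for which both components of sigma2 respect the upper bound. A sketch (not meant to be efficient; the variable names are only illustrative):

n_sim = 1e6
eta  = rnorm(n_sim, 0, 2)
tau  = abs(rcauchy(n_sim, 0, 1))   # half-Cauchy(0, 1)
s1   = rlnorm(n_sim, eta, tau)
s2   = rlnorm(n_sim, eta, tau)
keep = s1 < 1 & s2 < 1             # enforce the upper bound on both components of sigma2

mean(keep)        # fraction of generative draws that satisfy the constraint
hist(eta[keep])   # constrained marginal prior for eta_s2; no longer Normal(0, 2)
hist(tau[keep])   # constrained marginal prior for tau_s2; no longer half-Cauchy(0, 1)
hist(s1[keep])    # implied marginal prior for sigma2[1]

The histograms of the kept draws show the marginal priors that Stan is actually exploring, which is why they need not match the sampling statements written in the model block.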


@OriolAbril another way to sample from the prior is just to omit the likelihood from the model. Thus,

parameters{
  real x;
}
model{
  x ~ std_normal();
}

is a perfectly valid Stan model that declares a parameter x, places a standard normal prior on x, and samples it.
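For completeness, here is a minimal sketch of running that program with rstan and pulling out the prior draws (the file name prior_only.stan is just an assumption):

library(rstan)

# the program has no data block, so no data list is needed
prior_fit = stan("prior_only.stan", iter = 2000, chains = 4, seed = 1)
prior_x = extract(prior_fit)$x   # draws from the standard normal prior on x
hist(prior_x)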

3 Likes