I am trying to sample from the prior distributions of a hierarchical model. I am able to do this for the hyperparameters but not for the parameters.
data{
int N;
int<lower = 1, upper=2> dm[N];
real y[N];
}
parameters{
real mu;
real eta_s2;
real<lower=0> tau_s2;
real<lower=0, upper=1> sigma2[2];
real<lower=0> ptau;
}
model{
int m;
real sig2;
for(i in 1:N){
m = dm[i];
sig2 = sigma2[m];
target += normal_lpdf(y[i]|mu, sig2);
}
mu ~ normal(0,1);
sigma2 ~ lognormal(eta_s2, tau_s2);
eta_s2 ~ normal(0, 2);
tau_s2 ~ cauchy(0,1);
ptau ~ cauchy(0,1);
}
//generated quantities {
//real eta_sim = normal_rng(0, 2);
//real<lower=0> tau_sim = cauchy_rng(0, 1);
//real<lower=0, upper=1> sigma_sim = lognormal_rng(eta_sim, tau_sim);
//}
The data I used in the model is
eta_s2 = c(-3.3648077, 0.1199825)
tau_s2 = c(0.1103267, 0.4475340)
sigma1 = 0.03231332
sigma2 = 0.7080471
mu = 0
y = c(-0.0193603047, 0.0195620265, 0.0353037451, 0.0155547999, -0.0458207621, -0.0180215304,
-0.0167241323, 0.0370329729, 0.0247856454, -0.0143030861, 0.0192524402, 0.0426936615,
0.0333482304, -0.0181248670, 0.0060638257, 0.0293641760, -0.0333053618, -0.0113788365,
-0.0196547950, 0.0195201185, -0.3131050855, -0.4545881193, 0.0311973717, -0.2925066598,
-0.4785626458, 0.3772071422, 0.4736467624, -0.5973105308, 0.6208043347, -0.0656066962,
-0.4639143741, 0.2282888338, -1.1815579061, -0.2702454827, -0.7529282319, -0.3903298828,
-0.5810873580, 0.2378588292, -0.1556719877, -0.0006158167)
dm = rep(c(1,2),each = 20)
stan_list = list(N=40, y=y, dm=dm)
Failure_time_model = stan("priors.stan", data = stan_list, iter =10000, seed = 1, chains = 1, warmup = 3000, cores = 1)
I have attached a small, reproducible example above. The parameter ptau is not included included in the likelihood and hence the “posterior” for ptau is simply the prior. I named the stan file “priors.stan”. The prior for ptau is shown below.
stan_prior.pdf (4.4 KB)
I also tried sampling from the priors in a generated quantities block. I had to comment this block out since I was receiving errors. The errors were because Stan was sampling some negative values of tau_sim, even though I have set a constraint.
A third option that I tried is to simply provide only parameters and model block with no log-likelihood function. For example,
parameters{
real eta_s2;
real<lower=0> tau_s2;
}
model{
eta_s2 ~ normal(0, 2);
tau_s2 ~ cauchy(0,1);
}
The above code returns samples from the prior distributions. However, as above, this trick does not work for priors with hyperparameters.
The model runs fine but I would like to see what the prior for sigma2 looks like.
After trying to extract prior samples from Stan I decided I could just simulate my own by doing what Stan is doing behind the scenes.
To sample from a truncated distribution, Stan uses the following method. If F(x) is the cdf of the unbounded variable, then the cdf for the bounded variable is given by \dfrac{F(x) - F(a)}{F(b) - F(a)}.
For example, to sample from a cauchy(0,1)
distribution that is bounded below at zero, one can do the following in R:
x = seq(0,400,length.out=1000)
y = pcauchy(x,0,1)
#z is the cdf of a Cauchy(0,1) distribution that is bounded below at zero
z = (y - pcauchy(0,0,1))/(1 - pcauchy(0,0,1))
z[1000]= 1
#Note I set z[1000] = 1 meaning Pr(X<=400) = 1. In reality Pr(X<=400) = 0.998 but the Cauchy distribution gives a small density to values of around 10^5 and so I have truncated the possible values for simplicity
tau = vector("numeric", length = 7000)
pp = runif(7000)
for(i in 1:7000){
tau[i] = approx(z,x,pp[i])$y
}
That is, z is the cdf of a Cauchy(0,1) distribution that is bounded below by zero. This method is used by Stan. The above code simply samples from the inverse CDF of a Cauchy(0,1) distribution that is bounded below by zero. Plotting a histogram of tau we see that this prior is the same as the prior obtained by Stan.
prior_r.pdf (4.3 KB)
Samples from the prior of eta_s2 are easy to obtain and are given by
eta_s2 = rnorm(7000,0,2)
Now, I need to obtain samples from the prior distribution of sigma2. In other words, I need to sample from a lognormal distribution that is bounded below by 0 and bounded above by 1 with parameters eta_s2 and tau. This can be done using the following R code:
x = seq(0,1,length.out = 1000)
sigma2 = vector("numeric", length = 7000)
for(i in 1:7000){
y = plnorm(x,eta_s2[i], tau[i])
#z is the cdf of a lognormal distribution that is truncated below by 0 and above by 1 with parameters eta_s2[i] and tau[i].
z = (y - plnorm(0,eta_s2[i], tau[i]))/(plnorm(1,eta_s2[i], tau[i]) - plnorm(0,eta_s2[i], tau[i]))
sigma2[i] = approx(z,x,pp[i])$y
}
I receive the following error
Error in approx(z, x, pp[i]) :
need at least two non-NA values to interpolate
This is not an error relating to the R code, but an error with some of the pairs (eta_s2[i], tau[i]). For example, the pair (eta_s2[59], tau[59]) = (4.530589,0.01509039). These parameters give a probability of zero of sampling sigma2 between 0 and 1. An empirical histogram (with 10000 samples) of \sigma_2 \sim\text{LogNormal}(4.530589,0.01509039) is shown below. From the histogram we can see that there is no density anywhere near (0,1).
sig2_unbounded.pdf (4.4 KB)
So my question is, how does Stan not come with a warning when I run “priors.stan”? I believe the code I used in R is how Stan truncates the bounded distributions (the samples from tau using Stan are the same as the samples from R using the inverse CDF method).
There are many pairs of (eta_s2, tau) that give no density between 0 and 1 and hence it is difficult to use the inverse CDF method to sample from the truncated lognormal distribution in R.