Hi,
I am fairly new to Stan, and would like to know if it’s possible to require a lognormal prior to only return values within a specified range (i.e. specify a truncated lognormal distribution when that is not one of the available distributions listed in the Stan manual). I think I might have part of the answer to my question, but could use a little push. (The use case is also kind of interesting!) The full model background and details follow, but let me start with the meat of the question. I have some parameters that look like this:
parameters{
...
real<upper=2.2> alpha_8_1;
real<lower=2.2> alpha_8_4;
}
and in the model section I have:
model{
...
mu_alpha ~ normal(0,4); // weak but not uninformative
sigma_beta ~ cauchy(0,5); // weak but not uninformative
alpha_8_1 ~ lognormal(mu_alpha, sigma_alpha);
alpha_8_4 ~ lognormal(mu_alpha, sigma_alpha);
...
}
As you can see, in each iteration of the model, it’s completely possible for a draw from the prior for alpha_8_1, alpha_8_4, or both to lie outside the range allowed by the parameter’s constraint. Now if I remember this correctly and it applies to Hamiltonian Monte Carlo as well as ordinary MCMC, then in calculating the probably of a candidate jump getting accepted or rejected I’m multiplying by the prior probability, so that means that a particular jump might be more likely (100% likely?) to be rejected if the prior has a value completely outside the acceptable range for that variable. In other words, I would be seeing lots of autocorrelation between subsequent jumps within a chain, and potentially not just for parameter alpha_8_1 depending on the implementation (i.e. if parameters are accepted or rejected separately or in blocks). Here’s what I’m seeing (see the traceplot for alpha[14,1] below):
(Let’s ignore the within-chain label switching that’s causing the anomalous chain 4 for alpha[14,2] for the moment.) I suspect that I’m seeing autocorrelation in alpha[14,1] because of these issues with the priors.
I think that I could deal with the lower bound for alpha_8_4 by writing something like this:
alpha_8_4 ~ lognormal(mu_alpha, sigma_alpha) + 2.2;
Is that right?
But then how do I specify the prior for alpha_8_1 to avoid getting draws greater than the threshold, 2.2? I think I would need to set the prior for that parameter to a truncated lognormal distribution, but I didn’t see any specification for such a distribution in the Stan manual. I can manufacture one, though, right? Since a lognormal distribution with given location and scale parameters can be written in terms of a normal distribution, then if I knew the exact location and scale parameters for my desired truncated lognormal distribution, I think I could probably calculate the necessary upper bound for the truncated normal distribution that would
generate my desired truncated lognormal distribution, and then actually implement that as Stan code. R code follows, as I’m still used to translating this into Stan code:
X <- exp(mu + sigma*Z)
where X
is the truncated lognormal variable, mu
and sigma
are respectively the location and scale parameters for the truncated lognormal distribution, and Z
is a truncated normal variable that can be specified in Stan. But how in Stan would I compute the upper bound to use in specifying Z
when mu
and sigma
are themselves hyperparameters?
That’s my question. Here’s more background if any of that doesn’t make sense:
MODEL
I’m fitting a five-dimensional compensatory two-parameter logistic IRT model. Right now I’m generating true model parameters and making sure I can recover them, and I’ve successfully done so for 1-D, 2-D, and 3-D models, and for one 5-D model in which the ability parameters are uncorrelated. Here I’ve specified that they are correlated, and I use spherical priors for the theta parameters to specify the units as one type of identifying constraint. (The model is still subject to rotational invariance, so the parameters that I get back from the model won’t match the “true” parameters and even if everything is correct, I will have to rotate the coordinate axes to recover the true parameters as per Reckase (2009).) In lower dimensions, I observe label switching between chains:
Here’s an example from a three-dimensional 2PL model where three distinct alpha parameters, one for each dimension, can be observed. It’s straightforward (albeit slow) to solve this problem by “relabeling” the chains, i.e. copying observations around in the resulting Stan model object. In five dimensions, the posterior distributions of the alpha parameters appear to be so highly multimodal that I start seeing label switching within chains as well as between them:
The true values of alpha[15,1]
, alpha[15,2]
, alpha[15,3]
, alpha[15,4]
and alpha[15,5]
are 1.312
,1.367
, 0.618
,2.653
,and 0.671
respectively, and the parameter values in the example above are close to but not exactly these values due to rotational invariance. In this case, alpha[15,4]
has a posterior mean of about 3.0, but different chains capture this parameter during different iterations of the model. To deal with the label switching within chains, my current strategy is to inspect plots like this one, and look for chains in which there appears to be a clear break between draws for a parameter associated with one specific dimension of the difficulty parameters for a particular item and those associated with all the other dimensions for that item. In other words, I’m looking for a horizontal “line” containing as much white as possible that can separate the various chains. In this case I might assign a constraint to alpha[15,4]
to force it to always be above about 2.5, and similarly constrain alpha[15,1]
, alpha[15,2]
, alpha[15,3]
, and alpha[15,5]
to be less than 2.5. My current full model, badly written to roll something out quickly, is as follows:
// Useful resource: https://rfarouni.github.io/assets/projects/BayesianMIRT/BayesianMIRT.html
data{
int<lower=1> n_respondents;
int<lower=1> n_items;
int<lower=0,upper=1> Y[n_respondents,n_items];
int<lower=1> D; // number of dimensions: required to be 5 in this case
}
transformed data{
cov_matrix[D] Sigma;
// From p. 335 of the manual: indices for unconstrained alpha parameters. Badly written code here,
// just want to write something that works and then improve.
int<lower=1, upper=38> idxs[165,2] = { {1, 1}, {1, 2}, {1, 3}, {1, 4}, {1, 5},
{2, 1}, {2, 2}, {2, 3}, {2, 4}, {2, 5},
{3, 1}, {3, 2}, {3, 3}, {3, 4}, {3, 5},
{5, 1}, {5, 2}, {5, 3}, {5, 4}, {5, 5},
{6, 1}, {6, 2}, {6, 3}, {6, 4}, {6, 5},
{7, 1}, {7, 2}, {7, 3}, {7, 4}, {7, 5},
{9, 1}, {9, 2}, {9, 3}, {9, 4}, {9, 5},
{11, 1}, {11, 2}, {11, 3}, {11, 4}, {11, 5},
{12, 1}, {12, 2}, {12, 3}, {12, 4}, {12, 5},
{13, 1}, {13, 2}, {13, 3}, {13, 4}, {13, 5},
{14, 1}, {14, 2}, {14, 3}, {14, 4}, {14, 5},
{15, 1}, {15, 2}, {15, 3}, {15, 4}, {15, 5},
{16, 1}, {16, 2}, {16, 3}, {16, 4}, {16, 5},
{17, 1}, {17, 2}, {17, 3}, {17, 4}, {17, 5},
{18, 1}, {18, 2}, {18, 3}, {18, 4}, {18, 5},
{19, 1}, {19, 2}, {19, 3}, {19, 4}, {19, 5},
{21, 1}, {21, 2}, {21, 3}, {21, 4}, {21, 5},
{22, 1}, {22, 2}, {22, 3}, {22, 4}, {22, 5},
{23, 1}, {23, 2}, {23, 3}, {23, 4}, {23, 5},
{24, 1}, {24, 2}, {24, 3}, {24, 4}, {24, 5},
{25, 1}, {25, 2}, {25, 3}, {25, 4}, {25, 5},
{26, 1}, {26, 2}, {26, 3}, {26, 4}, {26, 5},
{27, 1}, {27, 2}, {27, 3}, {27, 4}, {27, 5},
{28, 1}, {28, 2}, {28, 3}, {28, 4}, {28, 5},
{29, 1}, {29, 2}, {29, 3}, {29, 4}, {29, 5},
{30, 1}, {30, 2}, {30, 3}, {30, 4}, {30, 5},
{31, 1}, {31, 2}, {31, 3}, {31, 4}, {31, 5},
{32, 1}, {32, 2}, {32, 3}, {32, 4}, {32, 5},
{33, 1}, {33, 2}, {33, 3}, {33, 4}, {33, 5},
{34, 1}, {34, 2}, {34, 3}, {34, 4}, {34, 5},
{35, 1}, {35, 2}, {35, 3}, {35, 4}, {35, 5},
{36, 1}, {36, 2}, {36, 3}, {36, 4}, {36, 5},
{37, 1}, {37, 2}, {37, 3}, {37, 4}, {37, 5} };
row_vector [D] mu_theta = rep_row_vector(0.0, D);
Sigma = rep_matrix(0.0, D, D);
for(i in 1:D){
Sigma[i,i] = 1.0;
} // Identity matrix as per Reckase (2009) and Chun Wang (2015)
}
parameters{
vector [D] theta [n_respondents];
vector<lower=0>[n_items*D-25] alpha_raw;
real<lower=2.0> alpha_8_4;
real<upper=2.2> alpha_8_1;
real<upper=2.2> alpha_8_2;
real<upper=2.2> alpha_8_3;
real<upper=2.2> alpha_8_5;
real<lower=2.15> alpha_10_1;
real<upper=2.15> alpha_10_2;
real<upper=2.15> alpha_10_3;
real<upper=2.15> alpha_10_4;
real<upper=2.15> alpha_10_5;
real<lower=1.7> alpha_38_3;
real<upper=1.7> alpha_38_1;
real<upper=1.7> alpha_38_2;
real<upper=1.7> alpha_38_4;
real<upper=1.7> alpha_38_5;
real<lower=1.6> alpha_4_5;
real<upper=1.6> alpha_4_1;
real<upper=1.6> alpha_4_2;
real<upper=1.6> alpha_4_3;
real<upper=1.6> alpha_4_4;
real<lower=1.4> alpha_20_2;
real<upper=1.4> alpha_20_1;
real<upper=1.4> alpha_20_3;
real<upper=1.4> alpha_20_4;
real<upper=1.4> alpha_20_5;
real mu_alpha;
// mu_alpha doesn't have to be lower-bounded at zero because only
// exp(mu_alpha) has to be positive.
vector[n_items] beta;
real<lower=0> sigma_alpha;
real<lower=0> sigma_beta;
real mu_beta;
}
transformed parameters {
matrix[n_items, D] alpha;
for(i in 1:165){
alpha[idxs[i,1], idxs[i,2]] = alpha_raw[i];
}
alpha[4,1] = alpha_4_1;
alpha[4,2] = alpha_4_2;
alpha[4,3] = alpha_4_3;
alpha[4,4] = alpha_4_4;
alpha[4,5] = alpha_4_5;
alpha[8,1] = alpha_8_1;
alpha[8,2] = alpha_8_2;
alpha[8,3] = alpha_8_3;
alpha[8,4] = alpha_8_4;
alpha[8,5] = alpha_8_5;
alpha[10,1] = alpha_10_1;
alpha[10,2] = alpha_10_2;
alpha[10,3] = alpha_10_3;
alpha[10,4] = alpha_10_4;
alpha[10,5] = alpha_10_5;
alpha[20,1] = alpha_20_1;
alpha[20,2] = alpha_20_2;
alpha[20,3] = alpha_20_3;
alpha[20,4] = alpha_20_4;
alpha[20,5] = alpha_20_5;
alpha[38,1] = alpha_38_1;
alpha[38,2] = alpha_38_2;
alpha[38,3] = alpha_38_3;
alpha[38,4] = alpha_38_4;
alpha[38,5] = alpha_38_5;
}
model{
// hyperpriors
sigma_alpha ~ cauchy(0,5);
mu_alpha ~ normal(0,4); // weak but not uninformative
sigma_beta ~ cauchy(0,5); // weak but not uninformative
mu_beta ~ normal(0.0, 4); // weak but not uninformative
// priors
for(i in 1:n_respondents){
theta[i] ~ multi_normal(mu_theta, Sigma); // Reckase 2009
}
beta ~ normal(mu_beta,sigma_beta);
alpha_raw ~ lognormal(mu_alpha, sigma_alpha); // to_vector(alpha_raw)?
// Stan forum question: How do I do something like a truncated lognormal here?
alpha_4_1 ~ lognormal(mu_alpha, sigma_alpha);
alpha_4_2 ~ lognormal(mu_alpha, sigma_alpha);
alpha_4_3 ~ lognormal(mu_alpha, sigma_alpha);
alpha_4_4 ~ lognormal(mu_alpha, sigma_alpha);
alpha_4_5 ~ lognormal(mu_alpha, sigma_alpha);
alpha_8_1 ~ lognormal(mu_alpha, sigma_alpha);
alpha_8_2 ~ lognormal(mu_alpha, sigma_alpha);
alpha_8_3 ~ lognormal(mu_alpha, sigma_alpha);
alpha_8_4 ~ lognormal(mu_alpha, sigma_alpha);
alpha_8_5 ~ lognormal(mu_alpha, sigma_alpha);
alpha_10_1 ~ lognormal(mu_alpha, sigma_alpha);
alpha_10_2 ~ lognormal(mu_alpha, sigma_alpha);
alpha_10_3 ~ lognormal(mu_alpha, sigma_alpha);
alpha_10_4 ~ lognormal(mu_alpha, sigma_alpha);
alpha_10_5 ~ lognormal(mu_alpha, sigma_alpha);
alpha_20_1 ~ lognormal(mu_alpha, sigma_alpha);
alpha_20_2 ~ lognormal(mu_alpha, sigma_alpha);
alpha_20_3 ~ lognormal(mu_alpha, sigma_alpha);
alpha_20_4 ~ lognormal(mu_alpha, sigma_alpha);
alpha_20_5 ~ lognormal(mu_alpha, sigma_alpha);
alpha_38_1 ~ lognormal(mu_alpha, sigma_alpha);
alpha_38_2 ~ lognormal(mu_alpha, sigma_alpha);
alpha_38_3 ~ lognormal(mu_alpha, sigma_alpha);
alpha_38_4 ~ lognormal(mu_alpha, sigma_alpha);
alpha_38_5 ~ lognormal(mu_alpha, sigma_alpha);
// likelihood
// More efficient to write loops column by column as matrices are stored internally in column-major order (Stan manual p. 327)
for(j in 1:n_items){
for(i in 1:n_respondents){
real p;
p = inv_logit(row(alpha,j)*theta[i]+beta[j]);
Y[i,j] ~ bernoulli(p);
}
}
}
generated quantities{
vector[n_items] log_lik[n_respondents];
vector[n_items] pY[n_respondents];
for(j in 1:n_items){
for(i in 1:n_respondents){
real p;
pY[i,j] = inv_logit(row(alpha,j)*theta[i]+beta[j]);
log_lik[i,j] = bernoulli_lpmf(Y[i,j] | pY[i,j]);
}
}
}
Note that to get a model like this one to work, I need to force the initial values to satisfy the constraints. Here’s some R code for that:
# For initial value generation.
generate.alpha.with.upper.bound <- function(meanlog, sdlog, upper){
result <- Inf
while(result > upper){
result <- rlnorm(1, meanlog=meanlog, sdlog=sdlog)
}
return(result)
}
# For initial value generation.
generate.alpha.with.lower.bound <- function(meanlog, sdlog, lower){
result <- -Inf
while(result < lower){
result <- rlnorm(1, meanlog=meanlog, sdlog=sdlog)
}
return(result)
}
set.seed(12345)
inits <- list()
for(i in 1:4){
chain.inits <- list()
chain.inits$alpha_4_1 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=1.6)
chain.inits$alpha_4_2 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=1.6)
chain.inits$alpha_4_3 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=1.6)
chain.inits$alpha_4_4 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=1.6)
chain.inits$alpha_4_5 <- generate.alpha.with.lower.bound(meanlog=0, sdlog=1, lower=1.6)
chain.inits$alpha_8_1 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=2.2)
chain.inits$alpha_8_2 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=2.2)
chain.inits$alpha_8_3 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=2.2)
chain.inits$alpha_8_4 <- generate.alpha.with.lower.bound(meanlog=0, sdlog=1, lower=2.0)
chain.inits$alpha_8_5 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=2.2)
chain.inits$alpha_10_1 <- generate.alpha.with.lower.bound(meanlog=0, sdlog=1, lower=2.15)
chain.inits$alpha_10_2 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=2.15)
chain.inits$alpha_10_3 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=2.15)
chain.inits$alpha_10_4 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=2.15)
chain.inits$alpha_10_5 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=2.15)
chain.inits$alpha_38_1 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=1.7)
chain.inits$alpha_38_2 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=1.7)
chain.inits$alpha_38_3 <- generate.alpha.with.lower.bound(meanlog=0, sdlog=1, lower=1.7)
chain.inits$alpha_38_4 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=1.7)
chain.inits$alpha_38_5 <- generate.alpha.with.upper.bound(meanlog=0, sdlog=1, upper=1.7)
inits[[i]] <- chain.inits
}
Here are the simulated data I’m working with:
5D_IRT_data.csv (105.8 KB)
(If you’d like to critique anything else of course, please go ahead since I’m new and that might help someone else here as well.) Thanks very much for any insight you could provide!
Best,
Richard