Hello there.
Disclaimer: I am using the package rstan together with Stan.
I am currently working on the radon dataset of Gelman to build HLMs but I am new to Stan and am a bit confused about the syntax. I would be happy if somebody could help me out in my mistakes/if the intuition of my code is right.
The dataset contains household data of radon and an indication of where the measurement was taken (1 = floor, 0 = basement).
Furthermore, there are uranium measurements but only 1 per county (85 counties).
Let’s call the variables:
Y = radon
X = floor
U = uranium
N = number of observations (919)
J = number of counties (85)
I have three models:
- Unpooled: (Varying Intercepts, Fixed Slopes)
J different models based only on the observations of each county.
Y_j = alpha_j + beta_j * X
I want to put a prior on mu_alpha, sigma_alpha, mu_beta, sigma_beta and sigma_y.
For this, I have the following Stan script:
data {
int<lower=1> N; // number of obs
int<lower=1> J; // Number of counties
vector[N] Y; // outcome data (log radon)
vector[N] X; // Inputa data (floor)
int<lower=1> county[N]; //county indices
}
parameters {
vector[J] alpha;
vector[J] beta;
real<lower=0> sigma_y;
}
model {
vector[N] mu_y;
for (i in 1:N)
mu_y[i] = beta[county[i]] * X[i] + alpha[county[i]];
// Priors
Y ~ normal(mu_y, sigma_y);
alpha ~ normal(0,10);
beta ~ normal(0,10);
sigma_y ~ normal(0,10);
}
But, I get the following error message:
Warning messages:
1: In readLines(file, warn = TRUE) :
incomplete final line found on 'C:\Users\..\unpool_1.stan'
Q1: Is this message neglectable? + Is this the right specification?
- Hierarchical Linear Model with 1 Level (Varying Intercepts and Fixed Slopes):
J different models but now with information sharing, i.e. modelling the J different alphas as coming from the same distribution.
Y_j = alpha_j + beta * X
This is the Stan script I have for this model:
// HLM 1
data {
int<lower=1> N; // number of obs
int<lower=1> J; // Number of counties
vector[N] Y; // outcome data (log radon)
vector[N] X; // Inputa data (floor)
int<lower=1> county[N]; //county ID variable
}
parameters {
vector[J] alpha; // county j intercept
real beta;
real mu_alpha; // prior on alpha
real <lower=0, upper=100> sigma_alpha; // hyperparameter sigma of alpha
real<lower=0, upper=100> sigma_y;// prior sigma of y in county j
}
model { // Model specifications
vector[N] mu_y;
// Full Model
for(i in 1:N)
mu_y[i] = alpha[county[i]] + beta * X[i];
// Priors and Hyperparams
Y ~ normal(mu_y, sigma_y);
sigma_y ~ cauchy (0, 2.5);
alpha ~ normal(mu_alpha, sigma_alpha);
mu_alpha ~ normal(0, 10);
sigma_alpha ~ uniform(0, 10);
beta ~ normal(0, 10);
}
Here, I get this error message:
Warning messages:
1: There were 1 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: Examine the pairs() plot to diagnose sampling problems
Q2: How can I fix this error + is the Stan specification right?
- Hierarchical Linear Model with 2 Levels:
Well, this is actually the one which is bothering me the most.
Y_j = alpha_j + beta * X
alpha _j = gamma_0 + gamma_1 * U
I have no clue how to specificy the second layer of the hierarchical model.
Q3: How do I specify the model in Stan + is it better to have 85 obs for Uranium or is it better when each of the 919 observation gets its own uranium observation (however would be duplicates, since there is only 1 measurement per county).
Here’s the minimal code from R:
## Load Data
data("radon", package = "rstanarm")
data <- radon
# County Categorical Variable
county_name <- as.vector(data$county) # create vector of county indications for each obs
county_list <- unique(county_name) # create list of counties, drop duplicates
J <- length(county_list) # number counties in Minessota
county <- rep (NA, J)
for (i in 1:J){
county[county_name==county_list[i]] <- i
}
# Seperate variables
logradon <- data$log_radon
loguranium <- data$log_uranium
loguranium <- loguranium[!duplicated(loguranium)]
floorind <- data$floor
N <- length(logradon)
# Lists for STAN
cplist <- list(Y = logradon,
X = floorind,
N = N)
unplist <- list(Y = logradon,
X = floorind,
N = N,
J = J,
county = county)
datalist <- list(Y = logradon,
X = floorind,
U = loguranium,
N = N,
J = J,
county = county)
Exclaimer:
I actually feel bad for posting such an extensive questionnary, since usually people don’t answer to that but I am really hopeless. I tried it now for 2 weeks and I don’t know whether my results are correct or not and how to specify the 2-level HLM.
I don’t know anybody how uses Stan and most tutorials in the internet use Python.
(There are R tutorials but with different packages, where they just use default priors which is not what I want to do. I want to properly specify the full model to also perform sensitivity analysis later on regarding the priors.)
I would really appreciate any kind of help. Thanks in advance.
Here is the original paper for further reference.