Hi!
I’m having a hard time fitting a hierarchical model with Stan (it’s my fault, not Stan’s). The problem can be summarized as follows: I have a set of individuals separated into T subgroups of N individuals each, and I apply a different treatment to each subgroup. For each individual I have M data pairs (x, y), and the proposed model is:
y \sim \mathcal{N}\left(\theta_0 + \theta_1 x, \sigma_y\right).
The link between individuals under the same treatment is established by placing priors on the parameters \theta_0 and \theta_1:
\theta_0 \sim \mathcal{N}\left(\alpha_0,\beta_0\right),
\theta_1 \sim \mathcal{N}\left(\alpha_1,\beta_1\right).
After running the sampler, the results are satisfactory in many respects, but the posterior distribution of \beta_1 is very concentrated near 0, and therefore each set of N posterior samples of \theta_1 (one per individual, given a sample of \left(\alpha_1,\beta_1\right) for a particular treatment) lies very close to the sampled value of \alpha_1. At first I thought this simply meant that there are no appreciable differences in the values of \theta_1 for individuals under the same treatment, since \beta_1 controls the dispersion of this parameter. However, after reviewing the traceplots I noticed that the trace of \beta_1 seems to “bounce” off the value 0. Is this a problem with the model, or is it something that normally occurs when there are no differences between the values of \theta_1?
I greatly appreciate any help you can give me.
I’m using the cmdstanpy interface with Python 3.9. Here’s the code:
functions {
  // Return the (x, y) pairs for one individual after dropping entries where y is NaN
  matrix filter_nan(vector x, vector y, int K) {
    int N = size(x);  // Length of the input vectors
    matrix[2, K] xy;  // Output matrix: row 1 holds x, row 2 holds y
    int j = 0;        // Auxiliary counter over non-NaN entries
    for (i in 1:N) {            // Loop over all input values
      if (!is_nan(y[i])) {      // Keep only non-NaN observations
        j += 1;                 // Increase auxiliary counter
        xy[1, j] = x[i];        // Store x value
        xy[2, j] = y[i];        // Store y value
      }
    }
    return xy;
  }
}
data {
  int<lower=0> T;   // Number of treatments
  int<lower=0> N;   // Number of individuals per treatment
  int<lower=0> M;   // Number of data points per individual
  vector[M] x;      // Explanatory variable
  real y[T, N, M];  // Response variable
  int K[T, N];      // Number of non-NaN values for each individual
  real mean_x;      // Mean of variable x
  real sd_x;        // Standard deviation of variable x
  real mean_y;      // Mean of variable y
  real sd_y;        // Standard deviation of variable y
}
transformed data {
  vector[M] x_std = (x - mean_x) / sd_x;  // Standardized explanatory variable
}
parameters {
  // Hyperparameters
  vector[T] alpha_0;        // Mean of the theta_0 distribution
  vector[T] alpha_1;        // Mean of the theta_1 distribution
  real<lower=0> beta_0[T];  // Standard deviation of the theta_0 distribution
  real<lower=0> beta_1[T];  // Standard deviation of the theta_1 distribution
  // Parameters
  matrix[T, N] theta_0_raw;  // Non-centered (raw) standardized y-intercept of the linear model
  matrix[T, N] theta_1_raw;  // Non-centered (raw) standardized slope of the linear model
  real<lower=0> sigma_y;     // Residual standard deviation (shared by all individuals and treatments)
}
transformed parameters {
  matrix[T, N] theta_0;  // Standardized y-intercept, recovered from the non-centered parameterization
  matrix[T, N] theta_1;  // Standardized slope, recovered from the non-centered parameterization
  for (i in 1:T) {
    theta_0[i] = alpha_0[i] + beta_0[i] * theta_0_raw[i];  // Non-centered parameterization of theta_0
    theta_1[i] = alpha_1[i] + beta_1[i] * theta_1_raw[i];  // Non-centered parameterization of theta_1
  }
}
model {
  // Hyperpriors
  alpha_0 ~ normal(0, 1);  // Prior on hyperparameter alpha_0
  alpha_1 ~ normal(0, 1);  // Prior on hyperparameter alpha_1
  beta_0 ~ normal(0, 1);   // Prior on hyperparameter beta_0 (half-normal via the lower bound)
  beta_1 ~ normal(0, 1);   // Prior on hyperparameter beta_1 (half-normal via the lower bound)
  // Priors
  for (i in 1:T) {
    theta_0_raw[i] ~ std_normal();  // Prior on parameter theta_0_raw
    theta_1_raw[i] ~ std_normal();  // Prior on parameter theta_1_raw
  }
  sigma_y ~ normal(0, 1);  // Prior on parameter sigma_y
  // Likelihood
  for (i in 1:T) {      // Loop over treatments
    for (j in 1:N) {    // Loop over individuals
      if (K[i, j] != 0) {
        matrix[2, K[i, j]] xy = filter_nan(x_std, to_vector(y[i, j]), K[i, j]);       // Drop NaNs
        vector[K[i, j]] y_std = to_vector((xy[2] - mean_y) / sd_y);                   // Standardized response
        vector[K[i, j]] mu_y_std = theta_0[i, j] + theta_1[i, j] * to_vector(xy[1]);  // Linear predictor
        y_std ~ normal(mu_y_std, sigma_y);  // Normal likelihood
      }
    }
  }
}
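For completeness, this is roughly how I compile the model and run the sampler with cmdstanpy; the Stan file name, the data file, and the sampler settings below are placeholders for what I actually use:

from cmdstanpy import CmdStanModel

# Placeholder paths: the data file contains T, N, M, x, y, K,
# mean_x, sd_x, mean_y and sd_y, as declared in the data block.
model = CmdStanModel(stan_file="hierarchical_model.stan")
fit = model.sample(
    data="model_data.json",  # a Python dict with the same fields also works
    chains=4,
    iter_warmup=1000,
    iter_sampling=1000,
    seed=1234,
)
print(fit.diagnose())  # report on divergences, treedepth and E-BFMI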
I based my choice of prior for \beta_1 on the guidelines given in the Stan documentation on GitHub. I also tried a \text{Gamma}(2,0.1) prior, as suggested in the Stan User’s Guide, but the issue persists.
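To look more closely at \beta_1, I pulled its draws and the divergence flags out of the fit; the snippet below is roughly what I ran (the 0.05 threshold is just an arbitrary cutoff I picked to quantify how concentrated the draws are near zero):

import arviz as az
import numpy as np

# `fit` is the CmdStanMCMC object from the snippet above
beta_1 = fit.stan_variable("beta_1")               # draws, shape (n_draws, T)
divergent = fit.method_variables()["divergent__"]  # divergence flag per draw and chain
print("divergent transitions:", int(np.sum(divergent)))
print("share of beta_1 draws below 0.05:", np.mean(beta_1 < 0.05, axis=0))

idata = az.from_cmdstanpy(posterior=fit)
az.plot_trace(idata, var_names=["beta_1"])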
Here is the traceplot (produced with the plot_trace() function of the arviz package; the rows, in order, correspond to \alpha_0, \alpha_1, \beta_0, \beta_1, \theta_{0,\text{raw}}, \theta_{1,\text{raw}}, \sigma_y, \theta_0, \theta_1):