I am trying to fit a hierarchical model with varying slopes on standardized data using the given Stan code. I am very new to Stan (probabilistic programming in general) and currently learning, so please let me know if you have any questions or critiques about the code or parameterization below.
Assume this setting :
- N observations
- a design matrix X of dimensions [N \times K] , K being the number of predictors
- a number J of groups (in my application, these are different players)
- Pid an integer vector of length N that for each observation encodes its group, one of 1:J
I am also standardizing the data in the transform data block of the code to improve the sampling speed.
###Model specification
model_code ="""
data {
int<lower=1> N; //number of training samples
int<lower=1> J; //number of players in group
int<lower=1, upper=J> Pid[N]; //the player id
int<lower=1> K; //number of predictors-1 (intercept)
matrix[N, K] X; //matrix of predictors
vector[N] y_obs; //observation/y_train
int<lower=1> N_new; //n datapoints in test
matrix[N_new, K] X_test; //matrix of test predictors
int<lower=1, upper=J> Pid_test[N_new]; //the player id
}
transformed data {
matrix[N,K] z;
matrix[N_new,K] z_t;
real mean_x[K];
real sd_x[K];
for (k in 1:K) {
mean_x[k] <- mean(col(X,k));
sd_x[k] <- sd(col(X,k));
for (n in 1:N)
z[n,k] <- (X[n,k] - mean_x[k]) / sd_x[k];
for (t in 1:N_new)
z_t[t,k] <- (X_test[t,k] - mean_x[k]) / sd_x[k];
}
}
parameters{
real alpha;
real<lower=0> sig;
real mu[K];
real<lower=0> sigma[K];
vector[K] beta[J];
}
model{
sig ~ cauchy(0,5);
alpha ~ normal(0, 100);
for (k in 1:K){
mu[k] ~ normal(0,100);
sigma[k] ~ cauchy(0,5);
for (j in 1:J){
beta[j,k]~ normal(mu[k], sigma[k]);
}
}
for (n in 1:N)
y_obs[n] ~ normal(alpha+(z[n]*beta[Pid[n]]), sig);
}
generated quantities{
vector[N_new] y_test;
for (n in 1:N_new)
y_test[n] = normal_rng(alpha + (z_t[n]*beta[Pid_test[n]]), sig) ;
}
"""
I had two questions regarding this code snippet:
- In this multivariate example, how do I get back the non-standardized
alpha
andbeta
coefficients? Most of the examples (for example below) I have seen are for univariate predictors. How do I calculate the non-standardizedalpha
in the multivariate case?
"""
...
generated quantities {
real alpha;
real beta;
real<lower=0> sigma;
alpha = sd(y) * (alpha_std - beta_std * mean(x) / sd(x))
+ mean(y);
beta = beta_std * sd(y) / sd(x);
sigma = sd(y) * sigma_std;
}
"""
- If I choose to perform QR Decomposition with centered predictors instead of standardizing, how do I apply it to a hierarchical model? What changes in the way QR decompositions are created in the transform data block for a non-hierarchical model example, as given below?
"""
...
transformed data {
matrix[N, K] Q_ast ;
matrix[K, K] R_ast ;
matrix[K, K] R_ast_inverse ;
matrix[N, K] X_centered ;
matrix[N_new, K] X_test_centered ;
for (k in 1:K) {
X_centered[,k] = X[,k] - mean(X[,k]) ;
X_test_centered[,k] = X_test[,k] - mean(X[,k]) ;
}
Q_ast = qr_Q(X_centered)[, 1:K] * sqrt(N - 1.0) ;
R_ast = qr_R(X_centered)[1:K, ] / sqrt(N - 1.0) ;
R_ast_inverse = inverse(R_ast) ;
}
...
"""
Thank you for the help.