Hi,
I’m trying to fit a hierarchical negative binomial model (and eventually a zero-inflated Poisson) to ~2000 data points with ~40 predictors, using the QR decomposition of the design matrix. The model fits reasonably well (Rhat, n_eff, and posterior predictive checks all look reasonable), but all of the predictors are estimated to be ~0 except for the last predictor in the model matrix, regardless of which predictor that happens to be.

I have tried reducing the number of predictors and changing their order, but the result is always the same: a series of highly variable estimates (centered at zero, with a lot of uncertainty), except for the last predictor, which is typically estimated much more precisely and centered around 25. It seems that all of the ‘explanatory power’ is being loaded onto whichever predictor comes last. Is this a problem with the way the predictors and model matrix are entered, or are the priors driving the outcome (which seems unlikely given the size of the dataset)? Any insight would be greatly appreciated.
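For reference, the QR reparameterization I’m trying to follow is roughly the pattern from the Stan manual. Here is a minimal, non-hierarchical negative binomial sketch of that pattern (the names alpha, theta, and phi and the priors are placeholders; the manual scales the QR factors by sqrt(N - 1), whereas my code below scales by N, which I believe only changes the scale of the Q-space coefficients because the constant cancels in Q * R):

```
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;        // design matrix, observations in rows
  int<lower=0> y[N];
}
transformed data {
  // thin QR decomposition of X, scaled as in the Stan manual
  matrix[N, K] Q_ast = qr_Q(X)[, 1:K] * sqrt(N - 1.0);
  matrix[K, K] R_ast = qr_R(X)[1:K, ] / sqrt(N - 1.0);
  matrix[K, K] R_ast_inverse = inverse(R_ast);
}
parameters {
  real alpha;            // intercept
  vector[K] theta;       // coefficients on Q_ast
  real<lower=0> phi;     // negative binomial dispersion
}
model {
  // placeholder priors for the sketch
  alpha ~ normal(0, 10);
  theta ~ normal(0, 10);
  phi ~ gamma(10, 0.01);
  // all K predictors enter the linear predictor at once via Q_ast * theta
  y ~ neg_binomial_2_log(alpha + Q_ast * theta, phi);
}
generated quantities {
  // coefficients on the original scale of X
  vector[K] beta = R_ast_inverse * theta;
}
```

In my actual model I add state-level varying intercepts on top of this; the full code is below.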
My Stan code is:

```
data{
int<lower=1> N; //the number of observations
int<lower=1> J; //the number of states
int<lower=1> K; //number of columns in the model matrix
int<lower=1,upper=J> id[N]; //vector of group indices
matrix[K,N] X; //the model matrix, transposed (predictors in rows, observations in columns)
int<lower=0> y[N]; //the response variable
real<lower=0> sdLT; //observed standard deviation
}
transformed data{
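// thin QR decomposition of X' (N x K); the factors of N cancel, so Q * R = X'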
matrix[N,K] Q = qr_Q(X')[,1:K] * N;
matrix[K,K] R = qr_R(X')[1:K,] / N;
matrix[K,K] R_inv = inverse(R);
}
parameters{
real alpha_lambda;
real<lower=0> sigma_state_lambda;
real<lower=0> phi;
vector[J] eta_state_lambda;
vector[K] beta_l_tilde;
}
transformed parameters{
vector[K] beta_l = R_inv * beta_l_tilde;
vector[J] alpha_state_lambda;
vector[N] lambda; // avg. number of LTs in successful states
//varying intercepts
alpha_state_lambda = eta_state_lambda * sigma_state_lambda;
for(n in 1:N){
for(k in 1:K){
// linear predictors with random effect
// add fixed and random effects here
lambda[n] = exp(alpha_lambda + alpha_state_lambda[id[n]] + beta_l_tilde[k]*X[k,n]);
}
}
}
model{
//priors
phi ~ gamma(10,0.01);
eta_state_lambda ~ normal(0,1);
sigma_state_lambda ~ cauchy(0, 2.5*sdLT);
alpha_lambda ~ normal(0,100);
beta_l ~ normal(0,100);
// likelihood
y ~ neg_binomial_2(lambda, phi);
}
generated quantities {
vector[N] y_rep;
for(n in 1:N){
y_rep[n] = neg_binomial_2_rng(lambda[n],phi); //posterior draws to get posterior predictive checks
}
}
```

And the code for running in R:

```
N_obs <- nrow(LT_scale) #1984
J_st <- length(unique(LT_scale$stID)) #48 states
K_mat <- ncol(LT_scale[,c(7:12, 25:30, 48:53)]) #15 for a test run
id <- LT_scale$stID #id linking observation to group
X_mat <- t(as.matrix(LT_scale[,c(7:12, 25:30, 48:53)])) #15 row x 1984 column matrix
y_lt <- LT_scale$LTActivities #outcome
sdLT <- sd(LT_scale$LTActivities) #empirical estimate of variation in the outcome
LT_data <- list(N=N_obs, J=J_st, K=K_mat, id=id, X=X_mat, y=y_lt, sdLT=sdLT)
LT_fitNG <- stan("/Users/matthewwilliamson/ConsOppBatch/LTA_nested_NegBin.stan", data=LT_data, iter=800, chains=3, control = list(max_treedepth=15))
```