All parameters ~0 except the final entry in the model matrix (regardless of what that value is)

mattwilliamson13 · August 7, 2017, 6:46pm

Hi,
I’m trying to fit a hierarchical negative binomial (and eventually zero-inflated poisson) model to ~2000 datapoints using a model with ~40 predictors and the QR decomposition. I am able to get the model to fit reasonably well (Rhat, neff, and PPchecks all seem reasonable), but am finding that all of the predictors are estimated to be ~0 except for the last predictor in the model matrix (regardless of what that predictor is). I have tried reducing the number of predictors and changing the order of the predictors, but the result is always a series of highly variable estimates (centered at zero, but with high levels of uncertainty) save for the last predictor which is typically estimated much more precisely and centered around 25. It seems that all of the ‘explanatory power’ is being loaded on the last predictor (regardless of what that predictor is). Is this a problem with the way the predictors and model matrix are entered? Or are priors driving the outcome (which seems unlikely given the size of the dataset)? Any insight would be greatly appreciated.

My stan code is:

  int<lower=1> N; //the number of observations
  int<lower=1> J; //the number of states
  int<lower=1> K; //number of columns in the model matrix
  int<lower=1,upper=J> id[N]; //vector of group indices
  matrix[K,N] X; //the model matrix
  int<lower=0> y[N]; //the response variable
  real<lower=0> sdLT; //observed standard deviation
}
transformed data{
  matrix[N,K] Q = qr_Q(X')[,1:K] * N;
  matrix[K,K] R = qr_R(X')[1:K,] / N;
  matrix[K,K] R_inv = inverse(R);
  
}
parameters{
real alpha_lambda;
real<lower=0> sigma_state_lambda;
real<lower=0> phi;
vector[J] eta_state_lambda;
vector[K] beta_l_tilde;
}

transformed parameters{
vector[K] beta_l = R_inv * beta_l_tilde;
vector[J] alpha_state_lambda;
vector[N] lambda;                     // avg. number of LTs in successful states

//varying intercepts
alpha_state_lambda = eta_state_lambda * sigma_state_lambda;
    
  for(n in 1:N){  
    for(k in 1:K){
      // linear predictors with random effect
      // add fixed and random effects here 
      lambda[n] = exp(alpha_lambda + alpha_state_lambda[id[n]] + beta_l_tilde[k]*X[k,n]); 
      }
    }
  
}
model{
//priors
phi ~ gamma(10,0.01);
eta_state_lambda ~ normal(0,1);
sigma_state_lambda ~ cauchy(0, 2.5*sdLT);
alpha_lambda ~ normal(0,100);
beta_l ~ normal(0,100);


// likelihood
y ~ neg_binomial_2(lambda, phi);
}
generated quantities {
 vector[N] y_rep;
 for(n in 1:N){
  y_rep[n] = neg_binomial_2_rng(lambda[n],phi); //posterior draws to get posterior predictive checks
 }
}

And the code for running in R:


N_obs <- nrow(LT_scale) #1984
J_st <- length(unique(LT_scale$stID)) #48 states
K_mat <- ncol(LT_scale[,c(7:12, 25:30, 48:53)]) #15 for a test run
id <- LT_scale$stID #id linking observation to group
X_mat <- t(as.matrix(LT_scale[,c(7:12, 25:30, 48:53)])) #15 row x 1984 column matrix
y_lt <- LT_scale$LTActivities #outcome 
sdLT <- sd(LT_scale$LTActivities) #empirical estimate of variation in the outcome

LT_data <- list(N=N_obs, J=J_st, K=K_mat, id=id, X=X_mat, y=y_lt, sdLT=sdLT)
LT_fitNG <- stan("/Users/matthewwilliamson/ConsOppBatch/LTA_nested_NegBin.stan", data=LT_data, iter=800,chains=3, control = list(max_treedepth=15)) ```

mattwilliamson13 · August 8, 2017, 4:26pm

Here is a screenshot of the issue I’m describing

bgoodri · August 8, 2017, 4:33pm

Guessing a nearly singular design matrix X or one that includes a constant.

mattwilliamson13 · August 8, 2017, 4:36pm

Thanks Ben - There aren’t any constants in the matrix. How can I diagnose whether the matrix is singular?

mattwilliamson13 · August 8, 2017, 4:44pm

Also, does the QR decomposition effect that? Or is the singularity a function of the original data matrix?

bgoodri · August 8, 2017, 5:15pm

If X is nearly singular the last (possibly few) diagonal element(s) of R will be nearly zero, in which case the last (possibly few) diagonal element(s) of the inverse of R will be really large.

You should also make sure that you center the columns of X before doing the QR decomposition (assuming you are estimating the intercept separately).

mattwilliamson13 · August 8, 2017, 5:54pm

Thanks for that. I think I understand your description of the effect of singularity on the R matrix, but don’t see evidence of singularity in the matrix prior to QR decomp (i.e. rankMatrix returns the same number of columns that are in the dataset).
I forgot to mention that all data in X were scaled (mean 0, unit variance) before modeling (so I don’t think that centering is the issue). I found a few covariates that were highly correlated and have removed them (thought I caught them all before running model). Is there something “wrong” with the priors I’m using for the Negative Binomial? I assumed that the amount of data I have would result in the likelihood overwhelming the prior, but that may not be correct?

bgoodri · August 8, 2017, 6:03pm

Functions that try to determine the rank of a matrix numerically are not that helpful. It is possible that X is of full-rank but close to short-rank. When you have many coefficients or a complicated model generally, the number of data points is almost a complete red herring. You are typically going to have a lot of information in the data on some parameters and much less on others. Look at the (estimated) effective number of parameters produced by loo().

mattwilliamson13 · August 8, 2017, 6:07pm

To clarify, I run loo on the stanfit object?

bgoodri · August 8, 2017, 6:08pm

No. Run loo on a simulations by observations matrix of log-likelihoods evaluated over the posterior simulations. See help(loo, package = "loo").

mattwilliamson13 · August 8, 2017, 10:34pm

Ok - I’ve run loo and the p_loo value is 90.6. I’m not entirely clear what that’s counting or how to use that to identify which parts of the design matrix are causing problems. I’m sorry if this is an obvious question, I’ve not used loo before… Thanks in advance for any explanation you might have

mattwilliamson13 · August 8, 2017, 10:38pm

Also, the pointwise output from loo has 1948 rows (which matches the number of observations). Is that correct? Or should it match the number of predictors in X_mat?

bgoodri · August 8, 2017, 10:46pm

Should have N = 1948 columns.

mattwilliamson13 · August 9, 2017, 6:01am

Ok, so that is correct. I’ve read the help file and the Vehtari et al 2016 paper on loo and it is still not clear to me how to use loo to identify the components of the design matrix that are causing the singularity. Is there additional information I could provide that would be helpful? Thanks again for your help.

mattwilliamson13 · August 9, 2017, 2:10pm

I’ve now re-run the model with a much-reduced matrix and am continuing to have the same problem. I’ve attached a subset of the data (note that the number of observations has changed because there are more complete cases in the reduced dataset) that can be run with the following:


N_obs <- nrow(LT_scale)
J_st <- length(unique(LT_scale$stID))
K_mat <- ncol(LT_scale[,c(7:15)])
id <- LT_scale$stID
X_mat <- t(as.matrix(LT_scale[,c(7:15)]))
y_lt <- LT_scale$LTActivities
#offset <- LT_scale$TotPop2010
sdLT <- sd(LT_scale$LTActivities)

LT_data <- list(N=N_obs, J=J_st, K=K_mat, id=id, X=X_mat, y=y_lt, sdLT=sdLT)
LT_fitNG <- stan("/Users/matthewwilliamson/ConsOppBatch/LTA_nested_NegBin.stan", data=LT_data, iter=800,chains=3)

LTA_scale_subset.csv (497.0 KB)

Topic		Replies	Views
Single value chain for cholesky factor of correlation matrix Modeling	2	595	December 18, 2017
Scaling for poisson model Modeling	13	3437	August 9, 2017
Sampler getting stuck? Modeling	18	3746	May 26, 2019
Out of sample predictions with QR decomposition? Modeling	5	1029	December 13, 2018
Cholesky decomposition failed for precision/covariance matrix Modeling	21	5553	August 28, 2017

All parameters ~0 except the final entry in the model matrix (regardless of what that value is)

Related topics