Hi, I have an awkward issue with scaling for a Poisson model.
It's a Poisson model for counts of student engagement.
The covariates span very different ranges: some are in [0, 1], others in [1, 20000].
In order to make Stan run smoothly, I’d like to scale the covariates to all be in the same range. Two immediate thoughts on this:
Scale each covariate to be [0,1]
We have population in the model, so scale each covariate as a percentage of population
i.e.
student_count ~ pois( b[1] * percent_hispanic + b[2] * income + b[3] * sat_score)
Both will run fine in Stan. The problem comes when trying to estimate predicted counts. Because every covariate is now in a compressed range, the posterior samples for the counts are always too low. My intuition is that I need to un-scale the posterior samples in some way, but I'm not sure how. Any suggestions on how to attack this?
Don’t change the posterior samples unless you really know what you are doing. But the predictions should be fine if your population values are scaled in the same way as in your sample.
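For example, here is a minimal base-R sketch of that idea (dat and newdata are hypothetical data frames holding the covariates): keep the scaling constants from the sample and divide any new data by the same constants before predicting.

vars <- c("percent_hispanic", "income", "sat_score")
scale_max <- sapply(dat[vars], max)   # scaling constants, from the sample only

dat_scaled <- dat
for (v in vars) dat_scaled[[v]] <- dat[[v]] / scale_max[[v]]
# ... fit the Poisson model on dat_scaled ...

# To predict, divide the new covariates by the SAME constants:
newdata_scaled <- newdata
for (v in vars) newdata_scaled[[v]] <- newdata[[v]] / scale_max[[v]]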
But scaling the predictors marginally is not optimal. It often works much better to do a QR reparameterization of the design matrix X, which is discussed in the manual and is a strongly recommended option in rstanarm, where your model would be
library(rstanarm)
options(mc.cores = parallel::detectCores())
post <- stan_glm(student_count ~ percent_hispanic + income + sat_score,
                 data = something, family = poisson(), QR = TRUE)
The QR reparameterization of X makes the columns of Q have the same length and be orthogonal to each other. After inverting the reparameterization to obtain coefficients on X, you can do prediction on the original scale of the variables.
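For instance, a quick base-R sketch (simulated X, purely to illustrate the algebra) showing that the linear predictor is unchanged and that the coefficients on X come back through the inverse of R:

set.seed(1)
N <- 100; K <- 3
X <- matrix(rnorm(N * K), N, K)

qrX <- qr(X)
Q_ast <- qr.Q(qrX) * sqrt(N - 1)   # thin Q, rescaled as in the Stan manual
R_ast <- qr.R(qrX) / sqrt(N - 1)

# X %*% beta == Q_ast %*% theta whenever theta = R_ast %*% beta,
# so the coefficients on X are recovered as beta = solve(R_ast) %*% theta.
beta  <- c(0.5, -1, 2)
theta <- R_ast %*% beta
max(abs(X %*% beta - Q_ast %*% theta))    # ~ 0: same linear predictor
max(abs(solve(R_ast) %*% theta - beta))   # ~ 0: coefficients recovered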
Thanks for pointing me to the QR decomposition. Appreciate it.
I implemented it in the model; chains mix nicely, samples look good, etc. BUT the coefficients recreated in the generated quantities block have a median of zero or very close to zero. That seems highly unlikely.
Stan file pasted below. This is a hurdle model for student activity counts, with a campus-specific intercept. There are separate coefficients for each portion of the model (the probability of a zero, and the count if non-zero).
data {
  int<lower=0> N;                  // Number of rows
  int<lower=0> J;                  // Number of campuses
  int<lower=0> K;                  // Number of coefficients
  int<lower=0> Y[N];               // Count of students
  int<lower=1, upper=J> campus[N]; // Campus labels
  matrix[N, K] X;                  // Covariates
}
transformed data {
  // As per page 123 of the Stan language reference v15
  matrix[N, K] Q_ast;
  matrix[K, K] R_ast;
  matrix[K, K] R_ast_inverse;
  // Thin and scale the QR decomposition
  Q_ast = qr_Q(X)[, 1:K] * sqrt(N - 1);
  R_ast = qr_R(X)[1:K, ] / sqrt(N - 1);
  R_ast_inverse = inverse(R_ast);
}
parameters {
  // Parameters for campus intercepts
  vector[J] theta_z_eta;
  vector[J] theta_c_eta;
  real<lower=0> theta_z_scale;
  real<lower=0> theta_c_scale;
  // For non-centered coefficient estimation
  vector[K] beta_z_eta;
  vector[K] beta_c_eta;
  real<lower=0> beta_z_scale;
  real<lower=0> beta_c_scale;
}
transformed parameters {
  vector[J] theta_z; // Campus intercept for zero-prob part
  vector[J] theta_c; // Campus intercept for count part
  vector[K] beta_z;  // Coefficients for zero part (Q scale)
  vector[K] beta_c;  // Coefficients for count part (Q scale)
  theta_z = theta_z_eta * theta_z_scale;
  theta_c = theta_c_eta * theta_c_scale;
  beta_z = beta_z_eta * beta_z_scale;
  beta_c = beta_c_eta * beta_c_scale;
}
model {
  real theta;
  real lambda;
  vector[N] xb_z;
  vector[N] xb_c;
  theta_z_eta ~ normal(0, 1);
  theta_z_scale ~ gamma(10, 10);
  theta_c_eta ~ normal(0, 1);
  theta_c_scale ~ gamma(10, 10);
  beta_z_eta ~ normal(0, 1);
  beta_z_scale ~ gamma(10, 10);
  beta_c_eta ~ normal(0, 1);
  beta_c_scale ~ gamma(10, 10);
  xb_z = Q_ast * beta_z;
  xb_c = Q_ast * beta_c;
  for (n in 1:N) {
    theta = theta_z[campus[n]] + xb_z[n];        // logit P(zero)
    lambda = exp(theta_c[campus[n]] + xb_c[n]);  // Poisson rate
    if (Y[n] == 0) {
      target += bernoulli_logit_lpmf(1 | theta);
    } else {
      // Non-zero: hurdle crossed, then zero-truncated Poisson
      target += bernoulli_logit_lpmf(0 | theta)
              + poisson_lpmf(Y[n] | lambda) - log1m_exp(-lambda);
    }
  }
}
generated quantities {
  vector[K] beta_zero;  // Zero-part coefficients on the original X scale
  vector[K] beta_count; // Count-part coefficients on the original X scale
  beta_zero = R_ast_inverse * beta_z;
  beta_count = R_ast_inverse * beta_c;
}
I have no way of judging whether that is plausible or not, but I tend not to argue with posterior distributions when it appears Stan is working well. I would look at the posterior predictive distribution of the outcome, especially to see whether the proportion of zeros is similar to that in the data.
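As a sketch, assuming fit is the stanfit object and X, campus, and Y are the data that were passed to Stan, the zero-proportion check could look like this (it uses the reconstructed original-scale coefficients, since X %*% beta_zero matches Q_ast %*% beta_z):

draws <- rstan::extract(fit)
S <- min(200, nrow(draws$beta_zero))   # number of posterior draws to use
zero_prop_rep <- numeric(S)
for (s in 1:S) {
  # The model codes theta as logit P(zero), so invert with plogis()
  p_zero <- plogis(draws$theta_z[s, campus] + as.vector(X %*% draws$beta_zero[s, ]))
  zero_prop_rep[s] <- mean(runif(nrow(X)) < p_zero)
}
hist(zero_prop_rep)
abline(v = mean(Y == 0), col = "red")  # observed proportion of zeros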
Strangely, no matter what I try, almost all of the coefficients (beta_zero and beta_count) have a mean value of zero. Many of the covariates are highly correlated with the dependent variable and have reasonable coefficient values using other regression tools. This leads me to believe that I did something wrong in the QR decomposition part. I tried to follow the Stan documentation exactly, but must have an error somewhere.
Does anyone see anything that might cause this? (Stan code in the earlier post.)
I’d recommend simulating data where you know the coefficients are not zero and seeing what happens.
If you have highly correlated predictors x1 and x2, and you have coefficients beta1 and beta2 with a symmetric prior, then you expect beta1 and beta2 to have a posterior mean of zero. Look at their sum, which won’t be expected to be zero if x1 and x2 are informative about y.
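A tiny simulation (made-up data, rstanarm just for convenience) shows the effect: each coefficient alone can straddle zero while their sum stays pinned down.

library(rstanarm)
set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)   # x1 and x2 correlated near 1
y  <- rpois(n, exp(0.5 * x1 + 0.5 * x2))

fit <- stan_glm(y ~ x1 + x2, family = poisson(), refresh = 0)
beta <- as.matrix(fit)[, c("x1", "x2")]
quantile(beta[, 1], c(0.05, 0.5, 0.95))       # wide, may well cover zero
quantile(rowSums(beta), c(0.05, 0.5, 0.95))   # concentrated near 1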
Still think I’m doing something wrong with the QR decomposition and subsequent coefficient reconstruction in Stan.
x1 and x2 are correlated with each other at 0.25
x1 and y are correlated at -0.145
x2 and y are correlated at -0.05
However, the resulting coefficients both have a range from -1e-06 to 3e-07, which is effectively 0.
I attempt to reconstruct the “un-QR” version of the coefficients in the generated quantities block, and my guess is that is where I am doing something wrong.
The beta_c coefficients for x1 and x2 seem to have reasonable ranges
beta_c[1] ranges from -0.68 to -0.27
beta_c[2] ranges from -0.27 to -0.22
Given the data and correlation between the covariates and y, those beta_c values seem “reasonable”. My guess is that somehow multiplying them by R_ast_inverse is causing the problem.
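One rough cross-check that avoids Stan's internals (assuming fit is the stanfit object and X, Y are the data passed to Stan): compare the reconstructed coefficients against an ordinary Poisson GLM on the raw X, since both are on the original covariate scale. The Y > 0 subset is only a crude stand-in for the truncated count part, and the GLM ignores the campus intercepts.

glm_count <- glm(Y ~ X, family = poisson, subset = Y > 0)
cbind(glm  = coef(glm_count)[-1],   # drop the intercept
      stan = colMeans(rstan::extract(fit)$beta_count))

If the two columns are on the same order of magnitude, the R_ast_inverse step is algebraically fine; note also that a covariate with a raw range like [1, 20000] will legitimately get a tiny coefficient on the original scale, which can look like zero in a summary.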