Out of sample predictions with QR decomposition?

I’m trying to learn how to do out of sample predictions in a model that I fit using the QR decomposition. This is the code that I wrote:

data {
  int<lower = 0, upper = 1> run_estimation;       // a switch to evaluate the likelihood
  int<lower=0> N ;                                // number of data items
  int<lower=0> K ;                                // number of predictors
  matrix[N, K] x ;                                // predictor matrix
  vector[N]    y ;                                // outcome vector
  int<lower=0> pred_N ;                           // number of out-of-sample data items
  matrix[pred_N, K] pred_x ;                      // out of sample predictor matrix (was matrix[N, K], which breaks when pred_N != N)
  
}

transformed data {
  matrix[N, K] Q_ast ;
  matrix[K, K] R_ast ;
  matrix[K, K] R_ast_inverse ;
  matrix[N, K] X_centered ;
  matrix[pred_N, K] pred_X_centered ;   
  real mean_y = mean(y) ;
  real sd_y = sd(y) ;
  vector[N] y_std;
  for (n in 1:N) y_std[n] = (y[n] - mean_y)/sd_y ;
  for (k in 1:K) {
      X_centered[,k] = x[,k] - mean(x[,k]) ;
      pred_X_centered[,k] = pred_x[,k] - mean(x[,k]) ; // SHOULD I BE CENTERING with the mean of x instead of pred_x ??
  }
  // thin and scale the QR decomposition
  Q_ast = qr_Q(X_centered)[, 1:K] * sqrt(N - 1.0) ;
  R_ast = qr_R(X_centered)[1:K, ] / sqrt(N - 1.0) ;
  R_ast_inverse = inverse(R_ast) ;
}

parameters {
  real alpha_ast ;          // intercept
  vector[K] theta ;     // coefficients on Q_ast
  vector<lower=0>[K] lambda ; // horseshoe prior: local scales
  real<lower=0> tau ;         // horseshoe prior: global scale
  real<lower=0> sigma_ast ; // error scale
}

model {
  if(run_estimation==1) y_std ~ normal(alpha_ast + Q_ast * theta , sigma_ast) ; // likelihood
  lambda ~ cauchy(0,1) ;
  tau ~ cauchy(0,1) ;
  for (i in 1:K) theta[i] ~ normal(0, lambda[i] * tau) ;  
  alpha_ast ~ normal(0,1) ;
  sigma_ast ~ normal(0,1) ;
}

generated quantities {
  vector[K] beta ;
  vector[N] y_sim ;
  vector[pred_N] y_pred ;
  real alpha ;
  real<lower=0> sigma ;
  alpha = alpha_ast*sd_y + mean_y ;
  beta = R_ast_inverse * theta *sd_y ; // coefficients on x
  sigma = sigma_ast*sd_y ;
  for(n in 1:N) y_sim[n] = normal_rng(alpha_ast +  Q_ast[n,] * theta , sigma_ast)  * sd_y + mean_y ;
  for(n in 1:pred_N) y_pred[n] = normal_rng(alpha +  //IS THIS RIGHT???
                                              pred_X_centered[n,] * beta , 
                                              sigma);
}
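As a quick sanity check on the transformed data block above, the thin/scaled QR identities can be verified outside of Stan. This is a numpy sketch (names mirror the Stan program, but it is not part of the model): it confirms that `Q_ast * R_ast` reproduces the centered predictors and that `Q_ast * theta` equals `X_centered * beta` once `beta = R_ast_inverse * theta`.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3
x = rng.normal(size=(N, K))
X_centered = x - x.mean(axis=0)

# thin QR, scaled exactly as in the Stan transformed data block
Q, R = np.linalg.qr(X_centered)      # 'reduced' mode: Q is N x K
Q_ast = Q * np.sqrt(N - 1.0)
R_ast = R / np.sqrt(N - 1.0)
R_ast_inverse = np.linalg.inv(R_ast)

# the scaled decomposition still reproduces the centered predictors
assert np.allclose(Q_ast @ R_ast, X_centered)

# for any theta, the linear predictor is invariant under the change of basis
theta = rng.normal(size=K)
beta = R_ast_inverse @ theta
assert np.allclose(Q_ast @ theta, X_centered @ beta)
```

The sign conventions of `qr` may differ between numpy and Stan, but both identities hold regardless.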


Is what I’m doing to calculate y_pred right?? For example, should I be using pred_x or pred_X_centered?

Thanks for the help!


It seems a bit weird to me that you standardize your dependent variable y. Do you really need that? (You don’t need it for the QR decomposition…)

Apart from that it looks fine to me. Do you get unexpected results?

PS: you do the QR on the centered X, so after “reverting” back to \beta via R^{*-1} you still need to use centered X for predictions (as you already do). :)
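To see why the training means (and not pred_x’s own means) have to be used for centering, here is a small numpy sketch (illustrative only, names mirror the Stan program): centering new rows with their own means shifts the linear predictor by a constant, so predictions would be on the wrong scale.

```python
import numpy as np

rng = np.random.default_rng(1)
N, pred_N, K = 100, 20, 3
x = rng.normal(size=(N, K))
pred_x = rng.normal(loc=0.5, size=(pred_N, K))   # deliberately different mean

train_means = x.mean(axis=0)
X_centered = x - train_means

Q, R = np.linalg.qr(X_centered)
Q_ast = Q * np.sqrt(N - 1.0)
R_ast = R / np.sqrt(N - 1.0)

theta = rng.normal(size=K)
beta = np.linalg.inv(R_ast) @ theta

# in-sample the two parameterizations agree by construction
assert np.allclose(X_centered @ beta, Q_ast @ theta)

# out-of-sample: center with the TRAINING means, not pred_x's own means
pred_centered_ok = pred_x - train_means
pred_centered_bad = pred_x - pred_x.mean(axis=0)

# the two centering choices disagree by (pred means - train means) . beta
assert not np.allclose(pred_centered_ok @ beta, pred_centered_bad @ beta)
```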


I’m actually having trouble with a zero-inflated Poisson model, but I thought that starting with a simpler question was a good idea. In that model I’m getting this type of error after I added code for out-of-sample prediction:

Chain 1: Exception: model4004405c442f_zeroinflated_namespace::write_array: y_pred[k0__] is -2147483648, but must be greater than or equal to 0  (in 'model4004405c442f_zeroinflated' at line 113)

This is my stan code


data{
  int<lower=0> N;
  int<lower=0> N_sch;
  int<lower=0> N_cluster;
  int<lower=1,upper=N_sch> idx_sch[N];
  int<lower=1,upper=N_cluster> idx_cluster[N];
  int<lower=0> K_logit; 
  matrix[N, K_logit] X_logit; 
  int<lower=0> K_poisson; 
  matrix[N, K_poisson] X_poisson; 
  int<lower=0> offset[N];
  int<lower=0> y[N];
  int<lower=0> pred_N;
  int<lower=1,upper=N_sch> pred_idx_sch[pred_N];
  int<lower=1,upper=N_cluster> pred_idx_cluster[pred_N];
  matrix[pred_N, K_logit] pred_X_logit; 
  matrix[pred_N, K_poisson] pred_X_poisson; 
  int<lower=0> pred_offset[pred_N];
}

transformed data {
  matrix[N, K_logit] Q_logit_ast ;
  matrix[K_logit, K_logit] R_logit_ast ;
  matrix[K_logit, K_logit] R_logit_ast_inverse ;
  matrix[N, K_logit] X_logit_centered ;
  matrix[N, K_poisson] Q_poisson_ast ;
  matrix[K_poisson, K_poisson] R_poisson_ast ;
  matrix[K_poisson, K_poisson] R_poisson_ast_inverse ;
  matrix[N, K_poisson] X_poisson_centered ;
  matrix[pred_N, K_poisson] pred_X_poisson_centered ;
  matrix[pred_N, K_logit] pred_X_logit_centered ;
  for (k in 1:K_logit) {
    X_logit_centered[,k] = X_logit[,k] - mean(X_logit[,k]) ;
    pred_X_logit_centered[,k] = pred_X_logit[,k] - 
                                  mean(X_logit[,k]) ; // NOTE THAT I'M CENTERING USING X_logit instead of pred
  }
  for (k in 1:K_poisson) {
    X_poisson_centered[,k] = X_poisson[,k] - 
                              mean(X_poisson[,k]) ;
    pred_X_poisson_centered[,k] = pred_X_poisson[,k] - 
                                    mean(X_poisson[,k]) ;  // NOTE THAT I'M CENTERING USING X_poisson instead of pred
    
  }
  // thin and scale the QR decomposition
  Q_logit_ast = qr_Q(X_logit_centered)[, 1:K_logit] * sqrt(N - 1.0) ;
  R_logit_ast = qr_R(X_logit_centered)[1:K_logit, ] / sqrt(N - 1.0) ;
  R_logit_ast_inverse = inverse(R_logit_ast) ;
  Q_poisson_ast = qr_Q(X_poisson_centered)[, 1:K_poisson] * sqrt(N - 1.0) ;
  R_poisson_ast = qr_R(X_poisson_centered)[1:K_poisson, ] / sqrt(N - 1.0) ;
  R_poisson_ast_inverse = inverse(R_poisson_ast) ;
}

parameters {
  real alpha_logit_ast ;                    // intercept
  vector[K_logit] theta_logit ;                   // coefficients on Q_ast
  vector[N_sch] sch_logit_std;              // school effects, non-centered
  real<lower=0> sigma_sch_logit;            // school effect scale
  vector[N_cluster] cluster_logit_std;      // cluster effects, non-centered
  real<lower=0> sigma_cluster_logit;        // cluster effect scale
  real alpha_poisson_ast ;                  // intercept
  vector[K_poisson] theta_poisson ;         // coefficients on Q_ast
  vector[N_sch] sch_poisson_std;            // school effects, non-centered
  real<lower=0> sigma_sch_poisson;          // school effect scale
  vector[N_cluster] cluster_poisson_std;    // cluster effects, non-centered
  real<lower=0> sigma_cluster_poisson;      // cluster effect scale
}

transformed parameters {
  vector[N_sch] sch_logit; 
  vector[N_cluster] cluster_logit; 
  vector[N_sch] sch_poisson; 
  vector[N_cluster] cluster_poisson; 
  vector[N] lp; 
  vector[N] lambda_log; 
  sch_logit = sigma_sch_logit * sch_logit_std ;               // Matt trick
  cluster_logit = sigma_cluster_logit * cluster_logit_std ;            // Matt trick
  sch_poisson = sigma_sch_poisson * sch_poisson_std ;               // Matt trick
  cluster_poisson = sigma_cluster_poisson * cluster_poisson_std ;            // Matt trick
  lp = alpha_logit_ast + Q_logit_ast * theta_logit + sch_logit[idx_sch] + cluster_logit[idx_cluster];
  lambda_log = (alpha_poisson_ast + Q_poisson_ast * theta_poisson + sch_poisson[idx_sch] + cluster_poisson[idx_cluster])/50;
}

model{
  theta_logit ~ normal(0,1) ;  
  alpha_logit_ast ~ normal(0,1) ;
  sch_logit_std ~ normal(0,1) ;
  sigma_sch_logit ~ normal(0,1) ;
  cluster_logit_std ~ normal(0,1) ;
  sigma_cluster_logit ~ normal(0,1) ;
  theta_poisson ~ normal(0,50) ;  // normal(0,1)
  alpha_poisson_ast ~ normal(0,50) ;  // normal(0,1)
  sch_poisson_std ~ normal(0,1) ;
  sigma_sch_poisson ~ normal(0,50) ;
  cluster_poisson_std ~ normal(0,1) ;
  sigma_cluster_poisson ~ normal(0,50) ;

  for(n in 1:N){
    if(y[n] == 0)
      target += log_sum_exp(bernoulli_logit_lpmf(1 | lp[n]),
                            bernoulli_logit_lpmf(0 | lp[n])
                              + poisson_log_lpmf(y[n] | lambda_log[n] + log(offset[n])));
    else
      target += bernoulli_logit_lpmf(0 | lp[n])
                  + poisson_log_lpmf(y[n] | lambda_log[n] + log(offset[n]));
  }
  
}

generated quantities {
  vector[N] log_lik;
  int<lower=0>  y_sim[N] ;
  int zero;
  int<lower=0>  y_pred[pred_N] ; // was y_pred[N]: sized with N, entries past pred_N stay uninitialized at -2147483648, which is exactly the constraint error above
  int pred_zero;
  vector[pred_N] pred_lambda_log; 
  vector[pred_N] pred_lp; 
  //beta = R_ast_inverse * theta *sd_y ;
  //pred_theta_poisson = R_poisson_ast_inverse * theta_poisson ;
  pred_lambda_log = (alpha_poisson_ast + 
                      pred_X_poisson_centered * (R_poisson_ast_inverse * theta_poisson) + 
                      sch_poisson[pred_idx_sch] + 
                      cluster_poisson[pred_idx_cluster])/50;
  pred_lp = alpha_logit_ast + 
              pred_X_logit_centered * (R_logit_ast_inverse * theta_logit) + 
              sch_logit[pred_idx_sch] + 
              cluster_logit[pred_idx_cluster];

  for(n in 1:N){
    zero = bernoulli_logit_rng(lp[n]);
    if(zero == 1)
      y_sim[n] = 0 ;
    else if(lambda_log[n] + log(offset[n]) < 20.7944) // THIS HELPS ME DEBUG, NO CLUE IF THERE IS A BETTER WAY.
      y_sim[n] = poisson_log_rng(lambda_log[n] + log(offset[n]));
    else
      y_sim[n] = 999999;
    if(y[n] == 0)
      log_lik[n] = log_sum_exp(bernoulli_logit_lpmf(1 | lp[n]),
                               bernoulli_logit_lpmf(0 | lp[n])
                                 + poisson_log_lpmf(y[n] | lambda_log[n] + log(offset[n]))) ;
    else
      log_lik[n] = bernoulli_logit_lpmf(0 | lp[n])
                     + poisson_log_lpmf(y[n] | lambda_log[n] + log(offset[n]));
  }

  for(n in 1:pred_N){
    pred_zero = bernoulli_logit_rng(pred_lp[n]);
    if(pred_zero == 1)
      y_pred[n] = 0 ;
    else if(pred_lambda_log[n] + log(pred_offset[n]) < 20.7944)
      y_pred[n] = poisson_log_rng(pred_lambda_log[n] +
                                  log(pred_offset[n]));
    else
      y_pred[n] = 999999;
  }
}
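The sampling structure in the generated quantities loops can be mirrored in plain numpy, which makes it easier to test outside of Stan. This is a sketch; `zip_rng` and its arguments are illustrative names following the Stan program, not part of any library.

```python
import numpy as np

def zip_rng(lp, lambda_log, offset, rng):
    """Draw from a zero-inflated Poisson, mirroring the generated
    quantities loop: with probability inv_logit(lp) emit a structural
    zero, otherwise draw Poisson with the offset folded into the rate."""
    p_zero = 1.0 / (1.0 + np.exp(-lp))          # inv_logit(lp)
    if rng.random() < p_zero:                    # structural zero
        return 0
    rate = np.exp(lambda_log + np.log(offset))   # Poisson rate with offset
    return int(rng.poisson(rate))

rng = np.random.default_rng(2)
draws = [zip_rng(lp=2.0, lambda_log=1.0, offset=5, rng=rng)
         for _ in range(1000)]
# with lp = 2, inv_logit(2) is about 0.88, so most draws are structural zeros
```

Note that numpy's `poisson` returns 64-bit integers, so the 32-bit overflow issue discussed below does not arise here; the sketch is only for checking the zero-inflation logic.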


I don’t understand how the code that I have can produce negative numbers. In case it helps this is the histogram for pred_offset

[histogram of pred_offset omitted]

What am I missing?
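An aside on the magic number in the guard: 20.7944 is log(2^30), and per the Stan functions reference, `poisson_rng`/`poisson_log_rng` only accept rates below 2^30. The reported value -2147483648 is INT_MIN for a 32-bit signed integer, which is also what an uninitialized Stan `int` in generated quantities holds. A quick check:

```python
import math

# Stan's poisson_rng / poisson_log_rng only accept rates below 2^30;
# the guard threshold 20.7944 in the model is simply log(2^30)
threshold = 30 * math.log(2.0)
print(round(threshold, 4))   # 20.7944

# -2147483648 is INT_MIN for a 32-bit signed integer
int_min = -2**31
print(int_min)               # -2147483648
```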

Hm, hard to tell… Are all offset and pred_offset greater than or equal to 1? (Can’t really see that from the histogram.)

They are

summary(pred.df$n_i)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    4.00    9.00   12.09   18.00   61.00 

I’m trying to simplify my model because I really don’t get how a Poisson RNG can generate negative numbers…

Yes, I’m not sure what’s going on there…