# Out of sample predictions with QR decomposition?

I’m trying to learn how to do out-of-sample predictions in a model that I fit using the QR decomposition. This is the code that I wrote:

```stan
data {
  int<lower=0, upper=1> run_estimation; // a switch to evaluate the likelihood
  int<lower=0> N;                       // number of data items
  int<lower=0> K;                       // number of predictors
  matrix[N, K] x;                       // predictor matrix
  vector[N] y;                          // outcome vector
  int<lower=0> pred_N;                  // number of out-of-sample data items
  matrix[pred_N, K] pred_x;             // out-of-sample predictor matrix (note: pred_N rows, not N)
}

transformed data {
  matrix[N, K] Q_ast;
  matrix[K, K] R_ast;
  matrix[K, K] R_ast_inverse;
  matrix[N, K] X_centered;
  matrix[pred_N, K] pred_X_centered;
  real mean_y = mean(y);
  real sd_y = sd(y);
  vector[N] y_std;
  for (n in 1:N) y_std[n] = (y[n] - mean_y) / sd_y;
  for (k in 1:K) {
    X_centered[, k] = x[, k] - mean(x[, k]);
    pred_X_centered[, k] = pred_x[, k] - mean(x[, k]); // SHOULD I BE CENTERING with the mean of x instead of pred_x ??
  }
  // thin and scale the QR decomposition
  Q_ast = qr_Q(X_centered)[, 1:K] * sqrt(N - 1.0);
  R_ast = qr_R(X_centered)[1:K, ] / sqrt(N - 1.0);
  R_ast_inverse = inverse(R_ast);
}

parameters {
  real alpha_ast;             // intercept (on standardized y)
  vector[K] theta;            // coefficients on Q_ast
  vector<lower=0>[K] lambda;  // local scales (horseshoe-style prior)
  real<lower=0> tau;          // global scale
  real<lower=0> sigma_ast;    // error scale (on standardized y)
}

model {
  if (run_estimation == 1)
    y_std ~ normal(alpha_ast + Q_ast * theta, sigma_ast); // likelihood
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, 1);
  for (i in 1:K) theta[i] ~ normal(0, lambda[i] * tau);
  alpha_ast ~ normal(0, 1);
  sigma_ast ~ normal(0, 1);
}

generated quantities {
  vector[K] beta;
  vector[N] y_sim;
  vector[pred_N] y_pred;
  real alpha;
  real<lower=0> sigma;
  alpha = alpha_ast * sd_y + mean_y;
  beta = R_ast_inverse * theta * sd_y; // coefficients on centered x
  sigma = sigma_ast * sd_y;
  for (n in 1:N)
    y_sim[n] = normal_rng(alpha_ast + Q_ast[n, ] * theta, sigma_ast) * sd_y + mean_y;
  for (n in 1:pred_N)
    y_pred[n] = normal_rng(alpha + pred_X_centered[n, ] * beta, // IS THIS RIGHT???
                           sigma);
}
```
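As a sanity check on the algebra (not part of the model), the scaled thin-QR reparameterization can be verified numerically. This is my own Python/NumPy sketch with made-up dimensions; it mirrors the scaling by `sqrt(N - 1)` used above and confirms that `Q_ast * theta` and `X_centered * beta` are the same linear predictor:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 3
x = rng.normal(size=(N, K))

# Center each column, as in the transformed data block
X_centered = x - x.mean(axis=0)

# Thin QR, scaled the same way as the Stan code
Q, R = np.linalg.qr(X_centered)      # reduced mode: Q is N x K, R is K x K
Q_ast = Q * np.sqrt(N - 1.0)
R_ast = R / np.sqrt(N - 1.0)
R_ast_inverse = np.linalg.inv(R_ast)

# The linear predictor is identical in either parameterization:
theta = rng.normal(size=K)
beta = R_ast_inverse @ theta         # back-transformed coefficients
assert np.allclose(Q_ast @ theta, X_centered @ beta)
```

The scaling cancels exactly: `Q_ast @ R_ast == Q @ R == X_centered`, so transforming `theta` through `R_ast_inverse` recovers coefficients on the centered predictors.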



Is what I’m doing to calculate y_pred right?? For example, should I be using pred_x or pred_X_centered?

Thanks for the help!


It seems a bit weird to me that you standardize your dependent variable y. Do you really need that? (You don’t need it for the QR decomposition…)

Apart from that it looks fine to me. Do you get unexpected results?

PS: you do the QR on the centered X, so after “reverting” back to \beta via R^{*-1} you still need to use centered X for predictions (as you already do). :)
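To illustrate why the *training* means are the right ones to center with: the intercept was learned relative to the training-data column means, so new rows must be shifted by those same means. Centering `pred_x` by its own means silently absorbs any covariate shift. A toy Python example of my own (plain least squares standing in for the Bayesian fit):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 2
true_beta = np.array([2.0, -1.0])
x = rng.normal(size=(N, K))
y = 1.5 + x @ true_beta + rng.normal(scale=0.1, size=N)

mu = x.mean(axis=0)                        # training column means
Xc = x - mu
beta, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
alpha = y.mean()                           # intercept on the original scale

x_new = rng.normal(loc=3.0, size=(5, K))   # shifted out-of-sample covariates
pred_train_centered = alpha + (x_new - mu) @ beta
pred_self_centered = alpha + (x_new - x_new.mean(axis=0)) @ beta

truth = 1.5 + x_new @ true_beta
# Training-mean centering recovers the true signal;
# centering by x_new's own means discards the covariate shift.
assert np.allclose(pred_train_centered, truth, atol=0.5)
assert not np.allclose(pred_self_centered, truth, atol=0.5)
```

So yes: `pred_X_centered`, centered with `mean(x[,k])`, is the right thing to feed into the prediction.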


I’m actually having trouble with a zero-inflated Poisson model, but I thought that starting with a simpler question was a good idea. In this model I’m getting this type of error after I added code for out-of-sample prediction:

```
Chain 1: Exception: model4004405c442f_zeroinflated_namespace::write_array: y_pred[k0__] is -2147483648, but must be greater than or equal to 0  (in 'model4004405c442f_zeroinflated' at line 113)
```
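One clue worth noting: -2147483648 is exactly the smallest 32-bit signed integer, which is the sentinel value an unassigned Stan integer prints as. So this error typically points at an array element the code declared but never assigned, rather than at a Poisson draw that came out negative. A quick check of the constant:

```python
# -2147483648 == -2**31, the minimum of a 32-bit signed int (INT_MIN).
# Seeing it in a Stan constraint error usually means a declared-but-never-
# assigned integer, e.g. an array sized larger than the loop that fills it.
INT_MIN_32 = -2**31
print(INT_MIN_32)  # -2147483648
```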


This is my Stan code:


```stan
data {
  int<lower=0> N;
  int<lower=0> N_sch;
  int<lower=0> N_cluster;
  int<lower=1, upper=N_sch> idx_sch[N];
  int<lower=1, upper=N_cluster> idx_cluster[N];
  int<lower=0> K_logit;
  matrix[N, K_logit] X_logit;
  int<lower=0> K_poisson;
  matrix[N, K_poisson] X_poisson;
  int<lower=0> offset[N]; // note: `offset` is a reserved word in more recent Stan versions
  int<lower=0> y[N];
  int<lower=0> pred_N;
  int<lower=1, upper=N_sch> pred_idx_sch[pred_N];
  int<lower=1, upper=N_cluster> pred_idx_cluster[pred_N];
  matrix[pred_N, K_logit] pred_X_logit;
  matrix[pred_N, K_poisson] pred_X_poisson;
  int<lower=0> pred_offset[pred_N];
}

transformed data {
  matrix[N, K_logit] Q_logit_ast;
  matrix[K_logit, K_logit] R_logit_ast;
  matrix[K_logit, K_logit] R_logit_ast_inverse;
  matrix[N, K_logit] X_logit_centered;
  matrix[N, K_poisson] Q_poisson_ast;
  matrix[K_poisson, K_poisson] R_poisson_ast;
  matrix[K_poisson, K_poisson] R_poisson_ast_inverse;
  matrix[N, K_poisson] X_poisson_centered;
  matrix[pred_N, K_poisson] pred_X_poisson_centered;
  matrix[pred_N, K_logit] pred_X_logit_centered;
  for (k in 1:K_logit) {
    X_logit_centered[, k] = X_logit[, k] - mean(X_logit[, k]);
    pred_X_logit_centered[, k] = pred_X_logit[, k]
                                 - mean(X_logit[, k]); // NOTE THAT I'M CENTERING USING X_logit instead of pred
  }
  for (k in 1:K_poisson) {
    X_poisson_centered[, k] = X_poisson[, k] - mean(X_poisson[, k]);
    pred_X_poisson_centered[, k] = pred_X_poisson[, k]
                                   - mean(X_poisson[, k]); // NOTE THAT I'M CENTERING USING X_poisson instead of pred
  }
  // thin and scale the QR decomposition
  Q_logit_ast = qr_Q(X_logit_centered)[, 1:K_logit] * sqrt(N - 1.0);
  R_logit_ast = qr_R(X_logit_centered)[1:K_logit, ] / sqrt(N - 1.0);
  R_logit_ast_inverse = inverse(R_logit_ast);
  Q_poisson_ast = qr_Q(X_poisson_centered)[, 1:K_poisson] * sqrt(N - 1.0);
  R_poisson_ast = qr_R(X_poisson_centered)[1:K_poisson, ] / sqrt(N - 1.0);
  R_poisson_ast_inverse = inverse(R_poisson_ast);
}

parameters {
  real alpha_logit_ast;                  // intercept (logit part)
  vector[K_logit] theta_logit;           // coefficients on Q_logit_ast
  vector[N_sch] sch_logit_std;           // school effects (non-centered)
  real<lower=0> sigma_sch_logit;         // school effect scale
  vector[N_cluster] cluster_logit_std;   // cluster effects (non-centered)
  real<lower=0> sigma_cluster_logit;     // cluster effect scale
  real alpha_poisson_ast;                // intercept (Poisson part)
  vector[K_poisson] theta_poisson;       // coefficients on Q_poisson_ast
  vector[N_sch] sch_poisson_std;         // school effects (non-centered)
  real<lower=0> sigma_sch_poisson;       // school effect scale
  vector[N_cluster] cluster_poisson_std; // cluster effects (non-centered)
  real<lower=0> sigma_cluster_poisson;   // cluster effect scale
}

transformed parameters {
  vector[N_sch] sch_logit;
  vector[N_cluster] cluster_logit;
  vector[N_sch] sch_poisson;
  vector[N_cluster] cluster_poisson;
  vector[N] lp;
  vector[N] lambda_log;
  sch_logit = sigma_sch_logit * sch_logit_std;                  // Matt trick
  cluster_logit = sigma_cluster_logit * cluster_logit_std;      // Matt trick
  sch_poisson = sigma_sch_poisson * sch_poisson_std;            // Matt trick
  cluster_poisson = sigma_cluster_poisson * cluster_poisson_std; // Matt trick
  lp = alpha_logit_ast + Q_logit_ast * theta_logit
       + sch_logit[idx_sch] + cluster_logit[idx_cluster];
  lambda_log = (alpha_poisson_ast + Q_poisson_ast * theta_poisson
                + sch_poisson[idx_sch] + cluster_poisson[idx_cluster]) / 50; // /50 pairs with the normal(0, 50) priors below
}

model {
  theta_logit ~ normal(0, 1);
  alpha_logit_ast ~ normal(0, 1);
  sch_logit_std ~ normal(0, 1);
  sigma_sch_logit ~ normal(0, 1);
  cluster_logit_std ~ normal(0, 1);
  sigma_cluster_logit ~ normal(0, 1);
  theta_poisson ~ normal(0, 50);    // normal(0,1)
  alpha_poisson_ast ~ normal(0, 50); // normal(0,1)
  sch_poisson_std ~ normal(0, 1);
  sigma_sch_poisson ~ normal(0, 50);
  cluster_poisson_std ~ normal(0, 1);
  sigma_cluster_poisson ~ normal(0, 50);

  for (n in 1:N) {
    if (y[n] == 0)
      target += log_sum_exp(bernoulli_logit_lpmf(1 | lp[n]),
                            bernoulli_logit_lpmf(0 | lp[n])
                            + poisson_log_lpmf(y[n] | lambda_log[n] + log(offset[n])));
    else
      target += bernoulli_logit_lpmf(0 | lp[n])
                + poisson_log_lpmf(y[n] | lambda_log[n] + log(offset[n]));
  }
}

generated quantities {
  vector[N] log_lik;
  int<lower=0> y_sim[N];
  int zero;
  int<lower=0> y_pred[pred_N]; // sized pred_N: declaring this as y_pred[N] leaves the
                               // entries past pred_N unassigned (-2147483648), which
                               // violates the lower=0 constraint
  int pred_zero;
  vector[pred_N] pred_lambda_log;
  vector[pred_N] pred_lp;
  //beta = R_ast_inverse * theta *sd_y ;
  //pred_theta_poisson = R_poisson_ast_inverse * theta_poisson ;
  pred_lambda_log = (alpha_poisson_ast
                     + pred_X_poisson_centered * (R_poisson_ast_inverse * theta_poisson)
                     + sch_poisson[pred_idx_sch]
                     + cluster_poisson[pred_idx_cluster]) / 50;
  pred_lp = alpha_logit_ast
            + pred_X_logit_centered * (R_logit_ast_inverse * theta_logit)
            + sch_logit[pred_idx_sch]
            + cluster_logit[pred_idx_cluster];

  for (n in 1:N) {
    zero = bernoulli_logit_rng(lp[n]);
    if (zero == 1)
      y_sim[n] = 0;
    else if (lambda_log[n] + log(offset[n]) < 20.7944) // THIS HELPS ME DEBUG, NO CLUE IF THERE IS A BETTER WAY.
      y_sim[n] = poisson_log_rng(lambda_log[n] + log(offset[n]));
    else
      y_sim[n] = 999999;
    if (y[n] == 0)
      log_lik[n] = log_sum_exp(bernoulli_logit_lpmf(1 | lp[n]),
                               bernoulli_logit_lpmf(0 | lp[n])
                               + poisson_log_lpmf(y[n] | lambda_log[n] + log(offset[n])));
    else
      log_lik[n] = bernoulli_logit_lpmf(0 | lp[n])
                   + poisson_log_lpmf(y[n] | lambda_log[n] + log(offset[n]));
  }

  for (n in 1:pred_N) {
    pred_zero = bernoulli_logit_rng(pred_lp[n]);
    if (pred_zero == 1)
      y_pred[n] = 0;
    else if (pred_lambda_log[n] + log(pred_offset[n]) < 20.7944)
      y_pred[n] = poisson_log_rng(pred_lambda_log[n] + log(pred_offset[n]));
    else
      y_pred[n] = 999999;
  }
}
```
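For what it's worth, the draw logic in generated quantities is the standard two-stage zero-inflated Poisson sampler: flip the structural-zero coin, then draw the count only for the non-inflated component. The 20.7944 cutoff is log(2^30) ≈ 20.7944, i.e. the log of the rate ceiling Stan's Poisson RNG will accept. A sketch of the same sampler in Python (my own names and made-up parameter values, just to mirror the structure):

```python
import numpy as np

def zip_rng(lp_logit, lam_log, rng):
    """Zero-inflated Poisson draw: structural zero with probability
    inv_logit(lp_logit), otherwise a Poisson(exp(lam_log)) count."""
    p_zero = 1.0 / (1.0 + np.exp(-lp_logit))  # inverse logit, as in bernoulli_logit_rng
    if rng.random() < p_zero:
        return 0
    return int(rng.poisson(np.exp(lam_log)))

rng = np.random.default_rng(2)
# logit 2.0 -> ~88% structural zeros; rate 5 for the count component
draws = [zip_rng(2.0, np.log(5.0), rng) for _ in range(2000)]

# The sampler itself can never produce a negative count...
assert min(draws) >= 0
# ...and the debug cutoff in the Stan code is log(2^30):
assert abs(np.log(2.0**30) - 20.7944) < 1e-3
```

Since nothing in this construction can yield a negative value, the -2147483648 in the error is more consistent with an array element that was never written than with a bad Poisson draw.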



I don’t understand how this code can produce negative numbers. In case it helps, this is the histogram for pred_offset:

What am I missing?

Hm, hard to tell… Are all offset and pred_offset greater than or equal to 1? (Can’t really see that from the histogram.)

They are

```
> summary(pred.df$n_i)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    4.00    9.00   12.09   18.00   61.00
```


I’m trying to simplify my model because I really don’t get how a Poisson RNG can generate negative numbers…

Yes, I’m not sure what’s going on there…