QR Decomposition in hierarchical models

I am trying to fit a hierarchical model with varying slopes on standardized data using the Stan code below. I am very new to Stan (and to probabilistic programming in general) and still learning, so please let me know if you have any critiques of the code or the parameterization.

Assume this setting:

  • N observations
  • a design matrix X of dimensions N × K, where K is the number of predictors
  • a number J of groups (in my application, these are different players)
  • Pid, an integer vector of length N that encodes each observation's group, one of 1:J

I am also standardizing the data in the transformed data block of the code to improve sampling speed.


### Model specification
model_code ="""
    data {
        int<lower=1> N;   //number of training samples
        int<lower=1> J;   //number of players in group
        int<lower=1, upper=J> Pid[N]; //the player id 
        
        int<lower=1> K; //number of predictors (excluding the intercept)
        matrix[N, K] X; //matrix of predictors
        vector[N] y_obs; //observation/y_train
        
        int<lower=1> N_new; //n datapoints in test
        matrix[N_new, K] X_test; //matrix of test predictors
        int<lower=1, upper=J> Pid_test[N_new]; //the player id
        
    }
    
    transformed data {
        matrix[N,K] z;
        matrix[N_new,K] z_t;
        real mean_x[K];
        real sd_x[K];
        for (k in 1:K) { 
            mean_x[k] = mean(col(X, k)); 
            sd_x[k] = sd(col(X, k)); 
            for (n in 1:N)  
              z[n, k] = (X[n, k] - mean_x[k]) / sd_x[k]; 
            
            for (t in 1:N_new) 
                z_t[t, k] = (X_test[t, k] - mean_x[k]) / sd_x[k];
          }
        }
        
    parameters{
        real alpha;
        real<lower=0> sig;
        real mu[K];
        real<lower=0> sigma[K];
        vector[K] beta[J];
    
    }

    model{
    
        sig ~ cauchy(0,5);
        alpha ~ normal(0, 100);
        for (k in 1:K){
            mu[k] ~ normal(0,100);
            sigma[k] ~ cauchy(0,5);
            for (j in 1:J){
                beta[j,k]~ normal(mu[k], sigma[k]);
            }
        }
        for (n in 1:N)
            y_obs[n] ~ normal(alpha+(z[n]*beta[Pid[n]]), sig);
        
    }
    generated quantities{
        vector[N_new] y_test;
        for (n in 1:N_new)
            y_test[n] = normal_rng(alpha + (z_t[n]*beta[Pid_test[n]]), sig) ;
        
    
    }
    """

I have two questions regarding this code:

  • In this multivariate example, how do I get back the non-standardized alpha and beta coefficients? Most of the examples I have seen (for example, the one below) are for a single predictor. How do I calculate the non-standardized alpha in the multivariate case?
"""
...
generated quantities {
  real alpha;
  real beta;
  real<lower=0> sigma;
  alpha = sd(y) * (alpha_std - beta_std * mean(x) / sd(x))
           + mean(y);
  beta = beta_std * sd(y) / sd(x);
  sigma = sd(y) * sigma_std;
}
"""
  • If I instead choose to perform a QR decomposition on centered predictors rather than standardizing, how do I apply it to a hierarchical model? What changes relative to the way the QR decomposition is built in the transformed data block of a non-hierarchical model, as in the example below?
"""
...
transformed data {
      matrix[N, K] Q_ast ;
      matrix[K, K] R_ast ;
      matrix[K, K] R_ast_inverse ;
      matrix[N, K] X_centered ;
      matrix[N_new, K] X_test_centered ;  
      
      
      for (k in 1:K) {
          X_centered[,k] = X[,k] - mean(X[,k]) ;
          X_test_centered[,k] = X_test[,k] - mean(X[,k]) ; 
      }
      
      Q_ast = qr_Q(X_centered)[, 1:K] * sqrt(N - 1.0) ;
      R_ast = qr_R(X_centered)[1:K, ] / sqrt(N - 1.0) ;
      R_ast_inverse = inverse(R_ast) ;
    }
...
"""

Thank you for the help.

Anyone? Any help would be greatly appreciated. I am trying to figure out the correct way of performing a QR decomposition in hierarchical models.

We go over how to do that in the user’s guide: 1.2 The QR reparameterization | Stan User’s Guide

The point of the QR decomposition is that it standardizes an entire data matrix. The generated quantities block there shows how to recover the parameter vector on the original scale. So I think you just need to roll all your predictors together into a single predictor matrix.
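To illustrate the recovery step, here is a NumPy sketch of the thin QR reparameterization with the same scaling the user's guide uses (hypothetical data; `theta` plays the role of the coefficients estimated against `Q_ast`):

```python
import numpy as np

# Hypothetical design matrix.
rng = np.random.default_rng(1)
N, K = 100, 4
X = rng.normal(size=(N, K))

Q, R = np.linalg.qr(X)               # thin QR: Q is N x K, R is K x K
Q_ast = Q * np.sqrt(N - 1.0)         # scaling as in the user's guide
R_ast = R / np.sqrt(N - 1.0)
R_ast_inv = np.linalg.inv(R_ast)

# If theta is the coefficient vector on the Q_ast scale, then
# X @ beta = Q_ast @ R_ast @ beta = Q_ast @ theta,
# so the original-scale coefficients are beta = R_ast_inv @ theta.
theta = rng.normal(size=K)
beta = R_ast_inv @ theta
print(np.allclose(X @ beta, Q_ast @ theta))  # prints True
```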

Also you can use compound declare-define statements, so those generated quantities could look like this:

real alpha = sd(y) * (alpha_std - beta_std * mean(x) / sd(x)) + mean(y);

I’m not an expert in QR decomposition (that’s @bgoodri), but isn’t the point that it’s doing the standardization for you?


Thank you for the reply @Bob_Carpenter. Yes, you are correct: it standardizes the data matrix and speeds up sampling. I am using the example in the user’s guide to perform the QR reparameterization for a non-hierarchical model, where β is a K-vector of coefficients. In that case I can use θ = R\* β (i.e. β = R\*⁻¹ θ) to get back the parameter vector on the original scale.

My question is how to perform this QR reparameterization for a hierarchical model, where my β is a J × K matrix, J being the number of groups/clusters in my model.

  • Is it advisable to use a QR decomposition for hierarchical models?
  • If yes, how do I get back β on the original scale? I cannot use θ = R\* β directly, because my β is not a K-vector of coefficients but a J × K matrix.
  • Do I need to formulate β in a different vector or matrix form to facilitate this?
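My current understanding, sketched in NumPy with hypothetical data: since R\* comes from the single shared design matrix X, the back-transformation does not change in the hierarchical case; it is simply applied to each group's coefficient vector separately, i.e. β[j] = R\*⁻¹ θ[j] for j in 1:J. Please correct me if this is wrong:

```python
import numpy as np

# Hypothetical setup: one shared design matrix, J groups, K predictors.
rng = np.random.default_rng(2)
N, K, J = 120, 3, 5
X = rng.normal(size=(N, K))

Q, R = np.linalg.qr(X)
Q_ast = Q * np.sqrt(N - 1.0)
R_ast = R / np.sqrt(N - 1.0)
R_ast_inv = np.linalg.inv(R_ast)

# theta is J x K: one coefficient vector per group, on the Q_ast scale.
theta = rng.normal(size=(J, K))

# Row-wise back-transformation: beta[j] = R_ast_inv @ theta[j].
beta = theta @ R_ast_inv.T

# Each group's linear predictor agrees under both parameterizations.
for j in range(J):
    assert np.allclose(X @ beta[j], Q_ast @ theta[j])
print("all groups match")  # prints all groups match
```

In Stan terms, this would mean the transformed data block is unchanged and the generated quantities block just loops `beta[j] = R_ast_inverse * theta[j]` over the J groups.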

Thank you. Please let me know if you have any questions.