Prior on R2

Hi Stanimals,

I am trying to follow the logic and derivation behind the idea of putting a prior on R^2 for linear regression models, and to write a step-by-step derivation to fully comprehend it.

I have to admit that I struggle a bit with the current description/derivation in the prior section of this vignette for rstanarm, in particular the identity relating \theta to \rho_k. I see how this works in the single-variable case (the best linear predictor interpretation of OLS) but not in the multiple regression case…

Is there an extended version of the vignette available somewhere, or any relevant literature pointers or derivations?

Otherwise I would ask my questions here…

@jonah @bgoodri?

\boldsymbol{\theta} is the vector of coefficients from regressing the outcome on the columns of \mathbf{Q}, which are orthogonal to each other. So, the coefficient on the k-th column is just the correlation \rho_k times the ratio of the standard deviation of the outcome to the standard deviation of that column of \mathbf{Q} (which is \frac{1}{\sqrt{N - 1}}): \theta_k = \rho_k \frac{\sigma_y}{1 / \sqrt{N - 1}} = \sqrt{N - 1} \, \sigma_y \, \rho_k.
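
As a quick numerical check of that identity, here is a small R sketch with simulated data (N, K, X_c, and y are placeholders here, not the clouds example further down):

set.seed(1)
N <- 100
K <- 3
X_c <- scale(matrix(rnorm(N * K), N, K), scale = FALSE)  # centered design
y <- rnorm(N)
Q <- qr.Q(qr(X_c))                  # thin Q factor, N x K
apply(Q, 2, sd)                     # each column: 1 / sqrt(N - 1)
theta_ols <- drop(crossprod(Q, y))  # OLS coefficients on Q, since Q'Q = I
theta_id <- sqrt(N - 1) * sd(y) * drop(cor(Q, y))  # the identity above
all.equal(theta_ols, theta_id)      # TRUE: exact because Q is centered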

The first part of section 2.1 in this paper might be helpful?

Thanks. Just another related question: does centering \mathbf{X} imply that \mathbf{Q} is centered as well (in addition to having uncorrelated columns)? If so, how can you show that easily? This is the only step missing for me to complete the derivation.

If \mathbf{X} without an intercept has centered columns, the Q factor in its QR decomposition is the same as the Q factor, after its first column has been removed, of the QR decomposition of a design matrix with an intercept in the first column and no centering. And the remaining columns must have mean zero in order to be orthogonal to the removed (constant) column.
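
This is easy to check numerically. A small R sketch with simulated data (each column of a Q factor is only determined up to a sign flip, so the comparison uses absolute values):

set.seed(1)
N <- 50
K <- 4
X <- matrix(rnorm(N * K), N, K)
Q_centered <- qr.Q(qr(scale(X, scale = FALSE)))  # Q of centered X, no intercept
Q_intercept <- qr.Q(qr(cbind(1, X)))[, -1]       # Q of [1 | X], first column removed
colMeans(Q_centered)                             # all ~ 0
max(abs(abs(Q_centered) - abs(Q_intercept)))     # ~ 0: same columns up to sign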

Thanks @bgoodri.

Is it possible to get a self-contained example (Stan and R code) to see what's actually implemented? I gave it a try, but I am sure this is not yet what you describe in the vignette and above.

data {
    int<lower=1> N;
    int<lower=1> K;
    matrix[N,K] X;                 // centered design matrix (no intercept)
    vector[N] y;
    real<lower=0> eta;             // LKJ shape parameter
    real<lower=0> s_y;             // sample standard deviation of y
}
transformed data {
    matrix[N,K] Q = qr_thin_Q(X);  // columns are orthogonal with mean zero
    matrix[K,K] R = qr_thin_R(X);
    matrix[K,K] R_inv = inverse(R);
}
parameters {
    cholesky_factor_corr[K+1] L;   // joint correlation matrix of (y, Q)
    real<lower=0> omega;           // multiplier for the marginal SD of y
}
transformed parameters {
    matrix[K+1,K+1] corr = multiply_lower_tri_self_transpose(L);
    vector[K] rho = corr[2:, 1];   // correlations between y and the columns of Q
    real<lower=0> sigma_y = omega * s_y;
    vector[K] theta = sqrt(N - 1) * sigma_y * rho;  // identity from above
    real R2 = dot_product(rho, rho);                // columns of Q are orthogonal
    // note: nothing here forces R2 <= 1, so sigma can evaluate to NaN
    real<lower=0> sigma = sigma_y * sqrt(1 - R2);   // residual SD
}
model {
    L ~ lkj_corr_cholesky(eta);
    target += -2 * log(omega);     // improper prior p(omega) proportional to omega^-2
    y ~ normal(Q * theta, sigma);
}
generated quantities {
    vector[K] beta = R_inv * theta;  // coefficients on the original scale of X
}

and here some R-code:

library(rstan)
data("clouds", package = "HSAUR3")

ols <- lm(rainfall ~ seeding * (sne + cloudcover + prewetness + echomotion) +
            time, data = clouds)
X <- model.matrix(ols)
X <- X[, -1]                     # drop the intercept column
X_c <- scale(X, scale = FALSE)   # center the predictors
y <- clouds$rainfall
stan_data <- list(
  N   = nrow(X_c),
  K   = ncol(X_c),
  y   = y,
  X   = X_c,
  eta = ncol(X_c) / 2,
  s_y = sd(y)
)
fit <- stan("~/Desktop/prior_r2.stan", data = stan_data,
            chains = 4, cores = 1, seed = 42)

Any help is appreciated.

The Stan code for that is fairly self-contained.

The R code is less so, but most of the action is actually in stan_biglm.fit.

Thanks for linking that paper. R2-D2 looks pretty cool.

I guess the possible advantage over a Regularized Horseshoe prior would be that you get to specify a prior on R^2 instead of on the proportion of non-zero coefficients. They also mention in that paper that the prior density on the coefficients is unbounded at zero, which leads to tighter shrinkage around zero.

Has anyone tried R2-D2 in Stan?
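
For a concrete starting point, here is a minimal Stan sketch of a simplified R2-D2-style prior. To keep it short it uses a Gaussian scale mixture instead of the double-exponential marginal from the paper, and the Dirichlet concentration and the prior on sigma are illustrative choices rather than anything from the paper:

data {
    int<lower=1> N;
    int<lower=1> K;
    matrix[N,K] X;       // assumed to have standardized columns
    vector[N] y;
    real<lower=0> a;     // shape parameters of the Beta prior on R^2
    real<lower=0> b;
}
parameters {
    real<lower=0, upper=1> R2;  // prior is placed directly on the model fit
    simplex[K] phi;             // Dirichlet split of the signal variance
    vector[K] z;                // standardized coefficients
    real<lower=0> sigma;
}
transformed parameters {
    real<lower=0> tau2 = R2 / (1 - R2);               // total signal-to-noise ratio
    vector[K] beta = z .* sqrt(sigma^2 * tau2 * phi); // per-coefficient scales
}
model {
    target += beta_lpdf(R2 | a, b);        // R^2 ~ Beta(a, b)
    phi ~ dirichlet(rep_vector(1.0, K));   // illustrative concentration
    z ~ std_normal();
    sigma ~ exponential(1);                // illustrative prior on sigma
    y ~ normal(X * beta, sigma);
}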