Which input for Bayesian R² when using Stan (not rstanarm)?

Dear All,

I am checking a hierarchical model for predictive power and want to compare the variance explained by two models, one with weakly informative priors and one with uninformative priors. I am using the bayes_R2() function from the rstanarm package. My Stan file produces generated quantities for a Laplace (double exponential) likelihood whose location parameter is calculated as a time-series process with controls. My question is: should I calculate the Bayesian R² using the predicted means (“y_hat”) or the predicted values (“y3”) as y_pred?

I attach the Stan file for the model with uninformative priors, since it is shorter.

Thank you all so much in advance.

data {
  int<lower=1> N; // number of observations
  int<lower=1> J; // number of industries
  int<lower=1> T; //number of periods
  int<lower=1,upper=J> industry[N];
  int<lower=1,upper=T> year[N];
  vector[N] x; //RKT
  vector[N] y; //IFTC
  vector<lower=-0.5>[N-J] y2; //IFTC for all but first period
  vector[N] kl;
  vector[N] go;
  vector[N] pg;
}
parameters {
  vector[J] alpha; // intercept
  real beta; // autoregressive element
  vector[2] gamma; //slope parameters for RKT, RKT(-1)
  real delta;
  real eta;
  real mu_alpha;
  real mu_beta;
  real mu_gamma;
  real mu_delta;
  real mu_eta;
  real<lower=0> sigma; // scale of the observation noise (double exponential)
  real<lower=0> sigma_alpha; // sd of intercepts
  real<lower=0> sigma_beta; // sd of autoregressive coefficient
  real<lower=0> sigma_gamma;
  real<lower=0> sigma_delta;
  real<lower=0> sigma_eta;
  real theta; //moving average coefficient
  real mu_theta;
  real<lower=0> sigma_theta;
}


model {
  vector[N - J] y_hat; // predictions
  vector[N] err; //error for time t
  int pos; //position in raw dataset
  int local_pos; //continuous index for y_hat only
  pos = 0;
  local_pos = 0;
  
  for(j in 1:J){
    err[(j-1)*T+1] = 0; // set the first error for each industry to 0, since no prediction is made for the first period
    for(t in 2:T){
      local_pos = local_pos + 1;
      pos = (j-1)*T + t; // flat index into the data: offset of group j plus current period t
      y_hat[local_pos] = alpha[industry[pos]] + y[pos-1] * beta
                         + x[pos] * gamma[1] + x[pos-1] * gamma[2] + err[pos-1] * theta
                         + log(kl[pos]) * delta + log(go[pos]) * eta;
      //+ log(pg[pos]) * iota;
      err[pos] = y2[local_pos] - y_hat[local_pos];
      
    }
  }
  
  alpha ~ normal(mu_alpha, sigma_alpha);
  beta ~ normal(mu_beta, sigma_beta);
  gamma ~ normal(mu_gamma, sigma_gamma);
  sigma ~ cauchy(0, 2.5);
  delta ~ normal(mu_delta, sigma_delta);
  eta ~ normal(mu_eta, sigma_eta);
  theta ~ normal(mu_theta, sigma_theta);
  mu_alpha ~ normal(0, 0.09);
  sigma_alpha ~ cauchy(0, 25);
  mu_beta ~ normal(0, 0.09);
  sigma_beta ~ cauchy(0, 25);
  mu_gamma ~ normal(0, 0.09);
  sigma_gamma ~ cauchy(0, 25);
  mu_delta ~ normal(0, 0.09);
  sigma_delta ~ cauchy(0, 25);
  mu_eta ~ normal(0, 1);
  sigma_eta ~ cauchy(0, 25);
  mu_theta ~ normal(0, 0.09);
  sigma_theta ~ cauchy(0, 25);
  
  y2 ~ double_exponential(y_hat, sigma);
}
generated quantities {
  vector[N - J] y_hat; // predictions
  vector[N] err; //error for time t
  vector[N - J] y3; //predictions
  int pos; //position in raw dataset
  int local_pos; //continuous index for y_hat only
  pos = 0;
  local_pos = 0;
  
  for(j in 1:J){
    err[(j-1)*T+1] = 0; // set the first error for each industry to 0, since no prediction is made for the first period
    for(t in 2:T){
      local_pos = local_pos + 1;
      pos = (j-1)*T + t; // flat index into the data: offset of group j plus current period t
      y_hat[local_pos] = alpha[industry[pos]] + y[pos-1] * beta
                         + x[pos] * gamma[1] + x[pos-1] * gamma[2] + err[pos-1] * theta
                         + log(kl[pos]) * delta + log(go[pos]) * eta;
      //+ log(pg[pos]) * iota;
      err[pos] = y2[local_pos] - y_hat[local_pos];
    }
  }
  
  for(m in 1:(N-J))
    y3[m] = double_exponential_rng(y_hat[m], sigma);
}

instead of a simple linear model like

model {
  vector[N] mu = alpha + beta * x;
  y ~ normal(mu, sigma);
}


Hi,

You might want to check out this paper (specifically page 7). There, the linear predictor is used (pushed through the inverse link function). In your case that would be $\hat{y}$, the predicted mean, if I’m not mistaken.
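
In case it helps, here is a minimal sketch of how the draws-based Bayesian R^2 from that paper could be computed straight from the Stan output. It assumes `fit` is your stanfit object and `y2` is the observed outcome vector (names taken from your model); nothing here is rstanarm-specific:

library(rstan)

draws <- rstan::extract(fit, pars = "y_hat")$y_hat  # S x (N - J) matrix, one row per posterior draw

var_fit <- apply(draws, 1, var)  # variance of the predicted means, per draw
res     <- t(y2 - t(draws))      # residuals y2 - y_hat, per draw
var_res <- apply(res, 1, var)    # variance of the residuals, per draw

r2_draws <- var_fit / (var_fit + var_res)  # one R^2 per posterior draw
median(r2_draws)                           # posterior median R^2

Feeding y3 into this instead would add the double exponential sampling noise on top of the fit, which is not what the paper’s R^2 is measuring.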

You probably know that the Bayesian R^2 only really makes sense in the linear regression case; the further you deviate from normality, the less useful R^2 becomes. Its main advantages are that it is relatively cheap to compute and that people are somewhat familiar with it (at least conceptually). The paper talks about that a bit.
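
Since your likelihood is a double exponential with scale sigma, you could also try the model-based variant, replacing the empirical residual variance with the variance implied by the likelihood, 2 * sigma^2 (a Laplace(mu, b) distribution has variance 2b^2). A sketch, reusing var_fit from above:

sigma_draws <- rstan::extract(fit, pars = "sigma")$sigma  # one scale per draw
r2_model <- var_fit / (var_fit + 2 * sigma_draws^2)       # model-based R^2 per draw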

Cheers!

Thank you!