"generated quantities" does not work

Hi!
I am using “rstan” and exercising with a multiple regression model.
I found that when I included the “generated quantities” in the Stan code, an error appeared.
Here is the code.

set.seed(123)
rm(list=ls())
library(rstan)

N = 250
K = 3
covariates = replicate(K, rnorm(n=N))
colnames(covariates) = c('X1','X2','X3')
X = cbind(Intercept=1, covariates)
coefs = c(5, .2, -1.5, .9)
mu = X %*% coefs
sigma = 2
y = rnorm(N, mu, sigma)
dat = list(N=N, K=ncol(X), y=y, X=X)


stanmodelcode = "

data {                      
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] y;
}

parameters {   
  vector[K] beta;     
  real<lower=0> sigma;
}

model { 
  vector[N] mu;
  mu = X * beta;
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  y ~ normal(mu, sigma);
}

"

stanmodelcodeRsqr = "

data {                  
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] y;
}

parameters {               
  vector[K] beta;
  real<lower=0> sigma;
}

model {                     
  vector[N] mu;
  mu = X * beta;    
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  y ~ normal(mu, sigma);
}

generated quantities {
  real rss;                
  real totalss;
  real<lower=0, upper=1> R2;                 
  vector[N] mu;
  
  mu = X * beta;
  rss = dot_self(y-mu);
  totalss = dot_self(y-mean(y));
  R2 = 1 - rss/totalss;
}

"

fitmodel = stan(model_code = stanmodelcode,
              data = dat,
              iter = 5000,
              warmup = 2500,
              thin = 10,
              chains = 1)

fitRsq = stan(model_code = stanmodelcodeRsqr,
              data = dat,
              iter = 5000,
              warmup = 2500,
              thin = 10,
              chains = 1)

When I run this code, “fitmodel” runs successfully, but “fitRsq” does not with the error message:

[1] "Chain 1: Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                                                                                             
[2] "Chain 1: Exception: normal_lpdf: Scale parameter is 0, but must be > 0!  (in 'model36661a4a4b97_6a85a8dbcc29774478602cd182a22e81' at line 20)"                                                                       
[3] "Chain 1: If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,"                                                                     
[4] "Chain 1: but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."                                                                                                   
[5] "Chain 1: "                                                                                                                                                                                                           
[6] "Error in sampler$call_sampler(args_list[[i]]) : "                                                                                                                                                                    
[7] "  Exception: model36661a4a4b97_6a85a8dbcc29774478602cd182a22e81_namespace::write_array: R2 is -1.99745, but must be greater than or equal to 0  (in 'model36661a4a4b97_6a85a8dbcc29774478602cd182a22e81' at line 26)"
error occurred during calling the sampler; sampling not done

Do you have any suggestions?

add]
I found that the error disappeared when I change the boundary of R2 from
real<lower=0, upper=1> R2;
to
real R2.
But cannot I constrain the range of the generated quantities?

1 Like

For generated quantities, constraints are enforced after the statements in the generated quantities block have executed. Thus, your problem is presumably that 1 - rss/totalss doesn’t respect the constraint that you placed on R2. What is the desired behavior if 1 - rss/totalss isn’t on the unit interval?

3 Likes

Hi Jacob!
Thanks for the reply.
But it is hard to understand “for generated quantities, constraints are enforced after the statements in the generated quantities block have executed.” I imposed the unit range constraint to confine R2 within the range if R2 is out of the range “after the generated quantities block has been executed.” Can you explain more detail?

1 Like

In generated quantities (and also in data), the values are defined first. Then Stan checks whether the values respect the constraints. If the values do respect the constraints, then there’s no problem. If the values don’t respect the constraints, then Stan errors.

The generated quantities block does not influence the model priors or likelihood in any way–it’s only for computing functions of parameters and data. It doesn’t feed back to influence the parameter estimates whatsoever.

Your quantity R2 will not be in the unit interval if Stan ever explores (even early in warmup) a region of parameter space where the regression fits worse (as assessed by sum of squared residuals) than the maximum-likelihood estimate for an intercept-only model. Note that both rss and totalss should always be positive, so there should be no trouble in respecting the upper bound of 1.

I still don’t understand what exactly you intend to achieve by placing the constraint on R2. The main reason for declaring generated quantities with constraints is to ensure that there aren’t bugs elsewhere in the model. If there is a generated quantity that you believe, based on the model configuration, must necessarily respect some constraint, you can declare that constraint just to ensure that your Stan program will complain loudly if there’s a bug that leads to different behavior. If that’s your goal, then for what it’s worth it does not look to me like you have a bug, but rather it looks like you’ve overlooked the possibility that rss/totalss can indeed be greater than 1 even if the model is behaving correctly. On the other hand, if you instead intend to completely discount all regions of parameter space where 1 - rss/totalss is less than zero, then you need to build this constrain into the model itself, as defined in your parameters and model blocks. In general, I’d advise strongly against trying to do this, unless you really know what you’re doing and have a really good reason for doing it.

4 Likes