Understanding a simple brms model using the `make_stancode` function

Please also provide the following information in addition to your question:

  • Operating System: MacOS
  • brms Version: brms_2.4.0

I have the following simple model:

# simple model = regress yield against temperature
string_stan <- make_stancode(yield ~ temperature, data=random_data, family=gaussian)
stan_data <- make_standata(yield ~ temperature, data= random_data, family=gaussian)

# write string to seperate .stan file
fileConn<-file("crop_model3.stan")
writeLines(string_stan_, fileConn)
close(fileConn)

Which produces the following Stan Code:

// generated with brms 2.4.0 - crop_model3.stan
// SIMPLE - Regress crop yield against temperature 
functions { 
} 
data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  // centered version of X 
  vector[K - 1] means_X;  // column means of X before centering 
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  // population-level effects 
  real temp_Intercept;  // temporary intercept 
  real<lower=0> sigma;  // residual SD 
} 
transformed parameters { 
} 
model { 
  vector[N] mu = temp_Intercept + Xc * b;
  // priors including all constants 
  target += student_t_lpdf(temp_Intercept | 3, 1, 10); 
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma);
  } 
} 
generated quantities { 
  // actual population-level intercept 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
} 

My question is about this produced code.

What is the prior on Sigma? Why does it have two distributions, one subtracted from the other?

  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 

Why - 1 * student_t_lccdf(0 | 3, 0, 10);?

What is _lccdf(...) in this context? In my understanding it is a ‘survival function’ of 1 - student_t_lccdf(). Why is that relevant here?

How would I write this in the syntax using the tilde (~)? If I were writing the stancode myself I might look for something like:

sigma ~ student_t(3,0,10)

I have read the ?brmsformula page & brms: An R Package for Bayesian Multilevel Models using Stan but did not find an answer that I could understand in there.

Thank you for the help! The tools brms and Stan are awesome!

Tommy

1 Like
1 Like

Thanks Ben! I see that someone has already asked that exact question. Thank you for pointing me in the right direction. Like @willte I only partly understand what that mean but I take your point that it’s not worth the time to understand it unless immediately necessary.

Is there another way to write the same functionality? Or does it have to be coded in this way in order to work? Just in case seeing it written in a different form sparks some insight!

It will “work” without the -student_t_lccdf(0 | 3, 0, 10) because student_t_lccdf(0 | 3, 0, 10) \approx -0.6931472 is a constant and subtracting a constant from target does not change which values of the parameters are proposed or accepted during the Markov Chain. It will not “work” for anything you do afterward that presumes such constants have been preserved, such as calculating Bayes Factors.

1 Like

Hi sorry, I have a question, what does this line mean? “int prior_only; // should the likelihood be ignored?”, I am confused, because apparently in our data we do NOT have a variable called ‘prior_only’

I’m pretty sure that brms is using that to sample from the prior, so it will take care of that variable implicitly.

Hi Bob,
thanks for your reply. I wonder if I write my own stan code, I don’t need this line “int prior_only”, right?