# What is Stan doing behind the scenes at each line/block

Dear All,

I have been a Stan user for some time now. However, in a recent project, I found myself struggling to pin down exactly what is going on behind the code. For example, consider the following model block, which implements a two-stage model. My questions are in the comments below.

```stan
model {
  // stage one
  real mu1[n1];
  real mu2[n2];
  // stage two
  real z_hat[N];
  real zinit;

  beta1 ~ multi_normal(bMu[1], bSig[1]);
  beta2 ~ multi_normal(bMu[2], bSig[2]);
  gamma ~ multi_normal(gMu, gSig);

  // stage one:
  for (i in 1:n1) {
    mu1[i] = beta1[1] + beta1[2] * logd1[i];
    y1[i] ~ scaled_logit_normal_lpdf(mu1[i], std1, ymax1);
  }
  for (i in 1:n2) {
    mu2[i] = beta2[1] + beta2[2] * logd2[i];
    y2[i] ~ scaled_logit_normal_lpdf(mu2[i], std2, ymax2);
  }

  // stage two:
  for (i in 1:N) {
    zinit = log(y[i] / (ymax1 - y[i]));

    // Question 1: is it the posteriors of beta1 and beta2 that are used in stage 2 sampling?
    z_hat[i] = newton_method(zinit, beta1, beta2, gamma, x_r[i]);

    // Question 2: are the gamma's updated through z_hat[i]?
    y[i] ~ scaled_logit_normal_lpdf(z_hat[i], sigma, ymax1);
  }
}
```


For Question 1, my understanding is that in stage 1, `mu1[i] = beta1[1] + beta1[2]*logd1[i];` means Stan has drawn values for `beta1` from the prior and assigns the computed value to `mu1[i]`.
Then `y1[i] ~ scaled_logit_normal_lpdf(mu1[i], std1, ymax1);` means that Stan is drawing from the joint posterior distribution of `beta1`, `beta2`, `std1`, and `ymax1`. Correct? If so, then in stage 2, when `beta1` and `beta2` appear in `newton_method()`, Stan is drawing the gamma's from their prior, conditioned on the posterior of `beta1` and `beta2`, to solve for `z_hat`, right?

For Question 2, in `y[i] ~ scaled_logit_normal_lpdf(z_hat[i], sigma, ymax1);`, Stan is drawing samples from the joint posterior of `gamma` and `sigma`, conditioned on all of the stage 1 parameters. If that is correct, then is `gamma` getting updated through `z_hat`?

(Note: `newton_method()` is a customized Newton solver that solves for the mean, i.e. `z_hat`, and `scaled_logit_normal_lpdf()` is a customized function to sample the logit transform of `y` from a normal distribution.)

Can anybody please tell me if my understanding is correct? If not, can you help me understand conceptually what Stan is doing at each line of this code?


That is not correct. Within each iteration, Stan has obtained many candidates for `beta1`. When doing MCMC, the probability that each of those candidates is accepted as the realization from the posterior distribution on that iteration depends on the logarithm of the numerator of Bayes' Rule (possibly excluding some constants that do not depend on any parameters). When you write

```stan
beta1 ~ multi_normal(bMu[1], bSig[1]);
```
that is equivalent to adding the logarithm of

\frac{1}{\sqrt{(2\pi)^k \lvert\boldsymbol{\Sigma}\rvert}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)

to the accumulated log-numerator of Bayes' Rule, where \mathbf{x} is `beta1`, \boldsymbol{\mu} is `bMu[1]`, and \boldsymbol{\Sigma} is `bSig[1]`. It is essentially the same for all the other lines with a `~` in them; they all add a term to the logarithm of the numerator of Bayes' Rule.

But you are correct that for each `beta1` candidate, Stan assigns `beta1[1] + beta1[2] * logd1[i]` to `mu1[i]`, so `mu1` is really just a placeholder for the right-hand side of that expression rather than an autonomous parameter. Stan uses `mu1` and everything else to evaluate the logarithm of the numerator of Bayes' Rule in order to probabilistically select candidates jointly for the parameters conditional on the data. But the whole model block is evaluated to make that determination. You can think of the model block as a function of the parameters that returns the logarithm of the numerator of Bayes' Rule.
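To make this concrete, a `~` statement is just shorthand for incrementing `target`, the accumulated log-numerator of Bayes' Rule. A minimal sketch, reusing the declarations from the model above (only one of the two forms should be active, or the term would be added twice):

```stan
model {
  // sampling-statement form: adds log multi_normal(beta1 | bMu[1], bSig[1])
  // to target, dropping constants that do not depend on the parameters
  beta1 ~ multi_normal(bMu[1], bSig[1]);

  // equivalent explicit form (shown commented out so it is not added twice);
  // this version keeps the normalizing constants:
  // target += multi_normal_lpdf(beta1 | bMu[1], bSig[1]);
}
```

Nothing is "drawn" on either line; both just add a term to the log density that the sampler later uses to accept or reject candidates.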

If you are using the Newton optimizer (which is not really recommended), then you obtain only the posterior mode. But it is using the gradient of the logarithm of the numerator of Bayes' Rule to define the sequence of Newton steps that leads to the mode.


Thanks, Ben :)

So in other words, there is no such thing as "sampling from priors or posteriors"; it is more a matter of having a bunch of candidates and determining which ones end up in the joint posterior by evaluating the logarithm of the numerator of Bayes' Rule. I think I get that.

But for question 2, what I intend to do is to obtain the joint posterior of \Gamma and \sigma, given

y \sim \text{scaled-}\mathcal{N}\!\big(f(\Gamma \mid \boldsymbol{\beta}_1, \boldsymbol{\beta}_2, y_{max}, std_1, std_2),\ \sigma\big),

where f(\Gamma \mid \ldots) is an implicit function of `y[i]`. So I had to solve for it via `newton_method()` (it could be some other solver, but this one works so far...) and the root becomes `z_hat[i]`. Then, to obtain the joint posterior of \Gamma and \sigma, I did `y[i] ~ scaled_logit_normal_lpdf(...)`. If that only gives me the posterior mode (for \Gamma?), then what should I do in order to obtain the joint posterior of \Gamma and \sigma?

Thanks again!

Well, the selected candidates are draws from the posterior distribution, but the model block just deterministically evaluates the log-density of the candidates.

With implicit functions, it would probably be better to use the `algebra_solver` function rather than writing your own Newton solver, but the principle is the same. There are some other unknowns that are roots of an implicit function, which can be solved for given (candidates for) whatever you declare in the parameters block. If a candidate for the parameters gets accepted on some iteration, then the corresponding root of the implicit function is also computed. If you solve the implicit function in the transformed parameters block rather than the model block, the root will be stored in the output along with everything else.
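A sketch of what that could look like, assuming the model in the original post. The system function `implicit_system` and the packing of `beta1`, `beta2`, and `gamma` into one parameter vector are hypothetical; the residual is a placeholder you would replace with your actual implicit equation:

```stan
functions {
  // hypothetical system function: its root in z plays the role of the
  // z_hat[i] currently computed by newton_method(); theta packs the
  // parameters, x_r carries the real data for one observation
  vector implicit_system(vector z, vector theta, real[] x_r, int[] x_i) {
    vector[1] res;
    res[1] = z[1] - theta[1]; // placeholder: replace with your implicit equation
    return res;
  }
}
transformed data {
  int x_i[0]; // algebra_solver requires an integer-data argument (empty here)
}
transformed parameters {
  vector[N] z_hat; // declared here, so it is stored in the output
  for (i in 1:N) {
    // same starting guess as the hand-rolled Newton solver
    vector[1] guess = rep_vector(log(y[i] / (ymax1 - y[i])), 1);
    z_hat[i] = algebra_solver(implicit_system, guess,
                              append_row(beta1, append_row(beta2, gamma)),
                              x_r[i], x_i)[1];
  }
}
```

Because `z_hat` is now a transformed parameter, each posterior draw of `beta1`, `beta2`, and `gamma` comes with its corresponding root already computed and saved.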


I see. I tried the `algebra_solver()` function but didn't have much luck with it. (Please see Precise but wrong results for a drug interaction model for more details.)

Just making sure I understand your reply: since the log-density of the candidates is evaluated in the model block and the candidates are accepted or rejected based on that evaluation, the results I got for `gamma` and `sigma` are draws from their joint posterior, not just the mode, correct?

That is correct if you use the MCMC algorithm.

Got you! Thanks!