I am a newbie to Stan. It seems to me that writing out a model in Stan doesn't require a likelihood in the model block. But I am confused about why, in the Stan manual, mixture models have a likelihood component in the model block, while other regression-type models just have a generative part. I feel I am misunderstanding something.

For example, in the LDA model in the manual, why do we need the target += increment instead of just gamma[k]?

data {
  int<lower=2> K;                  // num topics
  int<lower=2> V;                  // num words
  int<lower=1> M;                  // num docs
  int<lower=1> N;                  // total word instances
  int<lower=1,upper=V> w[N];       // word n
  int<lower=1,upper=M> doc[N];     // doc ID for word n
  vector<lower=0>[K] alpha;        // topic prior
  vector<lower=0>[V] beta;         // word prior
}
parameters {
  simplex[K] theta[M];             // topic dist for doc m
  simplex[V] phi[K];               // word dist for topic k
}
model {
  for (m in 1:M)
    theta[m] ~ dirichlet(alpha);   // prior
  for (k in 1:K)
    phi[k] ~ dirichlet(beta);      // prior
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma);  // likelihood
  }
}
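(For context on what that target += log_sum_exp(gamma) line is doing: it marginalizes out the discrete topic assignment for each word, since log_sum_exp of the per-topic log probabilities equals the log of the summed mixture probability. A small Python sketch with made-up toy numbers, not taken from the manual:)

```python
import math

# Toy example: K = 3 topics, one word occurrence.
theta = [0.5, 0.3, 0.2]     # topic proportions for this word's document
phi_w = [0.01, 0.10, 0.05]  # probability of this word under each topic

# Per-topic log joint probability, as in the Stan loop:
# gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]])
gamma = [math.log(t) + math.log(p) for t, p in zip(theta, phi_w)]

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# What target += log_sum_exp(gamma) adds: the topic assignment
# is summed (marginalized) out in log space.
lp_marginal = log_sum_exp(gamma)

# The same quantity computed directly: log(sum_k theta[k] * phi[k, w])
lp_direct = math.log(sum(t * p for t, p in zip(theta, phi_w)))

print(abs(lp_marginal - lp_direct) < 1e-12)  # True: the two agree
```

This is why the mixture model needs an explicit target += statement: there is no built-in distribution for "a word drawn from a mixture of topics", so the marginal log probability has to be accumulated by hand.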

Also, in a general regression, why is there no target += statement in the model? For example, this hierarchical regression model:

data {
  int<lower=0> N;                // num individuals
  int<lower=1> K;                // num ind predictors
  int<lower=1> J;                // num groups
  int<lower=1> L;                // num group predictors
  int<lower=1,upper=J> jj[N];    // group for individual
  matrix[N, K] x;                // individual predictors
  row_vector[L] u[J];            // group predictors
  vector[N] y;                   // outcomes
}
parameters {
  corr_matrix[K] Omega;          // prior correlation
  vector<lower=0>[K] tau;        // prior scale
  matrix[L, K] gamma;            // group coeffs
  vector[K] beta[J];             // indiv coeffs by group
  real<lower=0> sigma;           // prediction error scale
}
model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    row_vector[K] u_gamma[J];
    for (j in 1:J)
      u_gamma[j] = u[j] * gamma;
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N)
    y[n] ~ normal(x[n] * beta[jj[n]], sigma);
}

Sampling statements like the one below are just another way of incrementing the target. Quoting from the Reference Manual:

y ~ normal(mu,sigma);
The name "sampling statement" is meant to be suggestive, not interpreted literally. Conceptually, the variable y, which may be an unknown parameter or known, modeled data, is being declared to have the distribution indicated by the right-hand side of the sampling statement. Executing such a statement does not perform any sampling. In Stan, a sampling statement is merely a notational convenience. The above sampling statement could be expressed as a direct increment on the total log probability as
target += normal_lpdf(y | mu, sigma);
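(To make the accumulator idea concrete: Stan conceptually keeps one running log-density variable, and every sampling statement adds its term to it. A minimal Python sketch of that, with a hand-written normal log density and made-up data; note that in actual Stan the ~ form may drop constant terms that don't affect sampling, while target += normal_lpdf keeps them:)

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of y under Normal(mu, sigma), including constants."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

# Stan conceptually keeps one running accumulator called `target`.
target = 0.0

# `y[n] ~ normal(0, 1)` is shorthand for this increment, done per observation:
for y in [1.2, -0.4, 0.7]:
    target += normal_lpdf(y, mu=0.0, sigma=1.0)

print(target)  # total log density of the three observations
```

So a regression model full of ~ statements is incrementing the target just as much as the LDA model is; the increment is simply hidden behind the ~ notation.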

Oh I see, so there is implicitly already a target increment there. So for the LDA model below (to make sure that I understand correctly):

model {
  for (m in 1:M)
    theta[m] ~ dirichlet(alpha);   // prior
  for (k in 1:K)
    phi[k] ~ dirichlet(beta);      // prior
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma);  // likelihood
  }
}

Does it mean that the log densities of the priors on phi and theta are already added to the target, and then the target is further incremented by log_sum_exp(gamma)?

Does this mean that if I want to model an outcome variable that depends on the topic distribution, I can also put it in the model block (e.g., the word assignment is conditionally independent of the outcome, and the outcome depends on the document's topic proportions)?

model {
  for (m in 1:M)
    theta[m] ~ dirichlet(alpha);   // prior
  for (k in 1:K)
    phi[k] ~ dirichlet(beta);      // prior
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma);  // likelihood
  }
  for (m in 1:M)
    y[m] ~ bernoulli_logit(a + lambda * theta[m]);  // a and lambda declared in the parameters block
}

Is this also a valid model? So, basically, for Stan, we just need to write out the generative story, get the conditional independence assumptions sorted out, put that in the model block, and it is all set?

Yes, if I understand you correctly. Remember that posterior ∝ likelihood × prior,
and Stan works with log probability, so the multiplication becomes addition.
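(A tiny numerical sketch of that identity, on a made-up three-point grid for a single Bernoulli parameter, showing that exponentiating the log-space sums Stan accumulates recovers the usual likelihood × prior products:)

```python
import math

# Toy discrete illustration: one parameter theta on a 3-point grid.
thetas = [0.2, 0.5, 0.8]
prior = [0.3, 0.4, 0.3]   # p(theta) on the grid
data = [1, 0, 1]          # three Bernoulli observations

def lik(theta):
    """Likelihood of the data under Bernoulli(theta)."""
    out = 1.0
    for y in data:
        out *= theta if y == 1 else 1 - theta
    return out

# Product space: unnormalized posterior = likelihood * prior
unnorm = [lik(t) * p for t, p in zip(thetas, prior)]

# Log space (what Stan accumulates): log prior + log likelihood
log_unnorm = [math.log(p) + math.log(lik(t)) for t, p in zip(thetas, prior)]

# Exponentiating the log-space sums recovers the product-space values.
recovered = [math.exp(lp) for lp in log_unnorm]
print(all(abs(a - b) < 1e-12 for a, b in zip(unnorm, recovered)))  # True
```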

For your second question, about modeling the outcome, I am less sure of the answer. You are incrementing the log probability by both the log_sum_exp(gamma) result and the bernoulli_logit() term, so the posterior is proportional to the product of those two likelihood contributions (and the priors). You are not getting the probability of just the word assignments or just the outcome, but of both together. Is that what you want? And keep in mind, I am fairly new to Stan also, so do not take what I say on this point as highly reliable. If you are not sure how to reason about this, it might be good to start a separate topic.