I am a newbie to Stan. To me it seems that writing out a model in Stan doesn't require an explicit likelihood in the model block. But I get confused about why, in the Stan manual, a mixture model has an explicit likelihood increment in the model block, while other regression-type models just have the generative part. I feel I am misunderstanding something.
For example, in the LDA model in the manual, why do we need target += log_sum_exp(gamma) instead of just gamma[k]?
data {
  int<lower=2> K;               // num topics
  int<lower=2> V;               // num words
  int<lower=1> M;               // num docs
  int<lower=1> N;               // total word instances
  int<lower=1,upper=V> w[N];    // word n
  int<lower=1,upper=M> doc[N];  // doc ID for word n
  vector<lower=0>[K] alpha;     // topic prior
  vector<lower=0>[V] beta;      // word prior
}
parameters {
  simplex[K] theta[M];          // topic dist for doc m
  simplex[V] phi[K];            // word dist for topic k
}
model {
  for (m in 1:M)
    theta[m] ~ dirichlet(alpha);  // prior
  for (k in 1:K)
    phi[k] ~ dirichlet(beta);     // prior
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma);  // likelihood
  }
}
Also, in general regression models, why is there no target += statement in the model block? For example, this hierarchical regression model:
data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num ind predictors
  int<lower=1> J;              // num groups
  int<lower=1> L;              // num group predictors
  int<lower=1,upper=J> jj[N];  // group for individual
  matrix[N, K] x;              // individual predictors
  row_vector[L] u[J];          // group predictors
  vector[N] y;                 // outcomes
}
parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  vector[K] beta[J];           // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
}
model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    row_vector[K] u_gamma[J];
    for (j in 1:J)
      u_gamma[j] = u[j] * gamma;
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N)
    y[n] ~ normal(x[n] * beta[jj[n]], sigma);
}
Sampling statements are just another way of incrementing the target. Quoting from the Reference manual:
y ~ normal(mu,sigma);
The name “sampling statement” is meant to be suggestive, not interpreted literally. Conceptually, the variable y, which may be an unknown parameter or known, modeled data, is being declared to have the distribution indicated by the right-hand side of the sampling statement. Executing such a statement does not perform any sampling. In Stan, a sampling statement is merely a notational convenience. The above sampling statement could be expressed as a direct increment on the total log probability as
target += normal_lpdf(y | mu, sigma);
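To make that concrete, here is a minimal toy program (my own example, not from the manual) where the two forms define the same posterior, up to an additive constant that the sampler never needs:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // sampling-statement form; drops constants that do not depend on parameters
  y ~ normal(mu, sigma);
  // equivalent explicit form (commented out so the likelihood is not counted twice):
  // target += normal_lpdf(y | mu, sigma);
}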
Oh I see, so there is implicitly already a target increment there. So, for the LDA model below (to make sure that I understand correctly):
model {
  for (m in 1:M)
    theta[m] ~ dirichlet(alpha);  // prior
  for (k in 1:K)
    phi[k] ~ dirichlet(beta);     // prior
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma);  // likelihood
  }
}
Does it mean that the log densities of theta and phi are already added to the target by the ~ statements, and then the target is further incremented by log_sum_exp(gamma)?
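If that is right, I suppose the whole model block could be spelled out with explicit increments, something like this (using dirichlet_lpdf, which I believe matches ~ dirichlet up to a normalizing constant):

model {
  for (m in 1:M)
    target += dirichlet_lpdf(theta[m] | alpha);  // prior, same as theta[m] ~ dirichlet(alpha)
  for (k in 1:K)
    target += dirichlet_lpdf(phi[k] | beta);     // prior, same as phi[k] ~ dirichlet(beta)
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma);                // marginalized likelihood
  }
}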
Does this mean that if I want to model an outcome variable that depends on the topic distribution, I can also put it in the model block (e.g., the word assignment is conditionally independent of the outcome, and the outcome depends on the document's topic proportions)?
model {
  for (m in 1:M)
    theta[m] ~ dirichlet(alpha);  // prior
  for (k in 1:K)
    phi[k] ~ dirichlet(beta);     // prior
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma);  // likelihood
  }
  for (m in 1:M) {
    y[m] ~ bernoulli_logit(a + lambda * theta[m]);  // a and lambda declared in the parameters block
  }
}
Is this also a valid model? So, basically, for Stan, we just need to write out the generative story, get the conditional independence assumptions sorted out, put that in the model block, and it is all set?
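For completeness, the extra declarations I have in mind would look something like this (y, a, and lambda are my own placeholder names; lambda is a row_vector so that lambda * theta[m] is a scalar):

data {
  // ... everything from the LDA model above, plus:
  int<lower=0,upper=1> y[M];  // binary outcome for doc m
}
parameters {
  // ... theta and phi as before, plus:
  real a;                     // outcome intercept
  row_vector[K] lambda;       // coefficients on topic proportions
}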
Yes, if I understand you correctly. Remember that

posterior \propto likelihood * prior

and Stan works with log probability, so the multiplication becomes addition.
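In symbols, the quantity Stan accumulates in target is

\log p(\theta \mid y) = \log p(y \mid \theta) + \log p(\theta) - \log p(y)

where the constant \log p(y) can be dropped, because MCMC only needs the log posterior up to an additive constant. That is also why the LDA model uses log_sum_exp: the likelihood of word n is \sum_k theta[doc[n], k] * phi[k, w[n]], i.e., the discrete topic assignment is summed out (Stan's HMC sampler cannot handle discrete parameters), and log_sum_exp(gamma) computes the log of that sum in a numerically stable way.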
For your second question about modeling the outcome, I am less sure of the answer. You are incrementing the log probability by both the result of log_sum_exp(gamma) and the bernoulli_logit() statement, so the posterior combines the two: you are modeling the probability of the word assignments and the outcome together, not of either one alone. Is that what you want? And keep in mind, I am fairly new to Stan also, so do not take what I say on this point as highly reliable. If you are not sure how to reason about this, it might be good to start a separate topic.