The standard LDA example in the Stan manual assumes a purely Bernoulli presence/absence response. I'm trying to adapt this example to a binomial response, where each observation is a proportion: the number of hits for a given species out of the total number of species hits at that site.
I have added two variables to the data for this response: the numerator (specieshits[N]) and the denominator (tothits[M]). I cannot figure out how to modify gamma to incorporate this type of response instead of a purely binary one.
data {
  int<lower=2> K;                 // num communities
  int<lower=2> V;                 // num species
  int<lower=1> M;                 // num sites
  int<lower=1> N;                 // total species instances
  int<lower=1,upper=V> w[N];      // species n
  int<lower=1,upper=M> doc[N];    // site ID for species n
  int<lower=0> specieshits[N];    // number of hits for species w[n] at site doc[n]
  int<lower=1> tothits[M];        // total number of all species hits at site m
  vector<lower=0>[K] alpha;       // community prior
  vector<lower=0>[V] beta;        // species prior
}
parameters {
  simplex[K] theta[M];  // community dist for site m
  simplex[V] phi[K];    // species dist for community k
}
model {
  for (m in 1:M)
    theta[m] ~ dirichlet(alpha);  // prior, proportion of each community at each site
  for (k in 1:K)
    phi[k] ~ dirichlet(beta);     // prior, proportion of each species within each community
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma); // likelihood
  }
}
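For reference, this is how I read the likelihood in the model above: the loop marginalizes over the community assignment of each species instance, so gamma[k] is the log of the k-th term in that sum. Written out, using the same names as in the code, each instance n contributes:

```latex
\log p(w_n \mid \theta, \phi)
  = \log \sum_{k=1}^{K} \theta_{\mathrm{doc}[n],\,k}\;\phi_{k,\,w[n]}
  = \log \sum_{k=1}^{K} \exp(\gamma_k),
\qquad
\gamma_k = \log \theta_{\mathrm{doc}[n],\,k} + \log \phi_{k,\,w[n]}
```

Neither specieshits nor tothits appears in this expression yet, which is the part I don't know how to change.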