Trying to add a covariate in LDA topic model

I want to include a covariate into a LDA model with meta for each documents. For example, the age of the person writing the document.

The generative process I am thinking about is:
for each topic k:
draw \lambda_k \sim Normal(0,1)
draw \phi_k \sim Dir(\beta)
for each document d:
for each topic k, \alpha_{dk} = exp(X_d^T \lambda_k)
draw \theta_d \sim Dir(\alpha_d)
for each word i,
draw z_i \sim Cat(\theta_d)
draw w_i \sim Cat(\phi_{z_i})

This is a simplified version of ( Topic Models Conditioned on Arbitrary Features) https://mimno.infosci.cornell.edu/papers/dmr-uai.pdf
My code is below (my model runs, but I feel I am putting the \theta_m variable in the wrong place). I want to know if people in different ages speaks differently. So, I am thinking the model written this way seems fine. But I do not know where to declare \theta, if I declare it in the model part, Stan gives me an error saying that there is no simplex. But if I put in the parameters part, then I will have both \theta and \lambda. This does not look right to me. Would greatly appreciate any insight!!!

data {
  int<lower=2> K;               // num topics
  int<lower=2> V;               // num words
  int<lower=1> M;               // num docs
  int<lower=1> N;               // total word instances
  int<lower=1,upper=V> w[N];    // word n
  int<lower=1,upper=M> doc[N];  // doc ID for word n
  vector<lower=1>[M] G;         // gender information
 // vector<lower=0>[K] alpha;     // topic prior
  vector<lower=0>[V] beta;      // word prior
  real sigma;                   // simplify
}
parameters {
  simplex[V] phi[K];     // word dist for topic k  :827
  simplex[K] theta[M];  // topic dist
  real lambda[K]; // !!! not sure here what happens if have both parameters
}
model {
  for (k in 1:K){
    lambda[k] ~ normal(0, sigma); // here only gender info, so it is a scalor
    phi[k] ~ dirichlet(beta);    
  }
  
  for (m in 1:M){
    vector[K] alpha;
    for (k in 1:K){
       alpha[k] = exp(G[m] * lambda[k]);
    }
    theta[m] ~ dirichlet(alpha);  
  }
  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma);  
    }
}
 

Hi,
sorry for taking so long to respond. Also, I have no background in LDA models, so below are just my few guesses.

I am not sure I understand your concerns. Your data generating process involves both \theta and \lambda as parameters, so your model should generally do so as well. Such a model should work, but might be slower if d is large. You could however probably simplify the model and make it more efficient by integrating out \theta - if I read your model correctly, you should have z_i \sim \text{DirichletMultinomial} (\alpha_d).

However, please check my reasoning there is a bunch of stuff I do not understand about your model, for example, what is the role of z_i - it doesn’t seem to be handled in the model. Is it observable? If it is unknown you cannot directly handle it in Stan as parameter since it is dicsrete, but instead you would need to marginalize it out which might be tricky and very slow.

Hope that helps!