Running LDA in Stan

sam1 · June 21, 2023, 7:47pm

Hello Everyone,

I am trying to run LDA using CmdStan and have two questions:
1- What data format does Stan accept for LDA? In other words, what should the input look like?
2- After running Stan, I believe the output will be samples from the posterior. The goal of LDA is to assign a topic to each word. How do I obtain a topic for each word from those posterior samples?

The model I am using is the one presented in the documentation:

data {
  int<lower=2> K;                       // num topics
  int<lower=2> V;                       // num words
  int<lower=1> M;                       // num docs
  int<lower=1> N;                       // total word instances
  array[N] int<lower=1, upper=V> w;     // word n
  array[N] int<lower=1, upper=M> doc;   // doc ID for word n
  vector<lower=0>[K] alpha;             // topic prior
  vector<lower=0>[V] beta;              // word prior
}
parameters {
  array[M] simplex[K] theta;   // topic dist for doc m
  array[K] simplex[V] phi;     // word dist for topic k
}
model {
  for (m in 1:M)
    theta[m] ~ dirichlet(alpha);  // prior
  for (k in 1:K)
    phi[k] ~ dirichlet(beta);     // prior
  for (n in 1:N) {
    array[K] real gamma;          // log weight of each topic for word n
    for (k in 1:K)
      gamma[k] = log(theta[doc[n], k]) + log(phi[k, w[n]]);
    target += log_sum_exp(gamma);  // likelihood, topic assignment marginalized out
  }
}
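
On the data-format question, the data block above expects the corpus flattened into two parallel integer arrays of length N: `w` (the vocabulary index of each word instance) and `doc` (the document each instance belongs to), both 1-based. Below is a minimal Python sketch of building that input from tokenized documents. The function name `build_lda_data`, the toy corpus, and the prior values `alpha0` and `beta0` are illustrative assumptions, not part of the documented model.

```python
import json


def build_lda_data(docs, num_topics, alpha0=0.1, beta0=0.01):
    """Flatten tokenized documents into the fields declared in the data block.

    alpha0 and beta0 are placeholder symmetric prior values (an assumption).
    """
    vocab = sorted({tok for tokens in docs for tok in tokens})
    word_id = {tok: i + 1 for i, tok in enumerate(vocab)}  # Stan indexes from 1

    w, doc = [], []
    for m, tokens in enumerate(docs, start=1):
        for tok in tokens:
            w.append(word_id[tok])  # vocabulary index of this word instance
            doc.append(m)           # document this word instance came from

    data = {
        "K": num_topics,
        "V": len(vocab),
        "M": len(docs),
        "N": len(w),
        "w": w,
        "doc": doc,
        "alpha": [alpha0] * num_topics,
        "beta": [beta0] * len(vocab),
    }
    return data, vocab


# Toy corpus (hypothetical): two short documents, already tokenized.
docs = [["cat", "dog", "cat"], ["stock", "bond", "stock", "dog"]]
data, vocab = build_lda_data(docs, num_topics=2)

# JSON text whose keys match the names in the data block.
lda_json = json.dumps(data, indent=2)
print(lda_json)
```

Saving `lda_json` to a file, say `lda_data.json`, gives something that can be passed to the compiled model with `data file=lda_data.json`. Keeping `vocab` around makes it possible to map the integer indices back to words later.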

mitzimorris · July 6, 2023, 11:04am

Have you seen this? https://www.mithilaguha.com/post/lda-model-simulated-data-generation-in-r-parameter-recovery-study-in-rstan

sam1 · July 12, 2023, 2:00pm

Yes, I did, and they are using the mean to get a point estimate of the topic, which does not seem right to me.
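
One possible alternative, sketched here under assumptions: because the model marginalizes out the per-word topic assignment, the normalized `gamma` terms give the probability of each topic for word n within a single draw, proportional to `theta[doc[n], k] * phi[k, w[n]]`. Averaging those probabilities over draws gives a posterior topic distribution per word, and the most probable topic can be read off from it. The function name `token_topic_probs` and the toy draws below are hypothetical; real draws would come from the CmdStan output. The average is only meaningful if topic labels are consistent across draws and chains, which LDA does not guarantee because of label switching.

```python
def token_topic_probs(theta_draws, phi_draws, w, doc):
    """Average, over draws, the per-word topic probabilities.

    theta_draws[s][m][k]: topic k's weight in document m for draw s
    phi_draws[s][k][v]:   word v's weight in topic k for draw s
    w, doc:               1-based word and document indices, as in the data block
    """
    num_draws = len(theta_draws)
    num_topics = len(phi_draws[0])
    probs = [[0.0] * num_topics for _ in w]
    for theta, phi in zip(theta_draws, phi_draws):
        for n, (word, d) in enumerate(zip(w, doc)):
            # Unnormalized weights: exp(gamma[k]) in the model block.
            weights = [theta[d - 1][k] * phi[k][word - 1] for k in range(num_topics)]
            total = sum(weights)
            for k in range(num_topics):
                probs[n][k] += weights[k] / (total * num_draws)
    return probs


# Two hypothetical posterior draws for M=2 docs, K=2 topics, V=3 words.
theta_draws = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.1, 0.9]],
]
phi_draws = [
    [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]],
    [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]],
]
w = [1, 3, 3]
doc = [1, 1, 2]

probs = token_topic_probs(theta_draws, phi_draws, w, doc)
# Most probable topic per word instance, reported 1-based like the model.
assignments = [max(range(len(p)), key=p.__getitem__) + 1 for p in probs]
print(assignments)
```

The same quantity could also be computed inside Stan in a generated quantities block. Doing it as post-processing keeps the model unchanged and retains the full per-word distribution rather than a single hard assignment.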
