LDA for word proportions fit issue

I’m trying to fit a variant of vanilla LDA that takes the word distribution of each document as data. So I take
\beta_k = topic-word distributions ~ Dir(\gamma)
\theta_d = per-document topic proportions ~ Dir(\alpha)
\xi_d = observed per-document word proportions ~ Dir(B^T \theta_d)
where B is the K x V matrix with the \beta_k as row vectors.
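
In other words, the Dirichlet parameter for document d is the topic-weighted mixture of the topic-word distributions (writing \eta_d for what the code below calls eta):

\eta_d = B^T \theta_d = \sum_{k=1}^K \theta_{d,k} \beta_k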

Here’s my stan code

data {
  int<lower=1> K; // num topics
  int<lower=1> V; // num words
  int<lower=0> D; // num docs
  simplex[V] xi[D]; // word proportions for each doc

  // hyperparameters
  vector<lower=0>[K] alpha;
  vector<lower=0>[V] gamma;
}

parameters {
  simplex[K] theta[D]; // topic mixtures
  simplex[V] beta[K]; // word dist for k^th topic
}

model {
  for (d in 1:D) {
    theta[d] ~ dirichlet(alpha);
  }

  for (k in 1:K) {
    beta[k] ~ dirichlet(gamma);
  }

  for (d in 1:D) {
    vector[V] eta;
    eta = beta[1] * theta[d, 1];

    for (k in 2:K) {
      eta = eta + beta[k] * theta[d, k];
    }
    xi[d] ~ dirichlet(eta);
  }

}

When I try to fit this model, I get the usual Stan error

Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.

Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Error in sampler$call_sampler(c(args, dotlist)) : Initialization failed.

I must have missed a constraint, or maybe the model is flawed. Any help is greatly appreciated!

Try setting init_r to some value closer to 0 than the default of 2.

I tried 1, 0.1, and 0.01 for init_r and unfortunately get the same error.

I don’t think it should matter, but my observed proportion vectors \xi_d are pretty sparse – they have length 2582 with roughly 80 percent zeros.

Yes, that could matter. What are you using for alpha and gamma?

So far I’ve been fitting the model with

  • \alpha and \gamma uniform
  • K = 4
  • V = 2582
  • D = 54

EDIT: Also, I forgot a crucial piece of information. I’ve been fitting this with variational Bayes so far.

All bets are off then

I switched to standard MCMC sampling in Stan and printed the log posterior before and after each sampling statement. It seems the \xi_d are the issue. When I modify the last loop to be

for (d in 1:D) {
    vector[V] eta;
    eta = beta[1] * theta[d, 1];

    for (k in 2:K) {
      eta = eta + beta[k] * theta[d, k];
    }

    print("log-posterior = ", get_lp());
    xi[d] ~ dirichlet(eta);
    print("log-posterior = ", get_lp());
  }

I find the log posterior is finite before the first \xi_d sampling statement and infinite after. It’s unclear to me whether this is because of poor initialization of the parameters or because the \xi data are highly sparse.
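
For reference, if I have the density right, the Dirichlet log likelihood for each document includes the term

\sum_{v=1}^V (\eta_{d,v} - 1) \log(\xi_{d,v})

so any exact zero in \xi_d sends the log posterior to plus or minus infinity (zeros lie outside the support of the Dirichlet), which would fit with roughly 80 percent of my \xi_d entries being zero.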

Try printing eta and xi[d] as well.
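
For example, something like this just before the sampling statement (a sketch printing the minima, since the full length-2582 vectors would be unwieldy):

print("min(eta) = ", min(eta));
print("min(xi[d]) = ", min(xi[d]));
xi[d] ~ dirichlet(eta);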

The eta computation can also be simplified. beta is an array of vectors, so it can’t be multiplied elementwise by theta[d], but the loop collapses to:

vector[V] eta = rep_vector(0, V);
for (k in 1:K)
  eta += theta[d, k] * beta[k];

(Or pack the beta[k] into a V x K matrix B in transformed parameters and write eta = B * theta[d].)

What’s making sure that eta > 0?
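
And if the zeros in \xi_d turn out to be the problem, one workaround (a sketch; the name xi_smooth and the 1e-6 constant are just placeholders) is to nudge the observed proportions off the simplex boundary in transformed data and model those instead:

transformed data {
  simplex[V] xi_smooth[D]; // xi pushed away from the simplex boundary
  for (d in 1:D)
    xi_smooth[d] = (xi[d] + rep_vector(1e-6, V)) / (1 + V * 1e-6);
}

Then use xi_smooth[d] ~ dirichlet(eta); in the model block.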