Unexpected error in running vb()

I keep hitting a strange error when running vb() in RStan. My Stan model is as follows:


data {
  int<lower=1> K; // num topics
  int<lower=1> V; // num words
  int<lower=0> D; // num docs
  int<lower=0> n[D, V]; // word counts for each doc

  // hyperparameters
  vector<lower=0>[K] alpha;
  vector<lower=0>[V] gamma1;
  vector<lower=0>[V] gamma2;
  vector<lower=0, upper=1>[V] delta;
}
parameters {
  simplex[K] theta[D]; // topic mixtures
  vector<lower=0,upper=1>[V] zeta[K]; // zero-inflated betas
}


transformed parameters {
  vector<lower=0>[V] beta[K];
  for (k in 1:K) {
    beta[k,1] = zeta[k,1];
    for (m in 2:V) {
      beta[k,m] = zeta[k,m] * prod(1 - zeta[k,1:(m - 1)]);  // stick breaking
    }
  }
  for (k in 1:K) {
    beta[k] = beta[k] / sum(beta[k,1:V]);  // GD construction
  }
}


model {
  for (d in 1:D) {
    theta[d] ~ dirichlet(alpha);  
  }

  for (k in 1:K) {
    for (m in 1:V) {
      if (zeta[k,m] == 0) {  // Zero-inflated beta likelihood
        target += bernoulli_lpmf(1 | delta[m]);
      } else {
        target += bernoulli_lpmf(0 | delta[m]) + beta_lpdf(zeta[k,m] | gamma1[m], gamma2[m]);
      }
    }
  }

  for (d in 1:D) {
    vector[V] eta;
    eta = beta[1] * theta[d, 1];
    for (k in 2:K) {
      eta = eta + beta[k] * theta[d, k];
    }
    eta = eta/sum(eta[1:V]);
    n[d] ~ multinomial(eta);  // generation of each sample
  }
}

When I run this in R with the following input data:


zinLDA_stan_data <- list(
  K = 8,
  V = ncol(X),
  D = nrow(X),
  n = X,
  alpha = rep(0.1, 8),
  gamma1 = rep(0.5, ncol(X)),
  gamma2 = rep(10, ncol(X)),
  delta = rep(0.52, ncol(X))
)

stan.model <- stan_model(file = 'LGDAmodel.stan')
fit_zinLDA <- vb(stan.model, data=zinLDA_stan_data, algorithm="meanfield", iter=4000)

I encounter the following output:

Chain 1: ------------------------------------------------------------
Chain 1: EXPERIMENTAL ALGORITHM:
Chain 1: This procedure has not been thoroughly tested and may be unstable
Chain 1: or buggy. The interface is subject to change.
Chain 1: ------------------------------------------------------------
Chain 1:
Chain 1:
Chain 1:
Chain 1: Gradient evaluation took 0.045774 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 457.74 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Begin eta adaptation.
Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
Chain 1: Iteration: 250 / 250 [100%] (Adaptation)
Chain 1: Success! Found best value [eta = 0.1].
Chain 1:
Chain 1: Begin stochastic gradient ascent.
Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
Chain 1: 100 -68268478.213 1.000 1.000
Chain 1: 200 -34975639.064 0.976 1.000
Chain 1: 300 -21238607.967 0.866 0.952
Chain 1: 400 -14591251.173 0.764 0.952
Chain 1: 500 -11176187.978 0.590 0.647
Chain 1: 600 -9322193.373 0.402 0.456
Chain 1: 700 -8336422.815 0.270 0.306
Chain 1: 800 -7794480.066 0.173 0.199
Chain 1: 900 -7492810.776 0.107 0.118
Chain 1: 1000 -7339750.566 0.062 0.070
Chain 1: 1100 -7250413.009 0.036 0.040
Chain 1: 1200 -7195109.876 0.020 0.021
Chain 1: 1300 -7156785.229 0.012 0.012
Chain 1: 1400 -7127091.559 0.007 0.008 MEAN ELBO CONVERGED MEDIAN ELBO CONVERGED
Chain 1:
Chain 1: Drawing a sample of size 1000 from the approximate posterior…
Chain 1: COMPLETED.

Error in if (p$diagnostics$pareto_k > 1) { :
missing value where TRUE/FALSE needed

I can’t work out what this error means or what exactly is going wrong. Is there a way to suppress this error and still obtain the posterior estimates?

Re-upping this one. I have a model that causes exactly the same error. It didn’t happen before with the same data and code, so my guess is that a later software update introduced it.

This is probably the same bug in the loo package that was reported in stan-dev/loo issue #222 ("Adding "loo" criterion with add_criterion and moment_match=TRUE is failing even when save_pars(all=TRUE) was set during model fit") and fixed in pull request #224 ("make Pareto k Inf if it is NA", by topipa). Can you test by installing the GitHub version of the loo package?
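
For context, the message in the original post is what R raises whenever an if() condition evaluates to NA, which is consistent with the Pareto k diagnostic coming back as NA. The sketch below reproduces that with a made-up list `p` standing in for the diagnostics object (the real one is built internally, so its exact structure here is an assumption), and shows the idea suggested by the PR title. The install command at the end assumes the remotes package is available.

```r
# Hypothetical stand-in for the diagnostics object; the real one is
# created internally, so this structure is only an illustration.
p <- list(diagnostics = list(pareto_k = NA_real_))

# An NA condition cannot be used in if(), so this raises the same
# "missing value where TRUE/FALSE needed" error as in the post.
msg <- tryCatch(
  {
    if (p$diagnostics$pareto_k > 1) "k too high" else "k ok"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Idea suggested by the PR title: treat an NA Pareto k as Inf
# before comparing, so the check flags the fit instead of erroring.
k <- p$diagnostics$pareto_k
if (is.na(k)) k <- Inf
print(k > 1)

# To try the development version of loo (assumes remotes is installed):
# remotes::install_github("stan-dev/loo")
```

After installing, restart R before rerunning vb() so the new version of loo is the one that gets loaded.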

@jonah when is the next loo CRAN release planned?

We can do one soon, but it would be good to know if the latest loo on GitHub fixes this issue before we submit it.

@adamramey If installing loo from GitHub doesn’t fix this then you could try using the variational method in CmdStanR, which runs the same algorithms as vb in RStan but shouldn’t have this error (I don’t think we’ve added this diagnostic check in CmdStanR yet).
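
A rough sketch of what the CmdStanR route could look like, assuming cmdstanr and a working CmdStan installation, and reusing the model file name and the `zinLDA_stan_data` list from the original post (so it is not self-contained). One caveat: recent Stan releases no longer accept the old array declarations used in that model (for example `int n[D, V]`), so those declarations may need to be rewritten in the `array[D, V] int n` form before the model compiles.

```r
library(cmdstanr)  # assumes cmdstanr and CmdStan are installed

# Compile the same Stan file used in the original post.
mod <- cmdstan_model("LGDAmodel.stan")

# Mean-field ADVI, mirroring the settings of the vb() call above.
# zinLDA_stan_data is the data list defined in the original post.
fit_zinLDA <- mod$variational(
  data = zinLDA_stan_data,
  algorithm = "meanfield",
  iter = 4000
)

# Summaries and draws from the approximate posterior.
print(fit_zinLDA$summary("theta"))
draws <- fit_zinLDA$draws()
```

Since this path does not run the Pareto k check, it is worth inspecting the approximation by other means before relying on the estimates.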

Thanks for the help! CmdStanR worked well. Unfortunately, the GitHub version of loo did not seem to help.