Initializing Simplex / Dirichlet


I’ve been having problems initializing simplex variables. I’ve seen this hinted at throughout the discussion boards (old and new, e.g. However, it’s not obvious to me if there’s a fix.

Here is a minimum working example: test_simplex_init.R (885 Bytes)

Using this Stan model:

  int N;
  int V;

  int<lower=1,upper=V> x[N];
  simplex[V] nu;
  nu ~ dirichlet(rep_vector(1.0, V));
  for(n in 1:N)
    x[n] ~ categorical(nu);

I generate data in R, then test parameter recovery. I get the following strange behavior:

  • If I don’t initialize parameters, it works fine (i.e. parameters are recovered), regardless of N or V
  • If I initialize nu to be a rounded version of the truth (2 decimals), with some renormalization so that it sums to 1, I get different behavior depending on the value of V:
    • if V = 10, everything works as expected

    • if V = 100, I get:

      Rejecting initial value:
      Log probability evaluates to log(0), i.e. negative infinity.
      Stan can’t start sampling from this initial value.

I encountered similar behavior in a more complicated model, again when trying to initialize a simplex variable. However, in that case, I often get the message:

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: dirichlet_lpmf: probabilities is not a valid simplex. sum(probabilities) = nan, but should be 1

Is this a bug? Is there a workaround, besides for replacing the dirichlet with some unconstrained parameter and then pushing it through softmax?


I would suspect that round()'ing causes some elements to be exactly 0 or 1, and that the probability of this occurring increases with the number of elements in V. Otherwise I would suspect there is some weird floating point thingy going on (the sum of the round()ed vector is different enough from 1 for the initialiser to throw a wobbly). Is there some reason the built in initialisation routine is not working for you?

Thanks for your reply, you’re absolutely right about rounding inducing zeros, which yields the problem. I feel stupid for missing this.

This example was just meant to test initializing a simplex, which I am using in a more complex model. This complex model has several simple nested submodels. My intent is to use these simpler submodels to initialize parameters for the bigger model. One of those parameters is simplex valued.

Part of the reason I did the rounding is because I was worried about weird floating point thingies. In fact, I just assumed using the full precision values would be a problem without testing it, which in retrospect was a pretty dumb way to start. As it turns out, initializing using the fully precise values works fine.

Definitely don’t feel silly about this, it’s a subtle thing! - I only thought of this because I spent quite some time figuring out why this other example didn’t work.

Seems reasonable, I have done this before. You will loose the ability to diagnose a misbehaving model by its sensitivity to initial conditions, and the R-hat diagnostic doesn’t make as much sense. But sometimes it can be a way of getting sensible answers out. Good luck with it!