Logistic-normal-multinomial model RuntimeError: Initialization failed

Greetings, everybody. I’m trying to model a mixture of exponential growth processes using a logistic-normal-multinomial model. I have a working implementation in PyMC3, but the sampling is painfully slow, which is why I’m trying to reimplement it in PyStan.

data {
    int N;  // N samples
    int D;  // D components
    vector[N] t;  // time
    matrix[D-1, D] Psi;  // ILR contrast matrix (basis)
    int counts[N, D];  // Observed counts
}
parameters {
    vector[D] lambda;  // exponential growth rates
    vector[D-1] alpha;  // intercept in ILR-coordinates
    vector[D-1] log_sigma;  // log of StDev in ILR-coordinates
}
transformed parameters {
    vector[D-1] sigma = exp(log_sigma);
}
model {
    vector[D-1] mu;  // expectation for ILR coordinates
    vector[D-1] coord;  // ILR coordinates
    vector[D] prob;  // a vector of probabilities for the multinomial 
    for (n in 1:N){
        mu = Psi * log(lambda+1) * t[n] + alpha;
        coord ~ normal(mu, sigma);
        prob = exp(Psi' * coord) / sum(exp(Psi' * coord));
        counts[n,] ~ multinomial(prob);
    }
    lambda ~ beta(4, 1);
    alpha ~ normal(0, 1.5);
    log_sigma ~ normal(-1.5, 1);
}
generated quantities {}

Here is how I start sampling:

fit = model.sampling(
    data={'N': counts_obs.shape[1],  
          'D': counts_obs.shape[0],
          't': metadata['t'].values.astype(float),
          'Psi': psi,
          'counts': counts_obs.T.copy()},
    warmup=500, iter=1000, chains=1, n_jobs=1
)

And here is what I get:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-127-9b02f9c10968> in <module>
      5           'Psi': psi,
      6           'counts': counts_obs.T.copy()},
----> 7     warmup=500, iter=1000, chains=1, n_jobs=1
      8 )

~/.conda/envs/pcr/lib/python3.6/site-packages/pystan/model.py in sampling(self, data, pars, chains, iter, warmup, thin, seed, init, sample_file, diagnostic_file, verbose, algorithm, control, n_jobs, **kwargs)
    776         call_sampler_args = izip(itertools.repeat(data), args_list, itertools.repeat(pars))
    777         call_sampler_star = self.module._call_sampler_star
--> 778         ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
    779         samples = [smpl for _, smpl in ret_and_samples]
    780 

~/.conda/envs/pcr/lib/python3.6/site-packages/pystan/model.py in _map_parallel(function, args, n_jobs)
     88             pool.join()
     89     else:
---> 90         map_result = list(map(function, args))
     91     return map_result
     92 

stanfit4growth_dynamics_2e53a7591b97d6bac430fe57d9635a04_3078155904287467890.pyx in stanfit4growth_dynamics_2e53a7591b97d6bac430fe57d9635a04_3078155904287467890._call_sampler_star()

stanfit4growth_dynamics_2e53a7591b97d6bac430fe57d9635a04_3078155904287467890.pyx in stanfit4growth_dynamics_2e53a7591b97d6bac430fe57d9635a04_3078155904287467890._call_sampler()

RuntimeError: Initialization failed.

Having read several similar posts, I have already made sure that all data are passed as NumPy arrays and that no parameter can become 0. I have also tried sampling sigma from a half-normal, but that does not seem to matter. Here is some mathematical context.

  1. Assume we have a population with D distinct groups. We know that each group grows exponentially as ({\lambda}_{i} +1)^{t} \cdot {c}_{i}^{\prime}, where 0 \lt {\lambda}_{i} \le 1 is that group’s growth rate, t is time and {c}_{i}^{\prime} is initial population size.
  2. We have a series of multinomial samples from the population at different time points, where counts correspond to the number of times a certain group is observed.
  3. We model the population in ILR (isometric log-ratio) coordinates: \boldsymbol{p}(t) = \text{ilr}^{-1} ( \boldsymbol{y}(t) ), where \boldsymbol{y}(t) \sim \mathcal{N}(\text{ilr} (\boldsymbol{\lambda} + 1) \cdot t + \boldsymbol{\alpha}, \boldsymbol{\sigma}), \boldsymbol{\alpha} is just an intercept in ILR coordinates and \boldsymbol{\sigma} is a vector of standard deviations in ILR coordinates. We sample \lambda's from a Beta prior. Here \boldsymbol{y} = \text{ilr}(\boldsymbol{x}) = \Psi \times \log(\boldsymbol{x}) and \text{ilr}^{-1}(\boldsymbol{y}) = \mathcal{C}(\exp(\Psi' \times \boldsymbol{y})), where \boldsymbol{x} is a D-part vector, \Psi is a (D-1) \times D orthonormal basis of a D-part simplex (a so-called contrast matrix), and \mathcal{C} denotes closure to the unit simplex.
  4. Finally, observed counts are modelled as \boldsymbol{c}(t) \sim \text{Multinomial}(\boldsymbol{p}(t)).
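
To make the ILR maps in point 3 concrete, here is a small NumPy sketch (this is just an illustration, not part of the model code; the Helmert-based contrast matrix is only one possible choice of orthonormal basis for Psi):

import numpy as np
from scipy.linalg import helmert      # (D-1) x D orthonormal contrast matrix
from scipy.special import softmax

D = 4
Psi = helmert(D)                      # rows are orthonormal and sum to zero

def ilr(x):
    # ilr(x) = Psi * log(x) for a composition x on the unit simplex
    return Psi @ np.log(x)

def ilr_inv(y):
    # ilr^{-1}(y) = C(exp(Psi' * y)), i.e. a softmax of Psi' * y
    return softmax(Psi.T @ y)

x = np.array([0.1, 0.2, 0.3, 0.4])       # a point on the unit simplex
print(np.allclose(ilr_inv(ilr(x)), x))   # True: the two maps are mutual inverses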

Update

I’ve found a post that offers a general approach to debugging initialisation issues. It also helped me understand that this error message is the equivalent of the “Bad initial energy” message in PyMC3. So I’ve been NaN-hunting. I’ve updated the model to fix some of the errors, but there is one thing I don’t understand:

data {
    int N;  // N samples
    int D;  // D components
    vector[N] t;  // time
    matrix[D-1, D] Psi;  // ILR contrast matrix (basis)
    int counts[N, D];  // Observed counts
}
parameters {
    vector<lower=0,upper=1>[D] lambda;  // exponential growth rates
    vector[D-1] alpha;  // intercept in ILR-coordinates 
    vector<lower=0>[D-1] sigma;  // StDev in ILR-coordinates
}
transformed parameters {}
model {
    vector[D-1] mu;  // expectation for ILR coordinates
    vector[D-1] coord;  // ILR coordinates
    vector[D] prob;  // a vector of probabilities for the multinomial 
    lambda ~ beta(4, 1);
    alpha ~ normal(0, 1.5);
    sigma ~ lognormal(-1.5, 1);
    for (n in 1:N) {
        // print(log(lambda + 1))
        mu = Psi * log(lambda+1) * t[n] + alpha;
        print(mu);
        print(sigma);
        coord ~ normal(mu, sigma);
        // print(coord)
        prob = exp(Psi' * coord) / sum(exp(Psi' * coord));
        counts[n,] ~ multinomial(prob);
    }
}
generated quantities {}

Stan logs contain the following message:

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: normal_lpdf: Random variable[1] is nan, but must not be nan! (in 'unknown file name' at line 27)

In other words, something is wrong with coord ~ normal(mu, sigma) inside the loop. I’ve already checked that neither mu nor sigma contains nan values:

[3.61959,3.61074,1.84193,-0.831585,3.068,-3.16802,-0.128876,-5.06131,-2.39376,6.96217,-1.54757,3.91368,5.46742,5.82423,-4.76566,-4.94248,0.30671,-0.578386,0.997372,-3.16698,1.55534,0.665509,5.20154,0.139708,-3.05867,-0.933114,-4.31377,-5.09469,1.78863,-1.26601,3.63067,0.551005,2.11949,4.73583,0.516342,-1.32069,4.65772,4.04653,0.795305,-5.09259,-5.43404,-0.500822,-2.07493,1.70666,-5.08649,-2.94182,-7.13371,-5.23849,6.68188,2.41275,5.32882,0.322703,-1.14324,7.16832,0.108541,-5.22924,3.01364,1.52697,-4.86676,1.81436,-1.09035,2.17651,2.37663,-2.41685,3.40156,-0.830815,3.41166,-1.18934,-1.87888,-2.35921,3.44936,-2.6891,-6.26322,3.59626,-1.50939,-4.74756,-3.18277,2.78778,3.15079,-5.45712,0.666367,-1.62401,4.99962,-1.63854,-3.91517,0.00922929,-3.01135,3.69132,-3.61608,5.17417,-0.145762,3.11831,5.42573,3.6789,-2.27073,1.98691,6.91936,-3.33031,1.57474,2.69374,-4.65052,-4.80807,-5.25164,-0.955291,-4.35992,-2.82116,0.693328,2.59116,1.30453,-4.31765,-0.793713,5.00205,3.2183,-2.95865,-6.26013,-3.38309,-5.24944,-3.03317,3.51115,-4.0517,4.66862,-4.89055,1.7523,3.13357,3.02388,-3.98441,2.46426,-0.670926,-2.39303,-4.27351,-1.24561,-5.30311,5.99563,-5.30392,2.85846,-4.66579,5.14095,-4.48287,1.1187]
[6.13952,0.572074,1.25277,1.90913,0.154876,0.788362,0.221999,6.5473,5.3303,2.5294,0.813722,2.62259,0.835902,4.34981,0.899515,0.233454,0.571515,3.03487,6.22654,0.939887,1.89274,3.07222,0.233838,0.299529,1.70895,0.683702,0.186334,1.94294,2.42151,3.75399,1.01165,1.53294,2.98085,4.60479,6.62264,0.145904,0.936486,4.07773,4.51134,1.9435,1.47366,0.15164,0.485037,1.1415,2.67297,3.37383,0.248162,0.370185,0.566116,0.970535,0.298791,1.0555,0.921357,0.681499,2.46573,0.460603,3.50968,0.210168,1.04486,4.85477,3.18573,6.12902,0.138466,2.36131,0.427282,0.367093,5.6573,5.36062,1.05131,4.40953,4.35017,0.602396,3.29693,0.541336,0.858526,0.276627,0.158699,0.241416,0.252346,0.136683,0.149635,6.64248,3.94269,0.242178,0.797728,1.65639,0.56543,1.18289,2.43234,0.942519,4.03283,6.9848,0.859371,0.308654,1.55666,2.48972,0.177679,4.33695,0.56958,6.36548,6.23759,0.215209,1.21682,0.292656,0.235332,1.22186,0.323701,0.138006,0.675583,0.563452,0.264373,5.87673,0.787552,1.22776,4.88825,0.532846,0.225707,0.353967,3.40864,0.889644,0.152828,2.71946,0.457686,0.771462,0.149052,1.50146,0.321766,1.65154,0.151494,0.734017,1.86677,1.00609,0.818023,0.158336,0.510543,2.08098,0.372893,1.2276,0.649099]

Rejecting initial value:
  Error evaluating the log probability at the initial value.
Exception: normal_lpdf: Random variable[1] is nan, but must not be nan!  (in 'unknown file name' at line 27)

You want the sampler to draw random values for coord, so coord must be declared as a parameter. Something like:

parameters {
   ...
   vector[N, D-1] coord;
}
model {
 ...
  for (n in 1:N) {
    coord[n] ~ normal(mu, sigma);
    ...
  }
}

Simply putting it on the left-hand side of a ~ is not enough.

Also, lambda needs to be declared with constraints that match the beta prior.

vector<lower=0,upper=1>[D] lambda;

By the way

prob = exp(Psi' * coord) / sum(exp(Psi' * coord));

is equivalent to

prob = softmax(Psi' * coord);

which is a bit more accurate and efficient.
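
To see the accuracy point, here is a quick NumPy illustration (scipy's softmax goes through logsumexp, so it stays finite where the naive exponentiate-and-normalise overflows; I'm assuming Stan's softmax behaves similarly):

import numpy as np
from scipy.special import softmax

z = np.array([0.5, -1.2, 2.0])
print(np.allclose(np.exp(z) / np.sum(np.exp(z)), softmax(z)))  # True for moderate values

big = np.array([1000.0, 1.0, -3.0])
print(np.exp(big) / np.sum(np.exp(big)))  # overflows: [nan, 0., 0.]
print(softmax(big))                       # stable:    [1., 0., 0.]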

Thank you very much for your response. I’ve already figured out that all variables with constrained priors must be declared with matching constraints. This is a bit counterintuitive (at least for someone coming from PyMC3). Declaring coord as a parameter made the difference. The model runs now, though it appears to be only marginally faster than the PyMC3 counterpart (perhaps this has something to do with PyMC3 parallelising matrix and vector operations).
I’d like to make a couple of comments, though. I don’t think you can declare vector[N, D-1]: you need an array or a matrix type. Assuming coord is a matrix, softmax(Psi' * coord[n]) should be softmax(to_vector(coord[n] * Psi)).

Ahh, yes, I was thinking of an array of vectors, vector[D-1] coord[N];.