Greetings, everybody. Trying to model a mixture of exponential growth processes using a logistic-normal-multinomial model. I have a working implementation in PyMC3, but the sampling is painfully slow, which is why I’m looking forward to reimplementing it in PyStan.
data {
int N; // N samples
int D; // D components
vector[N] t; // time
matrix[D-1, D] Psi; // ILR contrast matrix (basis)
int counts[N, D]; // Observed counts
}
parameters {
vector[D] lambda; // exponential growth rates
vector[D-1] alpha; // intercept in ILR-coordinates
vector[D-1] log_sigma; // log of StDev in ILR-coordinates
}
transformed parameters {
vector[D-1] sigma = exp(log_sigma);
}
model {
vector[D-1] mu; // expectation for ILR coordinates
vector[D-1] coord; // ILR coordinates
vector[D] prob; // a vector of probabilities for the multinomial
for (n in 1:N){
mu = Psi * log(lambda+1) * t[n] + alpha;
coord ~ normal(mu, sigma);
prob = exp(Psi' * coord) / sum(exp(Psi' * coord));
counts[n,] ~ multinomial(prob);
}
lambda ~ beta(4, 1);
alpha ~ normal(0, 1.5);
log_sigma ~ normal(-1.5, 1);
}
generated quantities {}
Here is how I start sampling
fit = model.sampling(
data={'N': counts_obs.shape[1],
'D': counts_obs.shape[0],
't': metadata['t'].values.astype(float),
'Psi': psi,
'counts': counts_obs.T.copy()},
warmup=500, iter=1000, chains=1, n_jobs=1
)
And here is what I get
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-127-9b02f9c10968> in <module>
5 'Psi': psi,
6 'counts': counts_obs.T.copy()},
----> 7 warmup=500, iter=1000, chains=1, n_jobs=1
8 )
~/.conda/envs/pcr/lib/python3.6/site-packages/pystan/model.py in sampling(self, data, pars, chains, iter, warmup, thin, seed, init, sample_file, diagnostic_file, verbose, algorithm, control, n_jobs, **kwargs)
776 call_sampler_args = izip(itertools.repeat(data), args_list, itertools.repeat(pars))
777 call_sampler_star = self.module._call_sampler_star
--> 778 ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
779 samples = [smpl for _, smpl in ret_and_samples]
780
~/.conda/envs/pcr/lib/python3.6/site-packages/pystan/model.py in _map_parallel(function, args, n_jobs)
88 pool.join()
89 else:
---> 90 map_result = list(map(function, args))
91 return map_result
92
stanfit4growth_dynamics_2e53a7591b97d6bac430fe57d9635a04_3078155904287467890.pyx in stanfit4growth_dynamics_2e53a7591b97d6bac430fe57d9635a04_3078155904287467890._call_sampler_star()
stanfit4growth_dynamics_2e53a7591b97d6bac430fe57d9635a04_3078155904287467890.pyx in stanfit4growth_dynamics_2e53a7591b97d6bac430fe57d9635a04_3078155904287467890._call_sampler()
RuntimeError: Initialization failed.
Having read several similar posts, I have already made sure all data are passed as NumPy arrays and no parameters can become 0. I have also tried to sample sigma
from a half-normal, but that doesn’t seem to matter. Here is some mathematical context.
- Assume we have a population with D distinct groups. We know that each group grows exponentially as ({\lambda}_{i} +1)^{t} \cdot {c}_{i}^{\prime}, where 0 \lt {\lambda}_{i} \le 1 is that group’s growth rate, t is time and {c}_{i}^{\prime} is initial population size.
- We have a series of multinomial samples from the population at different time points, where counts correspond to the number of times a certain group is observed.
- We model the population in ILR (isometric log-ratio) coordinates: \boldsymbol{p}(t) = \text{ilr}^{-1} ( \boldsymbol{y} ), where \boldsymbol{y}(t) \sim \mathcal{N}(\text{ilr} (\boldsymbol{\lambda} + 1) \cdot t + \boldsymbol{\alpha}, \boldsymbol{\sigma}), \boldsymbol{\alpha} is just an intercept in ILR coordinates and \sigma is a vector of standard deviations in ILR coordinates. We sample \lambda's from a Beta prior. \boldsymbol{y} = \text{ilr}(\boldsymbol{x}) = \Psi \times \log(\boldsymbol{x}) and \text{ilr}^{-1}(\boldsymbol{y}) = \mathcal{C}(\exp(\Psi' \times \boldsymbol{y})), where \boldsymbol{x} is a D-part vector, \Psi is a (D-1) \times D orthonormal basis in a D-part simplex (a so-called contrast matrix), and \mathcal{C} denotes closure to a unit-simplex.
- Finally, observed counts are modelled as \boldsymbol{c}(t) \sim \text{Multinomial}(\boldsymbol{p}(t))
Update
I’ve found a post that offers a general approach to debugging initialisation issues. It also made helped me understand that this error message is the same as “Bad initial energy” message in PyMC3. So, I’ve been NaN-hunting. I’ve updated the model to fix some of the errors, but there is one thing a don’t understand:
data {
int N; // N samples
int D; // D components
vector[N] t; // time
matrix[D-1, D] Psi; // ILR contrast matrix (basis)
int counts[N, D]; // Observed counts
}
parameters {
vector<lower=0,upper=1>[D] lambda; // exponential growth rates
vector[D-1] alpha; // intercept in ILR-coordinates
vector<lower=0>[D-1] sigma; // StDev in ILR-coordinates
}
transformed parameters {}
model {
vector[D-1] mu; // expectation for ILR coordinates
vector[D-1] coord; // ILR coordinates
vector[D] prob; // a vector of probabilities for the multinomial
lambda ~ beta(4, 1);
alpha ~ normal(0, 1.5);
sigma ~ lognormal(-1.5, 1);
for (n in 1:N) {
// print(log(lambda + 1))
mu = Psi * log(lambda+1) * t[n] + alpha;
print(mu);
print(sigma)
coord ~ normal(mu, sigma);
// print(coord)
prob = exp(Psi' * coord) / sum(exp(Psi' * coord));
counts[n,] ~ multinomial(prob);
}
}
generated quantities {}
Stan logs contain the following message:
Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: normal_lpdf: Random variable[1] is nan, but must not be nan! (in ‘unknown file name’ at line 27)
In other words, something is wrong with coord ~ normal(mu, sigma)
inside the loop. I’ve already checked that both mu
and sigma
do not contain nan
values:
[3.61959,3.61074,1.84193,-0.831585,3.068,-3.16802,-0.128876,-5.06131,-2.39376,6.96217,-1.54757,3.91368,5.46742,5.82423,-4.76566,-4.94248,0.30671,-0.578386,0.997372,-3.16698,1.55534,0.665509,5.20154,0.139708,-3.05867,-0.933114,-4.31377,-5.09469,1.78863,-1.26601,3.63067,0.551005,2.11949,4.73583,0.516342,-1.32069,4.65772,4.04653,0.795305,-5.09259,-5.43404,-0.500822,-2.07493,1.70666,-5.08649,-2.94182,-7.13371,-5.23849,6.68188,2.41275,5.32882,0.322703,-1.14324,7.16832,0.108541,-5.22924,3.01364,1.52697,-4.86676,1.81436,-1.09035,2.17651,2.37663,-2.41685,3.40156,-0.830815,3.41166,-1.18934,-1.87888,-2.35921,3.44936,-2.6891,-6.26322,3.59626,-1.50939,-4.74756,-3.18277,2.78778,3.15079,-5.45712,0.666367,-1.62401,4.99962,-1.63854,-3.91517,0.00922929,-3.01135,3.69132,-3.61608,5.17417,-0.145762,3.11831,5.42573,3.6789,-2.27073,1.98691,6.91936,-3.33031,1.57474,2.69374,-4.65052,-4.80807,-5.25164,-0.955291,-4.35992,-2.82116,0.693328,2.59116,1.30453,-4.31765,-0.793713,5.00205,3.2183,-2.95865,-6.26013,-3.38309,-5.24944,-3.03317,3.51115,-4.0517,4.66862,-4.89055,1.7523,3.13357,3.02388,-3.98441,2.46426,-0.670926,-2.39303,-4.27351,-1.24561,-5.30311,5.99563,-5.30392,2.85846,-4.66579,5.14095,-4.48287,1.1187]
[6.13952,0.572074,1.25277,1.90913,0.154876,0.788362,0.221999,6.5473,5.3303,2.5294,0.813722,2.62259,0.835902,4.34981,0.899515,0.233454,0.571515,3.03487,6.22654,0.939887,1.89274,3.07222,0.233838,0.299529,1.70895,0.683702,0.186334,1.94294,2.42151,3.75399,1.01165,1.53294,2.98085,4.60479,6.62264,0.145904,0.936486,4.07773,4.51134,1.9435,1.47366,0.15164,0.485037,1.1415,2.67297,3.37383,0.248162,0.370185,0.566116,0.970535,0.298791,1.0555,0.921357,0.681499,2.46573,0.460603,3.50968,0.210168,1.04486,4.85477,3.18573,6.12902,0.138466,2.36131,0.427282,0.367093,5.6573,5.36062,1.05131,4.40953,4.35017,0.602396,3.29693,0.541336,0.858526,0.276627,0.158699,0.241416,0.252346,0.136683,0.149635,6.64248,3.94269,0.242178,0.797728,1.65639,0.56543,1.18289,2.43234,0.942519,4.03283,6.9848,0.859371,0.308654,1.55666,2.48972,0.177679,4.33695,0.56958,6.36548,6.23759,0.215209,1.21682,0.292656,0.235332,1.22186,0.323701,0.138006,0.675583,0.563452,0.264373,5.87673,0.787552,1.22776,4.88825,0.532846,0.225707,0.353967,3.40864,0.889644,0.152828,2.71946,0.457686,0.771462,0.149052,1.50146,0.321766,1.65154,0.151494,0.734017,1.86677,1.00609,0.818023,0.158336,0.510543,2.08098,0.372893,1.2276,0.649099]
Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: normal_lpdf: Random variable[1] is nan, but must not be nan! (in 'unknown file name' at line 27)