I am trying to specify a mixture time series model that involves mixing discrete variables. While fitting the model, I get the following error for all chains, and I cannot figure out how to deal with it:
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: categorical_lpmf: Number of categories is -2147483648, but must be in the interval [1, 7] (in 'string', line 16, column 2 to column 31)
So, this is what I’m doing:
Model (maybe skip)
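I have a sequence of observed integers X_t between 1 and k that follows a categorical distribution \Pi with category probabilities \lambda. Its value at each time point is determined probabilistically by its lag-1 value and by a latent ‘innovation’ process U_t with the same marginal distribution (i.e., U_t \sim \Pi), as

$$X_t = a_t X_{t-1} + b_{0,t} U_t + b_{1,t} U_{t-1}, \qquad (a_t, b_{0,t}, b_{1,t}) \sim \text{Multinomial}(1; \alpha_1, \beta_0, \beta_1),$$

in which \alpha_1, \beta_0, \beta_1 form a unit 3-simplex, which I will denote by \psi (i.e., \alpha_1, \beta_0, \beta_1 > 0 and \alpha_1 + \beta_0 + \beta_1 = 1). Exactly one of a_t, b_{0,t}, b_{1,t} equals 1 at each time point, so X_t is a copy of X_{t-1}, U_t, or U_{t-1}.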
What I want to estimate are the probabilities in \lambda and \psi, and not U_t.
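In case it helps to reproduce the problem, here is a minimal sketch of how I understand the data-generating process described above, written in Python rather than R or Stan. The function name `simulate_ndarma11`, the seed argument, and drawing the first X from \lambda are my own assumptions for illustration; they are not part of the model definition.

```python
import random


def simulate_ndarma11(n, lam, psi, seed=0):
    """Simulate n steps of the selection process described above.

    lam: category probabilities (length k) of the innovations U_t
    psi: (alpha_1, beta_0, beta_1), the selection probabilities
    Returns (x, u), two lists of integers in 1..k.
    """
    rng = random.Random(seed)
    k = len(lam)
    cats = list(range(1, k + 1))
    # Innovations are i.i.d. draws from the marginal distribution.
    u = rng.choices(cats, weights=lam, k=n)
    # Assumption: the observed series also starts from the marginal.
    x = [rng.choices(cats, weights=lam)[0]]
    for t in range(1, n):
        # Pick exactly one source: 0 -> X_{t-1}, 1 -> U_t, 2 -> U_{t-1}.
        source = rng.choices([0, 1, 2], weights=psi)[0]
        x.append((x[t - 1], u[t], u[t - 1])[source])
    return x, u


x, u = simulate_ndarma11(n=200, lam=[0.1, 0.2, 0.3, 0.4], psi=[0.5, 0.3, 0.2])
print(x[:10])
```

Only `x` would be passed to Stan as `X_t`; `u` is returned just so the simulated series can be checked against the innovations.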
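The log-likelihood of this process (if I’m not mistaken) may be written as

$$\ell(\lambda, \psi) = \sum_{t=2}^{N} \log\left( \alpha_1 \lambda_{X_{t-1}} + \beta_0 \lambda_{U_t} + \beta_1 \lambda_{U_{t-1}} \right)$$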
I have tried implementing it in Stan using the following code, which leads to the said error:
data {
  int<lower=1> N; // Number of observations
  int<lower=1> k; // Number of categories in X_t (and U_t)
  int<lower=1, upper=k> X_t[N]; // Observed data X_t
}

parameters {
  simplex[k] lambda; // Parameters of the multinomial (marginal) distribution of U_t
  simplex[3] psi; // Probabilities for selection process
}

model {
  // Priors (uninformative)
  lambda ~ dirichlet(rep_vector(2.0, k));
  psi ~ dirichlet(rep_vector(2.0, 3));

  // Latent innovations
  int U_t[N];
  U_t[1] ~ categorical(lambda); // innovation at t=1

  // likelihood
  for (t in 2:N) {
    U_t[t] ~ categorical(lambda);
    vector[N] contributions = rep_vector(0, 3);

    // calculating contribution of each term
    contributions[1] = log(psi[1]) +
      categorical_lpmf(X_t[t-1] | lambda);
    contributions[2] = log(psi[2]) +
      categorical_lpmf(U_t[t] | lambda);
    contributions[3] = log(psi[3]) +
      categorical_lpmf(U_t[t-1] | lambda);

    // incrementing log-likelihood
    target += log_sum_exp(contributions);
  }
}
I compiled it with rstan::stan_model() and got the error message while sampling with rstan::sampling(), which ends with the following output:
here are whatever error messages were returned
Stan model 'NDARMA(1,1)' does not contain samples.
Stan model 'NDARMA(1,1)' does not contain samples.
Stan model 'NDARMA(1,1)' does not contain samples.
Stan model 'NDARMA(1,1)' does not contain samples.
Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
display list redraw incomplete
2: In doTryCatch(return(expr), name, parentenv, handler) :
invalid graphics state
3: In doTryCatch(return(expr), name, parentenv, handler) :
invalid graphics state
4: In .local(object, ...) :
some chains had errors; consider specifying chains = 1 to debug
I’m not Bayes- or Stan-savvy, and playing around with the code (and going through the manual, function reference, and user’s guide) hasn’t helped. All I could find is that this looks like an overflow and that some initial values should be changed, but I cannot put my finger on it.
I appreciate your time!