Incorrect number of categories with categorical_lpmf for a mixture model

Hi there,

I am trying to specify a mixture time-series model that involves mixing discrete variables. I get the following error for all chains while fitting the model, and I cannot figure out how to deal with it:

SAMPLING FOR MODEL 'NDARMA(1,1)' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: categorical_lpmf: Number of categories is -2147483648, but must be in the interval [1, 7] (in 'string', line 16, column 2 to column 31)

So, this is what I’m doing:

Model (maybe skip)

I have a sequence of observed integers X_t between 1 and k, following a categorical distribution \Pi with category probabilities \lambda. The value at each time is probabilistically determined by its lag-1 value and by a latent ‘innovation’ process U_t with the same marginal distribution (i.e., U_t \sim \Pi):

X_t = \begin{cases} X_{t-1} &\quad \text{with probability } \alpha_1 \\ U_{t} &\quad \text{with probability } \beta_0 \\ U_{t-1} &\quad \text{with probability } \beta_1 \end{cases} \quad,

in which \alpha_1, \beta_0, \beta_1 (let’s call this vector \psi) form a unit 3-simplex (i.e., \alpha_1, \beta_0, \beta_1 > 0 and \alpha_1 + \beta_0 + \beta_1 = 1).

What I want to estimate are the probabilities in \lambda and \psi, and not U_t.
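For concreteness, the generating process can be simulated directly. This is a minimal sketch (not part of the model code): the initial values X_0 and U_0 are assumed to be drawn from \Pi, and all names are my own.

```python
import random

def simulate_ndarma(N, lam, psi, seed=0):
    """Simulate an NDARMA(1,1) sequence of length N.

    lam: category probabilities (length k); psi = (alpha1, beta0, beta1).
    Categories are 1..k, matching the Stan data declaration.
    """
    rng = random.Random(seed)
    k = len(lam)
    cats = list(range(1, k + 1))
    x_prev = rng.choices(cats, weights=lam)[0]   # X_0, assumed ~ Pi
    u_prev = rng.choices(cats, weights=lam)[0]   # U_0, assumed ~ Pi
    xs = []
    for _ in range(N):
        u = rng.choices(cats, weights=lam)[0]    # fresh innovation U_t ~ Pi
        branch = rng.choices([0, 1, 2], weights=psi)[0]
        x = (x_prev, u, u_prev)[branch]          # copy X_{t-1}, U_t, or U_{t-1}
        xs.append(x)
        x_prev, u_prev = x, u
    return xs
```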

Derivation

The log-likelihood of this process (if I’m not mistaken) may be written as

\begin{align} \log\mathcal{L} &= \sum_{t = 1}^N \log \Pr(X_t = x_t \mid \lambda, \psi; X_{t-1}, X_{t-2}, \dots; U_{t}, U_{t-1}, U_{t-2}, \dots) \\ &= \sum_{t = 1}^N \log \Pr(X_t = x_t \mid \lambda, \psi; X_{t-1}, U_{t}, U_{t-1}) \\ &= \sum_{t = 1}^N \log \Big[ \alpha_1 \Pr(X_{t-1} = x_t \mid \lambda) + \beta_0 \Pr(U_{t} = x_t \mid \lambda) + \beta_1 \Pr(U_{t-1} = x_t \mid \lambda) \Big] \\ &= \sum_{t = 1}^N \operatorname{log\_sum\_exp} \Big[\log (\alpha_1) + \log \Pr(X_{t-1} = x_t \mid \lambda), \\ & \quad \quad \quad \quad \quad \quad \quad \quad \quad \log (\beta_0) + \log \Pr(U_{t} = x_t \mid \lambda), \\ & \quad \quad \quad \quad \quad \quad \quad \quad \quad \log(\beta_1) + \log\Pr(U_{t-1} = x_t \mid \lambda) \Big] \end{align}
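The last step only rewrites the mixture sum with the log-sum-exp identity, which is numerically safer in log space. A quick numeric check with arbitrary made-up values (nothing here comes from the model):

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# hypothetical mixture weights (psi) and per-branch probabilities
psi = [0.3, 0.4, 0.3]
branch_probs = [0.2, 0.5, 0.1]

# mixture log-probability computed directly ...
direct = math.log(sum(w * p for w, p in zip(psi, branch_probs)))
# ... and via log_sum_exp over log-weights plus log-probabilities
stable = log_sum_exp([math.log(w) + math.log(p)
                      for w, p in zip(psi, branch_probs)])
assert abs(direct - stable) < 1e-9
```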

Implementation

I have tried implementing it in Stan using the following code, which leads to the said error:

data {
  int<lower=1> N; // Number of observations
  int<lower=1> k; // Number of categories in X_t (and U_t)
  int<lower=1, upper=k> X_t[N]; // Observed data X_t
}

parameters {
  simplex[k] lambda; // Parameters of the multinomial (marginal) distribution of U_t
  simplex[3] psi; // Probabilities for selection process
}

model {
  // Priors (uninformative)
  lambda ~ dirichlet(rep_vector(2.0, k));
  psi ~ dirichlet(rep_vector(2.0, 3));

  // Latent innovations
  int U_t[N];
  U_t[1] ~ categorical(lambda); // innovation at t=1

  // likelihood
  for (t in 2:N) {
    U_t[t] ~ categorical(lambda);
    vector[3] contributions = rep_vector(0, 3);

    // calculating contribution of each term
    contributions[1] = log(psi[1]) +
                       categorical_lpmf(X_t[t-1] | lambda);
    contributions[2] = log(psi[2]) +
                       categorical_lpmf(U_t[t] | lambda);
    contributions[3] = log(psi[3]) +
                       categorical_lpmf(U_t[t-1] | lambda);

    // incrementing log-likelihood
    target += log_sum_exp(contributions);
  }
}

I compiled it with rstan::stan_model() and got the error message while sampling with rstan::sampling(), which produced the following output at the end:

here are whatever error messages were returned
[[1]]
Stan model 'NDARMA(1,1)' does not contain samples.

[[2]]
Stan model 'NDARMA(1,1)' does not contain samples.

[[3]]
Stan model 'NDARMA(1,1)' does not contain samples.

[[4]]
Stan model 'NDARMA(1,1)' does not contain samples.

Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  display list redraw incomplete
2: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
3: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
4: In .local(object, ...) :
  some chains had errors; consider specifying chains = 1 to debug

I’m not Bayes- or Stan-savvy, and playing around with the code (and going through the manual, the functions reference, and the user’s guide) hasn’t helped. All I could find is that there may be an overflow and that some initial values should be changed, but I cannot put my finger on it.

I appreciate your time!

I think this has to do with the automatic selection of initial values. Could you try passing fixed initial values to the init argument of sampling()?


You say that X_t follows a categorical distribution \Pi with category probabilities \lambda, but then later state

X_t = \begin{cases} X_{t-1} &\quad \text{with probability } \alpha_1 \\ U_{t} &\quad \text{with probability } \beta_0 \\ U_{t-1} &\quad \text{with probability } \beta_1 \end{cases}

To me this sounds like two contradicting statements on the distribution of X_t, or perhaps I misunderstand something.

Also, how is X_0 distributed? Is there a U_{-1}?

The error seems to arise from your creation of U_t in the model block. The values it is initialised with are not constrained between 1 and k, causing an error when evaluating
U_t[1] ~ categorical(lambda);. Can you express the likelihood without U_t? I.e.

P(X_t) = \begin{cases}\alpha_1 \mathbb{I}(X_{t-1} = X_{t})\\\beta_0 \Pi(X_t|\lambda)\\ \beta_1(\text{some recursive expression for $U_{t-1}$ depending on $\lambda$ and $X_{t-1}$})\end{cases}
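As an aside, the -2147483648 in the error message is consistent with this diagnosis: it is the smallest 32-bit signed integer, which is the placeholder value an undefined int holds, so U_t[1] was evaluated before ever being assigned.

```python
# -2147483648 from the error message is INT32_MIN, the placeholder value
# of an undefined integer, so the sampling statement saw an unassigned U_t[1].
INT32_MIN = -(2 ** 31)
assert INT32_MIN == -2147483648
```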

Hi there, thank you both! Sorry for the delay; I wanted to try rewriting the likelihood before getting back to you.

@Kees_Mulder: I tried fixing the initial values using init (both by passing it a list and by passing a function that returns the list), but it didn’t solve the issue.

@TeemuSailynoja: I should have mentioned that the marginal distributions of the observed X_t's and the unobserved U_t's are the same: X_{1, 2, \dots, t} is a mixture of the U_{0, 1, 2, \dots, t}'s, and assuming that U_t is i.i.d. stationary (and follows \Pi) entails that the marginal distribution of X_t also follows \Pi. Writing X_t recursively also leads there.

I am skeptical about whether I can really write it recursively. However, I found a closed-form expression for the joint probabilities in Weiss (2009):

\begin{aligned} &P(X_t = i_0, X_{t-1} = i_1, U_t = j_0, U_{t-1} = j_1) = \\ &p_{j_0}p_{j_1} \cdot (\beta_0 \delta_{i_0 j_0} + \beta_1 \delta_{i_0 j_1} + \alpha_1 \delta_{i_0 i_1})((1 - \beta_0) p_{i_1} + \beta_0 \delta_{i_1 j_1}) \end{aligned} \quad,

in which \delta is the Kronecker delta and p_{m_n}'s are the probabilities of X_t/U_t having the category m_n.
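As a sanity check on that expression, the joint probability should sum to one over all four indices. A numeric sketch with arbitrary values for p and \psi (the function name is my own):

```python
import itertools

def weiss_joint(i0, i1, j0, j1, p, psi):
    """P(X_t = i0, X_{t-1} = i1, U_t = j0, U_{t-1} = j1) per the expression above.

    psi = (alpha1, beta0, beta1); indices are 0-based categories.
    """
    a1, b0, b1 = psi
    d = lambda a, b: 1.0 if a == b else 0.0  # Kronecker delta
    return (p[j0] * p[j1]
            * (b0 * d(i0, j0) + b1 * d(i0, j1) + a1 * d(i0, i1))
            * ((1 - b0) * p[i1] + b0 * d(i1, j1)))

p = [0.2, 0.3, 0.5]      # arbitrary category probabilities
psi = (0.3, 0.4, 0.3)    # arbitrary (alpha1, beta0, beta1)
k = len(p)

# the joint must sum to 1 over all (i0, i1, j0, j1) combinations
total = sum(weiss_joint(i0, i1, j0, j1, p, psi)
            for i0, i1, j0, j1 in itertools.product(range(k), repeat=4))
assert abs(total - 1.0) < 1e-9
```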

And its (GPT-assisted) implementation in Stan runs (although it doesn’t estimate \beta_0 and \beta_1 correctly):

data {
  int<lower=1> N;           // Number of observations in the sequence
  int<lower=1> k;           // Number of categories
  int<lower=1, upper=k> X[N];  // Observed data sequence
}

parameters {
  simplex[3] psi;           // psi[1] is alpha1, psi[2] is beta0, psi[3] is beta1
  simplex[k] p;             // Marginal probabilities of categories as a parameter
}

model {  
  // Likelihood
  for (t in 2:N) {
    vector[k] probs = rep_vector(0, k);  // Initialize category probabilities
    for (j0 in 1:k) {  // marginalize over U_t
      for (j1 in 1:k) {  // marginalize over U_{t-1}
        real common_term = psi[2] * (j0 == X[t]) + psi[3] * (j1 == X[t]) + psi[1] * (X[t-1] == X[t]);
        common_term *= ((1 - psi[2]) * p[X[t-1]] + psi[2] * (X[t-1] == j1));
        probs[X[t]] += p[j0] * p[j1] * common_term;
      }
    }
    target += log(probs[X[t]]);  // Add log probability of observing X[t]
  }
}

I think I should be able to figure out the problem with the \beta_0 and \beta_1 estimation; it could be due to using an incorrect expression for the joint probability (I derived it from a more general expression).
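I'm not sure this is the cause, but one thing I noticed: summing the joint over j_0 and j_1 gives the pairwise joint P(X_t, X_{t-1}), not the conditional P(X_t \mid X_{t-1}), so the target increments above add up log joints rather than log conditionals. A numeric sketch (arbitrary values, my own function names) showing that the X_{t-1}-marginal of that pairwise joint is just p_{X_{t-1}}, which is what one would divide by to condition:

```python
import itertools

def weiss_joint(i0, i1, j0, j1, p, psi):
    """P(X_t = i0, X_{t-1} = i1, U_t = j0, U_{t-1} = j1); psi = (alpha1, beta0, beta1)."""
    a1, b0, b1 = psi
    d = lambda a, b: 1.0 if a == b else 0.0  # Kronecker delta
    return (p[j0] * p[j1]
            * (b0 * d(i0, j0) + b1 * d(i0, j1) + a1 * d(i0, i1))
            * ((1 - b0) * p[i1] + b0 * d(i1, j1)))

p = [0.2, 0.3, 0.5]      # arbitrary category probabilities
psi = (0.3, 0.4, 0.3)    # arbitrary (alpha1, beta0, beta1)
k = len(p)

def pairwise(i0, i1):
    # marginalize the latent innovations U_t, U_{t-1} out of the joint
    return sum(weiss_joint(i0, i1, j0, j1, p, psi)
               for j0, j1 in itertools.product(range(k), repeat=2))

for i1 in range(k):
    # the X_{t-1}-marginal of the pairwise joint is p[i1], so a conditional
    # would be pairwise(i0, i1) / p[i1]
    assert abs(sum(pairwise(i0, i1) for i0 in range(k)) - p[i1]) < 1e-9
```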

---

**P.S.** Regardless of getting the likelihood function right (an incorrect one would merely result in wrong estimates), for my own understanding I'm still curious how I could have avoided the Stan error.