Incorrect number of categories with categorical_lpmf for a mixture model

Hi there,

I am trying to specify a mixture time-series model that involves mixing discrete variables. I get the following error for all chains while fitting the model, and I cannot figure out how to deal with it:

SAMPLING FOR MODEL 'NDARMA(1,1)' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: categorical_lpmf: Number of categories is -2147483648, but must be in the interval [1, 7] (in 'string', line 16, column 2 to column 31)

So, this is what I’m doing:

Model (maybe skip)

I have a sequence of observed integers X_t between 1 and k, following a categorical distribution \Pi with category probabilities \lambda. The value at each time is probabilistically determined by its lag-1 value and by a latent ‘innovation’ process U_t with the same marginal distribution (i.e., U_t \sim \Pi):

X_t = \begin{cases} X_{t-1} &\quad \text{with probability } \alpha_1 \\ U_{t} &\quad \text{with probability } \beta_0 \\ U_{t-1} &\quad \text{with probability } \beta_1 \end{cases} \quad,

in which \alpha_1, \beta_0, \beta_1 (let’s call this vector \psi) form a unit 3-simplex (i.e., \alpha_1, \beta_0, \beta_1 > 0 and \alpha_1 + \beta_0 + \beta_1 = 1).

What I want to estimate are the probabilities in \lambda and \psi, and not U_t.
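For concreteness, the generating process can be simulated directly. This is a minimal sketch (not part of the model code): the initial values X_0 and U_0 are assumed to be drawn from \Pi, and all names are my own.

```python
import random

def simulate_ndarma(N, lam, psi, seed=0):
    """Simulate an NDARMA(1,1) sequence of length N.

    lam: category probabilities (length k); psi = (alpha1, beta0, beta1).
    Categories are 1..k, matching the Stan data declaration.
    """
    rng = random.Random(seed)
    k = len(lam)
    cats = list(range(1, k + 1))
    x_prev = rng.choices(cats, weights=lam)[0]   # X_0, assumed ~ Pi
    u_prev = rng.choices(cats, weights=lam)[0]   # U_0, assumed ~ Pi
    xs = []
    for _ in range(N):
        u = rng.choices(cats, weights=lam)[0]    # fresh innovation U_t ~ Pi
        branch = rng.choices([0, 1, 2], weights=psi)[0]
        x = (x_prev, u, u_prev)[branch]          # copy X_{t-1}, U_t, or U_{t-1}
        xs.append(x)
        x_prev, u_prev = x, u
    return xs
```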

Derivation

The log-likelihood of this process (if I’m not mistaken) may be written as

\begin{align} \log\mathcal{L} &= \sum_{t = 1}^N \log \Pr(X_t = x_t \mid \lambda, \psi; X_{t-1}, X_{t-2}, \dots; U_{t}, U_{t-1}, U_{t-2}, \dots) \\ &= \sum_{t = 1}^N \log \Pr(X_t = x_t \mid \lambda, \psi; X_{t-1}, U_{t}, U_{t-1}) \\ &= \sum_{t = 1}^N \log \Big[ \alpha_1 \Pr(X_{t-1} = x_t \mid \lambda) + \beta_0 \Pr(U_{t} = x_t \mid \lambda) + \beta_1 \Pr(U_{t-1} = x_t \mid \lambda) \Big] \\ &= \sum_{t = 1}^N \operatorname{log\_sum\_exp} \Big[\log (\alpha_1) + \log \Pr(X_{t-1} = x_t \mid \lambda), \\ & \quad \quad \quad \quad \quad \quad \quad \quad \quad \log (\beta_0) + \log \Pr(U_{t} = x_t \mid \lambda), \\ & \quad \quad \quad \quad \quad \quad \quad \quad \quad \log(\beta_1) + \log\Pr(U_{t-1} = x_t \mid \lambda) \Big] \end{align}
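The last step only rewrites the mixture sum with the log-sum-exp identity, which is numerically safer in log space. A quick numeric check with arbitrary made-up values (nothing here comes from the model):

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# hypothetical mixture weights (psi) and per-branch probabilities
psi = [0.3, 0.4, 0.3]
branch_probs = [0.2, 0.5, 0.1]

# mixture log-probability computed directly ...
direct = math.log(sum(w * p for w, p in zip(psi, branch_probs)))
# ... and via log_sum_exp over log-weights plus log-probabilities
stable = log_sum_exp([math.log(w) + math.log(p)
                      for w, p in zip(psi, branch_probs)])
assert abs(direct - stable) < 1e-9
```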

Implementation

I have tried implementing it in Stan using the following code, which leads to the said error:

data {
  int<lower=1> N; // Number of observations
  int<lower=1> k; // Number of categories in X_t (and U_t)
  int<lower=1, upper=k> X_t[N]; // Observed data X_t
}

parameters {
  simplex[k] lambda; // Parameters of the multinomial (marginal) distribution of U_t
  simplex[3] psi; // Probabilities for selection process
}

model {
  // Priors (uninformative)
  lambda ~ dirichlet(rep_vector(2.0, k));
  psi ~ dirichlet(rep_vector(2.0, 3));

  // Latent innovations
  int U_t[N];
  U_t[1] ~ categorical(lambda); // innovation at t=1

  // likelihood
  for (t in 2:N) {
    U_t[t] ~ categorical(lambda);
    vector[3] contributions = rep_vector(0, 3);

    // calculating contribution of each term
    contributions[1] = log(psi[1]) +
                       categorical_lpmf(X_t[t-1] | lambda);
    contributions[2] = log(psi[2]) +
                       categorical_lpmf(U_t[t] | lambda);
    contributions[3] = log(psi[3]) +
                       categorical_lpmf(U_t[t-1] | lambda);

    // incrementing log-likelihood
    target += log_sum_exp(contributions);
  }
}

I compiled it with rstan::stan_model() and got the error message while sampling with rstan::sampling(), which produced the following output at the end:

here are whatever error messages were returned
[[1]]
Stan model 'NDARMA(1,1)' does not contain samples.

[[2]]
Stan model 'NDARMA(1,1)' does not contain samples.

[[3]]
Stan model 'NDARMA(1,1)' does not contain samples.

[[4]]
Stan model 'NDARMA(1,1)' does not contain samples.

Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  display list redraw incomplete
2: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
3: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
4: In .local(object, ...) :
  some chains had errors; consider specifying chains = 1 to debug

I’m not Bayes- or Stan-savvy, and playing around with the code (and going through the manual, the functions reference, and the user’s guide) hasn’t helped. All I could find is that there may be an overflow and that some initial values should be changed, but I cannot put my finger on it.

I appreciate your time!

I think this has to do with the automatic selection of initial values. Could you try passing fixed initial values to the init argument of sampling()?


You say that X_t follows a categorical distribution \Pi with category probabilities \lambda, but then later state

X_t = \begin{cases} X_{t-1} &\quad \text{with probability } \alpha_1 \\ U_{t} &\quad \text{with probability } \beta_0 \\ U_{t-1} &\quad \text{with probability } \beta_1 \end{cases}

To me this sounds like two contradicting statements on the distribution of X_t, or perhaps I misunderstand something.

Also, how is X_0 distributed? Is there a U_{-1}?

The error seems to arise from your creation of U_t in the model block. The values it is initialised with are not constrained between 1 and k, causing an error when evaluating
U_t[1] ~ categorical(lambda);. Can you express the likelihood without U_t? I.e.

P(X_t) = \begin{cases}\alpha_1 \mathbb{I}(X_{t-1} = X_{t})\\\beta_0 \Pi(X_t|\lambda)\\ \beta_1(\text{some recursive expression for $U_{t-1}$ depending on $\lambda$ and $X_{t-1}$})\end{cases}
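As an aside, the -2147483648 in the error message is consistent with this diagnosis: it is the smallest 32-bit signed integer, which is the placeholder value an undefined int holds, so U_t[1] was evaluated before ever being assigned.

```python
# -2147483648 from the error message is INT32_MIN, the placeholder value
# of an undefined integer, so the sampling statement saw an unassigned U_t[1].
INT32_MIN = -(2 ** 31)
assert INT32_MIN == -2147483648
```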

Hi there, thank you both! Sorry for the delay; I wanted to try rewriting the likelihood before getting back to you.

@Kees_Mulder: I tried fixing the initial values using init (both by passing it a list and by passing a function that returns the list), but it didn’t solve the issue.

@TeemuSailynoja: I should have mentioned that the marginal distributions of the observed X_t's and the unobserved U_t's are the same: X_{1, 2, \dots, t} is a mixture of the U_{0, 1, 2, \dots, t}'s, and assuming that U_t is i.i.d. stationary (and follows \Pi) entails that the marginal distribution of X_t also follows \Pi. Writing X_t recursively also leads there.

I am skeptical about whether I can really write it recursively. However, I found a closed-form expression for the joint probabilities in Weiss (2009):

\begin{aligned} &P(X_t = i_0, X_{t-1} = i_1, U_t = j_0, U_{t-1} = j_1) = \\ &p_{j_0}p_{j_1} \cdot (\beta_0 \delta_{i_0 j_0} + \beta_1 \delta_{i_0 j_1} + \alpha_1 \delta_{i_0 i_1})((1 - \beta_0) p_{i_1} + \beta_0 \delta_{i_1 j_1}) \end{aligned} \quad,

in which \delta is the Kronecker delta and p_{m_n}'s are the probabilities of X_t/U_t having the category m_n.
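As a sanity check on that expression, the joint probability should sum to one over all four indices. A numeric sketch with arbitrary values for p and \psi (the function name is my own):

```python
import itertools

def weiss_joint(i0, i1, j0, j1, p, psi):
    """P(X_t = i0, X_{t-1} = i1, U_t = j0, U_{t-1} = j1) per the expression above.

    psi = (alpha1, beta0, beta1); indices are 0-based categories.
    """
    a1, b0, b1 = psi
    d = lambda a, b: 1.0 if a == b else 0.0  # Kronecker delta
    return (p[j0] * p[j1]
            * (b0 * d(i0, j0) + b1 * d(i0, j1) + a1 * d(i0, i1))
            * ((1 - b0) * p[i1] + b0 * d(i1, j1)))

p = [0.2, 0.3, 0.5]      # arbitrary category probabilities
psi = (0.3, 0.4, 0.3)    # arbitrary (alpha1, beta0, beta1)
k = len(p)

# the joint must sum to 1 over all (i0, i1, j0, j1) combinations
total = sum(weiss_joint(i0, i1, j0, j1, p, psi)
            for i0, i1, j0, j1 in itertools.product(range(k), repeat=4))
assert abs(total - 1.0) < 1e-9
```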

And its (GPT-assisted) implementation in Stan runs (although it doesn’t estimate \beta_0 and \beta_1 correctly):

data {
  int<lower=1> N;           // Number of observations in the sequence
  int<lower=1> k;           // Number of categories
  int<lower=1, upper=k> X[N];  // Observed data sequence
}

parameters {
  simplex[3] psi;           // psi[1] is alpha1, psi[2] is beta0, psi[3] is beta1
  simplex[k] p;             // Marginal probabilities of categories as a parameter
}

model {  
  // Likelihood
  for (t in 2:N) {
    vector[k] probs = rep_vector(0, k);  // Initialize category probabilities
    for (j0 in 1:k) {  // marginalize over U_t
      for (j1 in 1:k) {  // marginalize over U_{t-1}
        real common_term = psi[2] * (j0 == X[t]) + psi[3] * (j1 == X[t]) + psi[1] * (X[t-1] == X[t]);
        common_term *= ((1 - psi[2]) * p[X[t-1]] + psi[2] * (X[t-1] == j1));
        probs[X[t]] += p[j0] * p[j1] * common_term;
      }
    }
    target += log(probs[X[t]]);  // Add log probability of observing X[t]
  }
}

I think I should be able to figure out the problem with the \beta_0 and \beta_1 estimation; it could be due to using an incorrect expression for the joint probability (I derived it from a more general expression).
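I'm not sure this is the cause, but one thing I noticed: summing the joint over j_0 and j_1 gives the pairwise joint P(X_t, X_{t-1}), not the conditional P(X_t \mid X_{t-1}), so the target increments above add up log joints rather than log conditionals. A numeric sketch (arbitrary values, my own function names) showing that the X_{t-1}-marginal of that pairwise joint is just p_{X_{t-1}}, which is what one would divide by to condition:

```python
import itertools

def weiss_joint(i0, i1, j0, j1, p, psi):
    """P(X_t = i0, X_{t-1} = i1, U_t = j0, U_{t-1} = j1); psi = (alpha1, beta0, beta1)."""
    a1, b0, b1 = psi
    d = lambda a, b: 1.0 if a == b else 0.0  # Kronecker delta
    return (p[j0] * p[j1]
            * (b0 * d(i0, j0) + b1 * d(i0, j1) + a1 * d(i0, i1))
            * ((1 - b0) * p[i1] + b0 * d(i1, j1)))

p = [0.2, 0.3, 0.5]      # arbitrary category probabilities
psi = (0.3, 0.4, 0.3)    # arbitrary (alpha1, beta0, beta1)
k = len(p)

def pairwise(i0, i1):
    # marginalize the latent innovations U_t, U_{t-1} out of the joint
    return sum(weiss_joint(i0, i1, j0, j1, p, psi)
               for j0, j1 in itertools.product(range(k), repeat=2))

for i1 in range(k):
    # the X_{t-1}-marginal of the pairwise joint is p[i1], so a conditional
    # would be pairwise(i0, i1) / p[i1]
    assert abs(sum(pairwise(i0, i1) for i0 in range(k)) - p[i1]) < 1e-9
```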

---

**P.S.** Regardless of getting the likelihood function right (an incorrect one would merely result in wrong estimates), for my own understanding I'm still curious how I could have avoided the Stan error.