I am working on a Hidden Markov Model, which will eventually be used for analysing species detections from camera trap data.

Currently I have a model which uses the new hmm_marginal function - the full code for the model is below. The model has two hidden states (1 = unoccupied, 2 = occupied) and a Poisson emission process representing the number of detections at each time step (e.g. the number of camera trap photos taken in each 30-minute block of time). The model takes data from multiple sites, each with a different number of time steps. The model appears to be working well, and successfully recovers the true parameter values when I give it simulated data.

I would like to extend the model to include one or more time-varying covariates for the state transition probabilities. However, my understanding is that doing so requires assuming that the transition probability matrix is also time-varying, and I am unsure whether hmm_marginal allows for this. My question is whether I can adapt my current model to include covariates, or whether I need to re-write it in a way that avoids using hmm_marginal.

Thanks!

My model code:

data{
int sites; // Number of sites, indexed s
array[sites] int t_steps; // Number of time steps, indexed t - differs per site
int t_max; // Number of time steps in site with the longest time series
array[sites, t_max] int y; // Observed data - NA values replaced with -9999
int<lower=0> K; // Number of hidden states (K = 2: 1 = unoccupied, 2 = occupied)
}
parameters{
// Rows of transition matrix
simplex[K] t1;
simplex[K] t2;
// Initial state probabilities
simplex[K] delta;
// Parameters of measurement (emission) model
positive_ordered[K] lambda;
}
transformed parameters{
matrix[2, 2] Theta = rep_matrix(0, 2, 2);
matrix[K, t_max] log_omega;
// Build the transition matrix
Theta[1, :] = t1';
Theta[2, :] = t2';
// Compute log likelihoods in each possible state
for(s in 1:sites){
for(t in 1:t_steps[s]){
log_omega[1,t] = poisson_lpmf(y[s,t] | lambda[1]);
log_omega[2,t] = poisson_lpmf(y[s,t] | lambda[2]);
}
}
}
model{
// Priors
lambda ~ exponential(1);
delta ~ dirichlet([9, 1]); // More likely to begin in state 1 (unoccupied)
t1 ~ dirichlet([1,1]);
t2 ~ dirichlet([1,1]);
// Add to target log density
target += hmm_marginal(log_omega, Theta, delta);
}

Do you need any resources/tips about implementing the forward algorithm in Stan to avoid hmm_marginal? Relevant documentation to get started is here 2.6 Hidden Markov models | Stan Userâ€™s Guide

Note that for problems like yours, you can exploit the fact that the true state is known whenever a timestep has a detection to reduce the number of terms that the forward algorithm needs to sum over. If a lot of timesteps have nonzero detection, then this can be a big efficiency gain (though I have no idea if it is likely to be competitive with whatever hmm_marginal is doing under-the-hood).

Tangentially related: note that in the model without time-varying transition probabilities, it might be reasonable for you to simplify your model by computing delta based on the equilibrium frequencies implied by the transition matrix, rather than estimating them separately. But once you no longer assume constant transition probabilities, this idea goes out the window.

Thanks everyone for your advice! Thatâ€™s unfortunate that the feature isnâ€™t in hmm_marginal yet, although I see from the replies to the link that @mbjoseph posted that itâ€™s coming in the next release.

Iâ€™ve had a look at implementing the forward algorithm by adapting the code on Luis Damianoâ€™s tutorial, starting with just one site (code below). It seems to work with my simulated data, so now Iâ€™ll look at extending to include multiple sites and then covariates.

@jsocolar for some species at least there could be quite a lot of timesteps with non-zero detection (although I guess this depends to an extent on how long a timestep is defined to be, which is something Iâ€™ll need to think about some more before analysing the real data). It could definitely be useful to have an increase in efficiency in these cases. How would I go about implementing this feature in the forward algorithm?

data{
int<lower=0> t_steps; // Number of timesteps, indexed t
int<lower=0> K; // Number of hidden states (here K= 2, 1 = unoccupied, 2 = occupied)
int<lower=0> y[t_steps]; // Observed data (counts of detections at each time t)
}
parameters{
// State model
simplex[K] delta; // Initial state probabilities
simplex[K] Theta[K]; // Transition probability matrix
// Observation (emission) model
positive_ordered[K] lambda; // Expected count of detections for each state k
}
transformed parameters{
vector[K] logalpha[t_steps];
{ // Forward algorithm log p(z_t = j | x_{1:t})
real accumulator[K];
logalpha[1] = log(delta) + poisson_lpmf(y[1] | lambda);
for (t in 2:t_steps) {
for (j in 1:K) { // j = current (t)
for (i in 1:K) { // i = previous (t-1)
// Murphy (2012) Eq. 17.48
// belief state + transition prob + local evidence at t
accumulator[i] = logalpha[t-1, i] + log(Theta[i, j]) + poisson_lpmf(y[t] | lambda[j]);
}
logalpha[t, j] = log_sum_exp(accumulator);
}
}
} // Forward
}
model {
// Priors
lambda ~ exponential(1);
delta ~ dirichlet([9, 1]); // More likely to begin in state 1 (unoccupied)
Theta[1,] ~ dirichlet([1,1]);
Theta[1,] ~ dirichlet([1,1]);
// Increment target
target += log_sum_exp(logalpha[t_steps]); // Note: update based only on last logalpha
}

Inside the loop over timesteps, where you loop over i and j, you are computing the likelihood increments associated with four belief states (0 â†’ 0, 0 â†’ 1, 1 â†’ 0, 1 â†’ 1). If you have nondetections in both periods, then all four belief states contribute to the likelihood. But if you have detections in both periods, then only the final state contributes to the likelihood. If you have a detection followed by a non-detection, then only states 3 and 4 matter. If you have a non-detection followed by a detection, then only states 2 and 4 matter. You can use if/else statements to break these cases apart and the avoid doing the computation associated with any of the disallowed hidden states.

In some particularly simple special cases it might be possible to encode time-varying state transition structure by augmenting the space of hidden states and adding some additional state to track some internal â€śclockâ€ť.

I doubt this would apply for your application, if there are time-varying covariates that influence transition probabilities (I am guessing this could be something like the observed temperature at each site at timestep t, or some measurement of the relative attractiveness of the site s at time t compared to neighbouring sites) it seems like it should be relatively straight forward to directly include those dependencies in the expression for the log-probability term in the forward algorithm.

If we didnâ€™t have time-varying covariates but instead had a particularly simple special case of time-varying state transition structure, it might be possible to encode the time variance by adding/augmenting artificial states into the state space and end up with a time independent model:

Suppose the time dependent state transition probabilities had some cyclical relationship in sync with the time step duration with period three, we could take the product of the original 2-element unoccupied occupied state space \{u, o\} with a 3-element phase state space \{0, 1, 2\} to give a resulting 6-element product state space \{(u, 0), (u, 1), \ldots, (o, 2)\}. Then we could define a single 6 \times 6 time-independent state transition matrix that encodes three sets of transition probabilities for u \rightarrow u, u \rightarrow o, o \rightarrow u and o \rightarrow o â€“ one set for each of the three â€śclockâ€ť phases {0, 1, 2} â€“ as well as encoding the rule forcing the clock phase component of the state space to advance with time, i.e. the probability of all transitions (x, \tau) \rightarrow (x^{\prime}, \tau^{\prime}) must be zero unless \tau^{\prime} = \tau + 1 \mod 3.

This technique to turn a time-dependent model into a time-independent one by embedding some kind of artificial â€śclockâ€ť in the state space probably isnâ€™t a great idea if it can be avoided, as the computational work required for HMM algorithms tends to scale like \mathcal{O}(S^2) or worse for S hidden states â€“ and if you need to infer model parameters that control the state transition or observation (emission) models then maybe inflating the state space also explodes the number of parameters that need to be estimated.

Hi all,
Itâ€™s cool to see people are using the hmm routines :)

Regarding adding a new feature, the first step is to create a GitHub issue on the stan-math repo. If I understand correctly, we want to overload hmm_marginal so that users can pass an array of transition matrices, so an object

matrix[2, 2] theta[K]

where K is the number of hidden states, right? Weâ€™ll also want to extend hmm_latent_rng and hmm_hidden_state_prob.

I donâ€™t think we got around testing the performance of hmm_marginal, and the other hmm routines (beyond unit tests). Iâ€™d be interested to hear feedback from users.

From my understanding (which may be wrong, as Iâ€™m pretty new to HMMâ€™s!) weâ€™re wanting to pass an array which looks something like

matrix[K,K] theta[T]

where K is the number of hidden states and T is the number of time steps, with the values of theta at each timestep depending on one or more covariates.

Thereâ€™s also the issue of having multiple time series (in my case camera trap sites) which are generally not the same length (camera traps may end up being deployed for different numbers of days, especially when in remote areas which take time to access) - Iâ€™m guessing that this means we actually need an array which contains a separate transition matrix for each site/time combination (as this is what I did in R when simulating data to test the model on), but Iâ€™m not certain.

That makes sense to me â€“ for a time varying transition matrix, in general you need to store a whole state transition matrix for each time step T.

Makes sense. The HMM for each site contributes a term to the overall log-probability. So perhaps in Stan you could have an outer loop over sites, then inside that loop perform the HMM forward calculation with all the site-specific data such as observation sequence, covariate sequence, site-dependent transition matrices.

To encode the site-specific data (sequences of observations, for example) into the value types offered by Stan, one technique could be to pack them all into a single array (with respect to some ordering of the sites), then use an auxiliary array for book-keeping to store the lengths of each subarray, so you can figure out the range of indices of each site-specific subarray at runtime and avoid adding spurious transitions between observations made in different sites.

Yes. @betanalpha , @vianeylb and I wrote the hmm routines in Stan (with @Bob_Carpenter reviewing the PR). This is it: youâ€™re talking to the folks developing the next iteration of the feature. We just never got to it.

Letâ€™s use the notation in the reference manual. By â€śnumber of hidden statesâ€ť, I meant number of unobserved discrete variables, rather than the cardinality of the hidden state space. I realize I misread your code and didnâ€™t use the correct notation. Weâ€™ll go with what you have and use N instead of T, and \Gamma instead of \theta, to be consistent with the manual. So we want to pass in

matrix[K, K] Gamma[N]

The other arguments and the outcome of hmm_marginal, hmm_latent_rng and hmm_hidden_state_prob remain unchanged. @vianeylb can you confirm? Iâ€™ll then write a GitHub issue.

A few comments: the original prototype code was written to allow for varying transition matrices but we implemented the simpler functionality in Stan first. Extending the current implementation to varying transition matrices is actually straightforward because the Jacobian-adjoint products are compromised of products of the form \boldsymbol{\Gamma} \cdot \boldsymbol{\alpha}_{n} and the only change needed is replacing each \boldsymbol{\Gamma} with \boldsymbol{\Gamma}_{n}.

Introducing the intermediate matrix[K, K] Gamma[N] into a program may introduce substantial memory/performance burden, especially if K and/or N is large, but thatâ€™s going to be unavoidable for a general HMM implementation.

In general yes each time series will require a separate HMM. That said there will be numerous ways to implement these multiple HMMs as efficiently as possible, for example defining matrix[K, K] Gamma[N] that covers all days once and then subsetting the array when calling each individual HMM (presuming that the transition matrices are the same different sites).

Hi @charlesm93 - thanks for the reply. Just wanted to check, how difficult/feasible would it be to also allow sparse transition matrices in the hmm_marginal and the hmm_hidden_state_prob functions? I was wondering if this would provide efficiency in cases where the state space is massive but most transitions are impossible (my use case is one where the only possible transitions are from t->t and t->t+1)

There is a path to doing this in C++, using Eigenâ€™s support for sparse matrices, but whatâ€™s not clear to me is whether the automatic differentiation will immediately take advantage of the sparsity or whether this is something we need to configure. Another question lies in how do we make this available in the Stan language. We need to come up with an API that letâ€™s users specify sparse matrices, while enabling operations on these matrices. This would mean supporting sparse matrices in general.

I know a few devs have looked into it, notably @stevebronder.

In your case, it sounds like the transition matrix is block diagonal. In theory, I could add an argument that allows you specify the size of the block and then does operations on block diagonal matrices, but this seems too specific a case to warrant adding a new feature.