How to specify a stochastic state space model with discrete (HMM-like) output

I am looking into a (deterministic) state-space model where the state is a function of two continuous processes and a discrete one - this last one is either “on” or “off”.

Of the two continuous processes, the first is a simple sine-like oscillating function C(t) = a\ \sin(\omega(t-\alpha)), and the second either decreases exponentially or increases toward saturation, depending on the discrete state:

H(t) = \begin{cases} H_0\ e^{-(t-t_0)/\chi_S}, & \text{if on} \\ \mu + (H_0 - \mu)\ e^{-(t-t_0)/\chi_W}, & \text{if off} \end{cases}

State switching happens when the difference H(t) - C(t) reaches a lower or upper threshold. This is a classic conceptual model, formalized in several papers, including this (PLoS) one.

In the context of inference, under one particular experimental setup, we can’t observe the continuous processes, only the “on” state through activity counts (which for now I’ll assume are Poisson distributed with a constant rate), while the “off” state always produces zero counts. So a forward simulation of this looks like this:
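In case it’s useful, here is a minimal sketch of such a forward simulation in Python; all parameter values and thresholds are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameter values, for illustration only.
a, omega, alpha = 0.5, 2 * np.pi / 24, 0.0  # oscillation amplitude, frequency, phase
chi_s, chi_w, mu = 4.0, 20.0, 1.0           # decay/saturation time scales and asymptote
lam = 5.0                                    # Poisson rate while "on"
lower, upper = -0.3, 0.3                     # switching thresholds on H - C
dt, T = 0.1, 200.0

def C(t):
    return a * np.sin(omega * (t - alpha))

# H relaxes according to the current discrete state V,
# and V flips when H - C crosses the lower/upper threshold.
on = True
H0, t0 = 1.0, 0.0
times = np.arange(0.0, T, dt)
H = np.empty_like(times)
V = np.empty(times.shape, dtype=bool)
counts = np.empty(times.shape, dtype=int)
for i, t in enumerate(times):
    if on:
        h = H0 * np.exp(-(t - t0) / chi_s)               # exponential decrease while on
    else:
        h = mu + (H0 - mu) * np.exp(-(t - t0) / chi_w)   # saturating increase while off
    H[i], V[i] = h, on
    counts[i] = rng.poisson(lam * dt) if on else 0       # only "on" produces counts
    # a threshold crossing flips the state and resets the reference point (H0, t0)
    if on and h - C(t) <= lower:
        on, H0, t0 = False, h, t
    elif not on and h - C(t) >= upper:
        on, H0, t0 = True, h, t
```

With these (made-up) values the system switches a few times over the simulated window, and the observed `counts` are zero on every “off” step by construction.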

While I am not sure results will be great, implementing this deterministic model in Stan should be straightforward. Real data is rarely this regular, though, so I’m hoping to implement a version where the system switches stochastically between the on and off states. However, this model doesn’t seem to fall into an obvious category, so it’s not clear to me what the best approach would be. Some options (and drawbacks) are:

  1. Fully stochastic state, Particle Filter approach (may be difficult with observations of discrete process only);
  2. Hidden Markov Model-like approach (but transition rates depend on state, so may not be feasible);
  3. Deterministic, but estimating switch times as additional parameters (not always clear how many switch points there are);
  4. Forget counts, and use the durations of on/off periods as observations to fit the model.

Ideally this would already fit into some class of model, but otherwise I am wondering if there’s some reasoning on what the best approach may be. Thank you.


Looks tricky! The only thing I feel at least somewhat qualified to comment on is the HMM part, but since nobody else answered, I will give it a try:

That’s actually not a big limitation - you can have time-varying transition probabilities in a HMM. AFAIK, all you need for the forward algorithm to work is to be able to construct the transition-probability matrix at any discrete time point you need (see e.g. Are time-varying transition probabilities supported by the new hmm_* functions?).
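For concreteness, the forward algorithm with a time-varying transition matrix is only a small change from the standard one - here is a rough numpy sketch (not Stan, but as far as I understand it is essentially the recursion the hmm_* functions perform, with the matrix rebuilt at each step):

```python
import numpy as np

def forward_log_lik(log_emission, trans_fn, pi):
    """Forward algorithm with a time-varying transition matrix.

    log_emission : (T, K) array, log p(y_t | state k)
    trans_fn     : t -> (K, K) row-stochastic matrix for step t-1 -> t
    pi           : (K,) initial state distribution
    Returns log p(y_1, ..., y_T).
    """
    T, K = log_emission.shape
    alpha = pi * np.exp(log_emission[0])
    log_norm = 0.0
    for t in range(1, T):
        # propagate through this step's transition matrix, then weight by emissions
        alpha = (alpha @ trans_fn(t)) * np.exp(log_emission[t])
        c = alpha.sum()          # rescale to avoid numerical underflow
        log_norm += np.log(c)
        alpha /= c
    return log_norm + np.log(alpha.sum())
```

The only requirement is that `trans_fn(t)` can be evaluated at every step, which is exactly what the construction below provides.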

So I think a non-terrible HMM approximation to the process could be constructed by discretizing H into K bins with bounds b_0, \dots, b_K, while keeping C continuous. This turns the deterministic continuous dynamics of H into stochastic discrete dynamics over \hat H and V \in \{ \mathrm{on, off}\}: for each bin of H values at discrete time i you would have transition probabilities P(\hat H_i = r, V_i = u | \hat H_{i - 1} = s, V_{i-1} = v) equal to (or approximating) the proportion of points in the cube i - 1.5 < t_1 < i - 0.5, \; i - 0.5 < t_2 < i + 0.5, \; b_{s - 1} < h < b_{s} such that the continuous dynamics starting at time t_1 with H(t_1) = h lead to b_{r - 1} < H(t_2) < b_{r}. I.e. within each bin of time and \hat H we assume a uniform distribution of the continuous values.

Even if the integrals were analytically tractable (which they may not be), one would still probably want to allow some wiggle room: whenever exact simulation of the dynamics gives non-zero probability to a bin, also give small non-zero probability to its neighbours, so that the HMM is not completely thrown off the rails by artifacts of the simulation.

One could probably achieve a good approximation by simply simulating the trajectory from the midpoint of each bin and then looking at the probability that the result of this exact simulation, plus some noise term, falls into each bin (the noise term could then be estimated as part of the fitting process).
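Something like this sketch, where the `step` function is a hypothetical stand-in for one time step of the exact H dynamics (which you would have to supply), and Gaussian noise of scale `sigma` provides the wiggle room:

```python
import math
import numpy as np

def transition_matrix(bins, step, sigma):
    """Approximate transition probabilities for a discretized process.

    bins  : (K+1,) bin edges b_0 < ... < b_K
    step  : h -> value of the continuous process one time step later
            (hypothetical; wraps the deterministic H dynamics)
    sigma : std. dev. of the noise smearing the exact simulation, which
            also gives neighbouring bins small non-zero probability
    Returns a (K, K) row-stochastic matrix.
    """
    K = len(bins) - 1
    mids = 0.5 * (bins[:-1] + bins[1:])
    P = np.empty((K, K))
    for s in range(K):
        h_next = step(mids[s])               # exact simulation from the bin midpoint
        # P(b_{r-1} < h_next + noise < b_r) via the normal CDF
        cdf = np.array([0.5 * (1 + math.erf((b - h_next) / (sigma * math.sqrt(2))))
                        for b in bins])
        row = np.diff(cdf)
        P[s] = row / row.sum()               # renormalize mass truncated at the edges
    return P
```

In the full model one would build one such matrix per (V, time-step) combination and stack them into the 2K-state transition matrix.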

You could then play with the number of bins K to trade off efficiency against fidelity. But I admit that solving the dynamics K^2 times for each timestep might be very costly even for very low K, especially if analytical formulations for the crossing points of H and C are not available and you would have to use numerical solvers.

Hope that helps at least a bit!


Thanks. Even some acknowledgement that it is indeed tricky is already helpful; I was afraid there was some classic approach to this and that I might be trying to reinvent the wheel (I guess that’s still possible, but at least it’s not obvious).
I am more experienced in general state-space models, so my choice would be option number 1, the problem there being that a binary state doesn’t really lend itself to being tracked by a sequential sampling of the state probabilities. Eventually we may be able to collect data on processes H and C, and I’d probably go in that direction, but for now I think it may not be a good option and option 2 seems more feasible.

I have seen examples of time-varying transition probabilities in HMMs, but this case is more complex than that, since the probabilities are also state dependent, and switching states would also “reset” the transition rates. I am not aware of an HMM implementation that does that, but I thought it might be possible – your proposal is an attempt (or maybe a workaround) to achieve that.

If I understand correctly, that would be a sort of two-component HMM (in the sense that x_t has two components, each of which can have two or more states) with a time-dependent transition rate given by C(t). I think this discretization may simplify things enough to make it feasible, at the cost of increasing the number of hidden states and not using the dependence between processes to reduce computations. I have to think more about it.

Hopefully someone else will be able to bring up an alternative along the lines of options 3 and 4 as well. Thanks again.


If it was me I would probably start with the first option and some kind of logistic hack to make a pseudo binary state. Just because it seems easiest to code ;)

Yes, that was the first direction I considered. I thought about assigning some constant-but-low probability to each possibility in the binary component, to make sure there was always some uncertainty about it to be propagated to the next step. I thought that may be too hacky, though, and that there could be a proper way of doing it instead. Maybe there isn’t.

I don’t think I can follow your thoughts here - maybe you have in mind some more complex version of the model than the simple one you presented? In HMMs transition probabilities are always state dependent and switching states resets the transition rates by definition (that’s the Markov property), so I don’t really see a problem here.

Yes, but in the end it would likely be most easy to implement as a single-variable HMM with 2K states.

I however think the bigger problem here is that the model as given is unlikely to have its parameters informed by the data, for several reasons:

  • as a gets small relative to the distance between the switching thresholds, changing \omega and \alpha will IMHO not change the dynamics in a noticeable way;
  • in the picture you’ve shown, I think doubling \omega would lead to almost the same dynamics. More generally, there will always be many sinusoids that cross a given finite set of points, hence multimodality in the posterior is highly likely unless C is given or can be very tightly constrained a priori;
  • it looks like, at least for some configurations, the dynamics of H would be well approximated by (and thus indistinguishable from) a linear increase/decrease. But whenever that’s the case, there will likely be many vastly different combinations of your parameters that lead to almost the same approximately linear increase/decrease.

In fact, even fitting a simple noisy sinusoid is a tricky problem (see Ideas for modelling a periodic timeseries for some fun discussion on the topic). It is possible we all missed something basic, but all the “solutions” were quite involved. It is possible the marginalization implied by the HMM would help, but I am not completely sure.

Of course you’re right, and maybe I wasn’t clear. Let me restate it.

In an HMM the probability of each transition at some step depends on the current state, x_t, and the transition matrix can be time-dependent as you mentioned, but the probabilities in the transition matrix itself do not depend on the state, only on time explicitly, Q(t). Another way of putting it is p(x_{t+1}|x_t) = f(t) for any given state; when the state switches you don’t really reset any rates, you just switch to a different row of Q.

In this model the probability of switching state (V \in \{0,1\}) depends not on t explicitly, but on the state of the system, so p(x_{t+1}|x_t) = g(H(t), C(t)). What I mean by resetting the rates is that – because H(t) depends on V – upon switching we need to take the full state (\ H(t), C(t), V\ ) into account to compute the rates from then on. In a deterministic model that’s not an issue, because we know when the system will switch, but in a stochastic model we need to account for when that will happen and keep track of the state of the system at every point. That’s why I think we cannot fit this into an HMM framework as is.

I believe what you propose is equivalent to having only the process C(t) govern the rates and treating the rest as a discrete stochastic state: that allows the transition matrix to be a function of t alone, with everything else folded into the hidden state x_t, so it becomes possible to treat it as an HMM, with the caveats we have discussed.
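To restate that symbolically (this is just my reading of the proposal): the augmented hidden state would be

x_t = (\hat H_t, V_t) \in \{1, \dots, K\} \times \{\mathrm{on}, \mathrm{off}\}, \qquad p(x_t \mid x_{t-1}) = \big[Q(t)\big]_{x_{t-1},\, x_t},

where Q(t) is a 2K \times 2K matrix whose entries depend on t only through C(t). The apparent “reset” of the rates at a switch is then absorbed into the augmented state, since the row of Q(t) selected at each step already encodes both \hat H and V.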

About this second part, I agree completely. I am not sure I’d agree that fitting oscillations is as hard as that post could suggest, but there are definitely some particularities, especially when it comes to phase parameters. However, this is the hard reality of experimental science: sometimes we are limited to the kind of data that is possible to collect and the theory that is best accepted in the field. My attempt here is to see how far we can get by using inference to let one inform the other. As I mentioned, we should be able to generate data on the underlying processes (though that will also make the model more complex), but that requires moving from collecting routine activity data to large-scale designs with expensive and time-consuming molecular techniques. So another goal is to assess how much we really gain from this large experimental leap.