I am trying to estimate the parameters of an HMM with time-varying transition probabilities. The model describes forecasting behaviour: agents use different rules (states) and can switch between rules based on their most recent past performance (squared loss). Switching is governed by a softmax function with parameter \beta, the attentiveness to past performance. If \beta=0 they switch randomly. One may also introduce some inertia to model a tendency to stay in the most recent state irrespective of its performance. This is another state-transition parameter.
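
A minimal sketch of the switching rule described above, in Python rather than Stan or R for brevity. The post does not give the functional form of the inertia term, so here it is assumed to be an additive bonus on the score of the current state; the function and argument names are hypothetical.

```python
import math

def transition_matrix(losses, beta, inertia=0.0):
    """Row i gives P(next state = j | current state = i).

    losses  : most recent squared loss of each rule (one entry per state)
    beta    : attentiveness to past performance (beta = 0 gives uniform switching)
    inertia : ASSUMED additive bonus for staying in the current state
    """
    K = len(losses)
    rows = []
    for i in range(K):
        # Lower loss means a higher score; the current state gets the inertia bonus.
        scores = [-beta * losses[j] + (inertia if j == i else 0.0) for j in range(K)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows
```

Because the losses change every period, this matrix is recomputed at each time step, which is what makes the transition probabilities time-varying.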

The rules are state-dependent, e.g., strong past-price extrapolation versus weak past-price extrapolation, so the states have a meaning, which avoids the label-switching issue.

To summarise, there are two sets of parameters:

Parameters of the prediction process given the state

Parameters of the state-transition process (\beta and inertia).

I simulate the model in R with some known parameters. The Stan model converges and generally estimates posterior means for the parameters of the rules that are close to the true values, but the estimates of the state-transition parameters (especially \beta) are far from the true values.

What might be the issue here?

I know that Stan uses only forward probabilities to compute the log-likelihood. There is also the Baum–Welch algorithm, which uses both forward and backward probabilities. It seems that the Baum–Welch algorithm is more robust than using only forward probabilities. Can the issue I am experiencing be due to the use of just forward probabilities for the parameter estimation? I know there is Stan code that uses forward-backward probabilities to recover the hidden state probabilities, but it still uses only the forward part to estimate the parameters.

I have code, but it is rather long. Perhaps, at least initially, I can get some advice without getting into the code. I am happy to provide it if it helps, though.

@vianeylb is our resident expert, but in case she’s not available to answer I’ll take a crack:

The Baum-Welch algorithm is not a more robust version of the forward algorithm; it computes something fundamentally different (the maximum likelihood estimate for the parameters rather than the likelihood conditional on the parameters). In general, the utility of combining forward and backward passes arises in cases where you want to understand something about the likely hidden state probabilities conditional on the observed emission probabilities, as you mention. But that’s not relevant when the only goal is computing the likelihood conditional on the modeled transition probabilities. As long as it’s not underflowing (and it sounds as though it’s not), the forward algorithm is not an approximation; it computes the likelihood exactly. There is no additional information contained in the backward probabilities that could improve the computation.
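
To make the exactness claim concrete, here is a small Python sketch (not Stan, and not anyone's actual model code) of a log-space forward pass that accepts a different transition matrix at each time step, alongside a brute-force sum over every possible state path. The function names and the argument layout are assumptions chosen for illustration; on a small problem the two should agree up to floating-point error.

```python
import math
from itertools import product

def log_sum_exp(values):
    """Stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_lik(log_init, log_trans, log_emit):
    """Log-likelihood via the forward algorithm.

    log_init[k]        : log P(z_1 = k)
    log_trans[t][i][j] : log P(z_{t+2} = j | z_{t+1} = i), one matrix per step (T - 1 in total)
    log_emit[t][k]     : log density of observation t given state k
    """
    K = len(log_init)
    alpha = [log_init[k] + log_emit[0][k] for k in range(K)]
    for t in range(1, len(log_emit)):
        A = log_trans[t - 1]  # transition matrix for this particular step
        alpha = [
            log_sum_exp([alpha[i] + A[i][j] for i in range(K)]) + log_emit[t][j]
            for j in range(K)
        ]
    return log_sum_exp(alpha)

def brute_force_log_lik(log_init, log_trans, log_emit):
    """Same quantity, computed by summing over all K**T hidden state paths."""
    K, T = len(log_init), len(log_emit)
    terms = []
    for path in product(range(K), repeat=T):
        lp = log_init[path[0]] + log_emit[0][path[0]]
        for t in range(1, T):
            lp += log_trans[t - 1][path[t - 1]][path[t]] + log_emit[t][path[t]]
        terms.append(lp)
    return log_sum_exp(terms)
```

The brute-force version is only feasible for tiny K and T, but it is a handy way to unit-test a hand-written forward pass before trusting it inside a model.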

I can think of roughly three plausible possibilities for what is going awry for you.

My best guess is that there is an undetected mismatch between your data simulation and your Stan implementation (this sort of thing happens to me all the time). If you can provide a reproducible example (ideally with well commented simulation code) we could take a look.

My next guess is that there might be some hidden non-identifiability or multi-modality that the sampler hasn’t effectively detected or flagged. Again, seeing the model spelled out would help troubleshoot.

My third guess is that there might be numerical problems in the forward algorithm, but the fact that Stan is running without warnings and is returning reasonable estimates for some parameters suggests that this isn’t the culprit.
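
For reference, the kind of numerical problem meant here is underflow. The following Python sketch (illustrative only, with hypothetical function names and a time-invariant transition matrix for brevity) contrasts a forward pass on the raw probability scale with one that rescales at every step and accumulates the log of the normalising constants.

```python
import math

def forward_naive(init, trans, emit):
    """Likelihood on the probability scale; underflows to 0.0 for long series."""
    K = len(init)
    alpha = [init[k] * emit[0][k] for k in range(K)]
    for t in range(1, len(emit)):
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(K)) * emit[t][j]
            for j in range(K)
        ]
    return sum(alpha)

def forward_scaled(init, trans, emit):
    """Log-likelihood with per-step rescaling of the forward probabilities."""
    K = len(init)
    alpha = [init[k] * emit[0][k] for k in range(K)]
    log_lik = 0.0
    for t in range(len(emit)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(K)) * emit[t][j]
                for j in range(K)
            ]
        c = sum(alpha)              # normalising constant for this step
        log_lik += math.log(c)      # accumulate on the log scale
        alpha = [a / c for a in alpha]
    return log_lik
```

With a few hundred observations and small emission densities, the naive version returns exactly 0.0 while the scaled version stays finite; working with log_sum_exp throughout achieves the same protection.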

One more thing:
To the best of my knowledge, Stan’s built-in hmm_marginal function only works with time-invariant transition probabilities. Your title suggests that you are working with time-varying transition probabilities, though I don’t quite grasp how time-varying transition probabilities enter into your description of the problem.

When you say

I’m led to believe that you are working with hmm_marginal and not your own implementation of the forward algorithm. If that’s true, and if it’s true that you intend to work with time-varying transition probabilities, then I think something unintentional must be happening such that you are not passing the time-varying transition probabilities that you want. It is not too difficult to write reasonably efficient implementations of the forward algorithm in Stan code without using hmm_marginal, and to set these up to accept time-varying transition matrices. Some resources linked here: