You’d write that model in the usual way for a hierarchical model.
You can set HMMs up to always start in the same state. There are a lot of ways HMMs get formulated; usually they don’t assume separate initial distributions, but sometimes they do. The chapter on missing data in the manual shows how to build data structures out of data (your zeroes) and parameters, but it’d be more efficient here to just code everything directly (though much less portable, so I’d advise just mixing the transition matrix). Then you only have three degrees of freedom: once you set a, c, e in [0, 1], you get b = 1 - a, d = 1 - c, and f = 1 - e. I’d parameterize it that way if you know that’s the only model you’re going to use, rather than trying to play with putting simplexes together.
All in, this’d parameterize your transition matrix as follows (here the three free parameters are just named a, b, c, one per non-absorbing row):
parameters {
  real<lower=0, upper=1> a;
  real<lower=0, upper=1> b;
  real<lower=0, upper=1> c;
}
transformed parameters {
  simplex[4] theta[4]
    = { [a, 1 - a, 0, 0]',
        [0, b, 1 - b, 0]',
        [0, 0, c, 1 - c]',
        [0, 0, 0, 1]' };
}
You could of course generalize this banded structure in a loop to arbitrary sizes.
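Here’s a rough sketch of what that loop might look like. The names `K` and `stay` are made up for illustration, and I’ve written it with the newer `array[...]` declaration syntax rather than the older `simplex[4] theta[4]` form, so adjust to whatever your Stan version accepts.

```stan
data {
  int<lower=2> K;                          // number of states (hypothetical name)
}
parameters {
  // stay[k] is the probability of remaining in state k (hypothetical name);
  // the last state is absorbing, so it needs no parameter
  vector<lower=0, upper=1>[K - 1] stay;
}
transformed parameters {
  array[K] simplex[K] theta;
  // start every row at zero, then fill in the band
  for (k in 1:K) {
    theta[k] = rep_vector(0, K);
  }
  for (k in 1:(K - 1)) {
    theta[k, k] = stay[k];                 // stay in state k
    theta[k, k + 1] = 1 - stay[k];         // move on to state k + 1
  }
  theta[K, K] = 1;                         // absorbing final state
}
```

With `K = 4` this should reproduce the four-state matrix above, with `stay[1]`, `stay[2]`, `stay[3]` playing the roles of the three free parameters.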
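We have a set of prior recommendations in Wiki form and we strongly discourage interval-bounded uniform priors (if you do insist on using them, you don’t need the sampling statements, because a parameter declared with bounds and given no other prior is already uniform over that interval).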
To force the first state, you just literally write the first transition step differently than the others. Playing around with infinities and negative infinities is very hard on the chain rule for autodiff.
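As a sketch of what “write the first transition step differently” could mean in a forward algorithm: I’m assuming categorical emissions and a dense transition matrix here, and `K`, `V`, `T`, `y`, and `phi` are names I’ve made up, not anything from your model. The point is only that the first step conditions on state 1 directly, so no initial distribution with zeros (and no log of zero) ever shows up.

```stan
data {
  int<lower=1> K;                          // number of hidden states
  int<lower=1> V;                          // number of output symbols
  int<lower=2> T;                          // sequence length
  array[T] int<lower=1, upper=V> y;        // observed sequence
}
parameters {
  array[K] simplex[K] theta;               // theta[j, k] = Pr(next = k | current = j)
  array[K] simplex[V] phi;                 // phi[k, v] = Pr(output = v | state = k)
}
model {
  vector[K] lp;                            // lp[k] = log p(y[1:t], z[t] = k)
  vector[K] lp_next;
  vector[K] acc;

  // first step is special: the chain is known to be in state 1 at t = 1,
  // so only row 1 of theta is used for the transition into t = 2
  for (k in 1:K) {
    lp[k] = log(phi[1, y[1]]) + log(theta[1, k]) + log(phi[k, y[2]]);
  }

  // every later step is the usual forward recursion
  for (t in 3:T) {
    for (k in 1:K) {
      for (j in 1:K) {
        acc[j] = lp[j] + log(theta[j, k]);
      }
      lp_next[k] = log_sum_exp(acc) + log(phi[k, y[t]]);
    }
    lp = lp_next;
  }

  // marginalize over the final state
  target += log_sum_exp(lp);
}
```

If you combine this with a transition matrix that has structural zeros, like the banded one, you’d want to skip the impossible transitions in the inner loop rather than take their logs.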