Thanks to encouragement on this forum to look into HMMs, I’m digging in. (Thanks again, @rfc!)
Ultimately, I’m interested in continuous state spaces, but I’m taking things a step at a time, so I’m starting with discrete HMMs. The case study Tagging Basketball Events with HMM in Stan has been super-helpful. (And also 2.6 Hidden Markov models | Stan User’s Guide, which the case study is based on.) But in working through it, I’ve got two questions. (Really, I just want to make sure I’m on the right track.)
First, the two questions (summarized; more details below):
- I was able to “access” the values of the posterior log-probabilities local to the model block by recreating them in the generated quantities block. Is there a better way?
- The estimated log probabilities decrease over time (as shown in a plot below). I assume (?) this is because I’m computing the probability of y_{1:t}, rather than just y_t, so the sample space grows with t, and thus the probability of any particular sequence y_{1:t} decreases with t. At the “hand-waving” level, is this (at least somewhat) correct? (This is more of a conceptual/math question than a Stan question; I hope it’s appropriate here! I’ve written out my understanding in the short sketch just after this list.)
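To be concrete about the second question, here is what I think gamma holds (my notation, so please correct me if I’ve got this wrong): each entry is the joint log probability of the data so far and the current state,

$$
\gamma_{t,k} = \log p(y_{1:t}, x_t = k), \qquad
\log p(y_{1:t}) = \log \sum_{k=1}^{K} \exp(\gamma_{t,k}).
$$

Since each new observation contributes another (typically negative) emission log-density plus a log transition probability, I’d expect these quantities to keep getting more negative as t grows.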
Background:
I wanted to get a handle on the HMM’s response over time, i.e. at each t, rather than just the stationary transition and emission probabilities across t = 1:N, as in the case study. (The case study does use the Viterbi algorithm to estimate the most likely sequence of x at each t, but that’s not my focus.)
So, I wanted to get access to the gamma matrix of posterior log probabilities for the two states – that is, gamma[N observations, K states] in the Tagging Basketball… code.
I tried all sorts of things, but the only way I could get “access” was just to replicate the model code in the generated quantities block, using the sampled parameters mu and theta to “recreate” the gamma values, via my model:
data {
  int<lower=0> N;
  int<lower=0> K;
  real y[N];
}
parameters {
  simplex[K] theta[K];
  positive_ordered[K] mu;
}
model {
  // priors
  target += normal_lpdf(mu[1] | 3, 1);
  target += normal_lpdf(mu[2] | 10, 1);
  // forward algorithm
  {
    real acc[K];
    real gamma[N, K];
    for (k in 1:K)
      gamma[1, k] = normal_lpdf(y[1] | mu[k], 1);
    for (t in 2:N) {
      for (k in 1:K) {
        for (j in 1:K)
          acc[j] = gamma[t-1, j] + log(theta[j, k]) + normal_lpdf(y[t] | mu[k], 1);
        gamma[t, k] = log_sum_exp(acc);
      }
    }
    target += log_sum_exp(gamma[N]);
  }
}
// My added code, so I can access gamma
// Note: I just reused the model block's variable names, acc and gamma
generated quantities {
  real acc[K];
  real gamma[N, K];
  for (k in 1:K)
    gamma[1, k] = normal_lpdf(y[1] | mu[k], 1);
  for (t in 2:N) {
    for (k in 1:K) {
      for (j in 1:K)
        acc[j] = gamma[t-1, j] + log(theta[j, k]) + normal_lpdf(y[t] | mu[k], 1);
      gamma[t, k] = log_sum_exp(acc);
    }
  }
}
So, question 1: is there some other way to get access to these “internal” variables (specifically, gamma), vs. “recreating” them in the generated quantities block?
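For example, would it be better to compute gamma in a transformed parameters block, so it’s saved with the draws without duplicating the forward recursion? A rough, untested sketch of what I have in mind (keeping acc inside a local brace block so it isn’t written to the output):

transformed parameters {
  real gamma[N, K];  // forward log probabilities, saved with each draw
  {
    real acc[K];  // local, so not saved
    for (k in 1:K)
      gamma[1, k] = normal_lpdf(y[1] | mu[k], 1);
    for (t in 2:N) {
      for (k in 1:K) {
        for (j in 1:K)
          acc[j] = gamma[t-1, j] + log(theta[j, k]) + normal_lpdf(y[t] | mu[k], 1);
        gamma[t, k] = log_sum_exp(acc);
      }
    }
  }
}
model {
  // priors as before
  target += normal_lpdf(mu[1] | 3, 1);
  target += normal_lpdf(mu[2] | 10, 1);
  // likelihood, reusing gamma from transformed parameters
  target += log_sum_exp(gamma[N]);
}

The obvious downside I can see is that gamma (N × K values per draw) then always gets written to the output, which may or may not be what you want.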
And then question 2: why do the posterior log probabilities decrease over time? (As noted above, I assume it’s because I’m computing the probability of y_{1:t} rather than just y_t?)
Here’s a plot showing the log probabilities of the two states, which both decrease over time, and which nicely reflect the posterior distribution of the two states at each time step t. The data (from Tagging Basketball…) is in the bottom panel, and the log-posteriors are in the top panel. (Actually, I’m plotting just 100 gamma “samples.”) It shows how, at each point in time, the log probability of the data’s current state is quite a bit larger – which is good!
(One thing that still confuses me about Stan is how it computes and then summarizes normalized probabilities from its samples. So I want to make sure I’m not mixing up unnormalized log probabilities and actual probabilities by somehow not accounting for normalization, or its absence!)
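In case it helps clarify what I’m after: my (possibly mistaken) understanding is that normalizing each row gamma[t] across the K states would give the filtered state probabilities p(x_t = k | y_{1:t}), which would not shrink with t. Something like the following, added at the end of the generated quantities block (state_prob is just a name I made up):

generated quantities {
  // ... gamma computed as above ...
  vector[K] state_prob[N];  // filtered probabilities p(x_t = k | y_{1:t})
  for (t in 1:N)
    state_prob[t] = softmax(to_vector(gamma[t]));
}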
If anyone is interested, here’s the code I used to create the above plot, which is a modification of the Tagging Basketball… case study code. Note that I’ve saved the Stan code above as models/hmm_example_G.stan. (Sorry, my code is a bit messy!)
library(ggplot2)
library(patchwork)
library(rstan)

hmm_data <- readRDS("data/hmm_example.RDS")
stan_data <- list(N = length(hmm_data$y),
                  K = 2,
                  y = hmm_data$y)

hmm_fit <- stan("models/hmm_example_G.stan", data = stan_data, iter = 1e3, chains = 4)
samples <- as.matrix(hmm_fit)

# Remove the non-gamma columns (theta, mu, acc, and the last column, lp__)
samplesG <- samples[, -c(1:8, dim(samples)[2])]

nSims <- 100  # number of sims to plot
samplesGTrunc <- samplesG[1:nSims, ]

dfTrunc <- data.frame(sim = rep(1:nSims, 100 * 2),
                      time = rep(rep(1:100, 2), each = nSims),
                      state = factor(rep(rep(1:2, each = 100), each = nSims)),
                      logProb = c(samplesGTrunc))

pltProbs <- ggplot(data = dfTrunc, aes(x = time, y = logProb, color = state)) +
  geom_point(alpha = 0.1) +
  ylab("logProb(y)") +
  theme_light()

dataDf <- data.frame(time = 1:100, y = stan_data$y)
pltData <- ggplot(data = dataDf, aes(x = time, y = y)) +
  geom_point(color = "steelblue") +
  theme_light()

plt <- pltProbs + pltData + plot_layout(ncol = 1, heights = c(2, 1))
plot(plt)