# Hidden Markov modelling the derivative of a latent function

I’ve been struggling with modelling simple periodic functions, discovering that the likelihood topography is massively multimodal in very unintuitive ways. But recently I had the idea to reframe things with a hidden Markov model applied to the function’s derivative (its slope). I’ve had some apparent success with the model below, but I’m having trouble working out how to take the successful model of the derivative and integrate it to get back to the original function.

In fact, ultimately I’ll be looking to do a whole lot more modelling of the observations, including things like the observations being the additive combination of a known number of periodic functions with well-separated frequencies, a covariate that additionally influences the observations, etc. So I’m thinking that maybe I’m going to have to abandon use of `hmm_marginal` anyway, and maybe that’s what I need to do here to do inference on the integrated function?

Here’s an R script to play with. It’ll produce data that looks like this (dots are data, line is true function):

And it’ll show that the model does a great job of classifying the slopes:

But my hamfisted attempts to work back to the original function are clearly not working:

And here’s the model itself:

``````stan
// derivative_hmm.stan
data {
  int n;
  vector[n] dy; // derivative (lag-1 difference) of y
  real y0;      // zeroth y observation (for attempting to align f)
}
parameters {
  real<lower=0> noise;     // measurement noise
  real<lower=0> pos_slope; // slope for positive slope state
  real<upper=0> neg_slope; // slope for negative slope state
  simplex[2] p0;    // initial state probabilities
  simplex[2] tp_pn; // transition probabilities out of the positive state
  simplex[2] tp_np; // transition probabilities out of the negative state
}
transformed parameters {
  matrix[2, 2] tps = transpose(append_col(tp_pn, tp_np));
  matrix[2, n] state_ll;
  // Compute the log likelihoods in each possible state
  for (i_n in 1:n) {
    state_ll[1, i_n] = normal_lpdf(dy[i_n] | pos_slope, noise);
    state_ll[2, i_n] = normal_lpdf(dy[i_n] | neg_slope, noise);
  }
}
model {
  noise ~ std_normal();
  pos_slope ~ std_normal();
  neg_slope ~ std_normal();
  target += hmm_marginal(state_ll, tps, p0);
}
generated quantities {
  // get the probability of a positive slope at each timepoint
  row_vector[n] p_pos = hmm_hidden_state_prob(state_ll, tps, p0)[1];
  // compute the most probable slope at each timepoint
  // vector[n] df = p_pos > .5 ? pos_slope : neg_slope; // doesn't work; conditionals aren't vectorized
  vector[n] df;
  for (i_n in 1:n) {
    if (p_pos[i_n] > .5) {
      df[i_n] = pos_slope;
    } else {
      df[i_n] = neg_slope;
    }
  }
  // compute the forward and backward integrals
  vector[n] fi;
  vector[n] bi;
  for (i_n in 1:n) {
    fi[i_n] = sum(df[1:i_n]);
    bi[n - i_n + 1] = sum(-df[(n - i_n + 1):n]);
  }
  // average
  vector[n] f = (fi + bi) / 2 + y0;
}
``````
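As a sanity check on the integration idea in the generated quantities, here is a minimal Python sketch (NumPy only; the data and all names are made up for illustration, not taken from the R script) of reconstructing a function from per-timepoint slope estimates via a forward cumulative sum anchored at `y0`:

```python
import numpy as np

# A toy piecewise-linear "periodic" function via its slope sequence.
n = 200
true_df = np.where(np.sin(np.linspace(0, 4 * np.pi, n)) > 0, 0.1, -0.1)
f_true = np.concatenate([[0.0], np.cumsum(true_df)])[:n]

# Pretend these are the posterior slope classifications from the HMM:
df_hat = true_df.copy()

y0 = f_true[0]
fi = np.cumsum(df_hat)  # forward integral of the slopes
f_hat = y0 + np.concatenate([[0.0], fi[:-1]])

# With perfectly classified slopes the reconstruction matches exactly;
# each misclassified step would instead accumulate as drift from there on.
assert np.allclose(f_hat, f_true)
```

The point of averaging a forward and a backward integral in the model above is presumably to spread that drift across both ends rather than let it all pile up at one end.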

Solved it! Just had to make `f` a parameter and express its derivative in the hmm part:

``````stan
data {
  int n;
  vector[n] y;
}
parameters {
  simplex[2] p0;    // initial state probabilities
  simplex[2] tp_pn; // transition probabilities out of the positive state
  simplex[2] tp_np; // transition probabilities out of the negative state
  real<lower=0> pos_slope; // slope for positive slope state
  real<upper=0> neg_slope; // slope for negative slope state
  real<lower=0> noise;     // measurement noise
  // real intercept;
  vector[n] f;
}
transformed parameters {
  matrix[2, 2] tps = transpose(append_col(tp_pn, tp_np));
  matrix[2, n - 1] state_ll;
  // Compute the log likelihoods in each possible state
  for (i_n in 2:n) {
    state_ll[1, i_n - 1] = normal_lpdf((f[i_n] - f[i_n - 1]) | pos_slope, noise);
    state_ll[2, i_n - 1] = normal_lpdf((f[i_n] - f[i_n - 1]) | neg_slope, noise);
  }
}
model {
  noise ~ std_normal();
  pos_slope ~ std_normal();
  neg_slope ~ std_normal();
  // intercept ~ std_normal();
  f ~ std_normal();
  y ~ normal(f, noise);
  target += hmm_marginal(state_ll, tps, p0);
}
generated quantities {
  // get the probability of a positive slope at each timepoint
  row_vector[n - 1] p_pos = hmm_hidden_state_prob(state_ll, tps, p0)[1];
}
``````

Oh wait, `noise` probably shouldn’t be involved in this part, but I don’t know what should be there:

``````stan
for (i_n in 2:n) {
  state_ll[1, i_n - 1] = normal_lpdf((f[i_n] - f[i_n - 1]) | pos_slope, noise);
  state_ll[2, i_n - 1] = normal_lpdf((f[i_n] - f[i_n - 1]) | neg_slope, noise);
}
``````

Maybe sqrt(square(noise) + square(noise))?
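(For the arithmetic at least: if the measurement noise on `y` is iid with sd `noise`, then the lag-1 difference does have sd `sqrt(2) * noise`, since the two noise variances add. A quick simulation check in Python, with made-up numbers, just to confirm that much:)

```python
import numpy as np

# Check: sd(y[i] - y[i-1]) = sqrt(2) * sigma when y is a constant-slope
# trend plus iid Gaussian measurement noise with sd sigma.
rng = np.random.default_rng(1)
sigma = 0.3
y = 0.1 * np.arange(100_000) + rng.normal(0.0, sigma, 100_000)
dy = np.diff(y)

# Empirical sd of dy should be close to sqrt(2) * sigma (~0.424 here),
# and its mean close to the true slope 0.1.
assert abs(dy.std() - np.sqrt(2) * sigma) < 0.01
assert abs(dy.mean() - 0.1) < 0.01
```

Whether that quantity is the right thing to use for the latent differences `f[i] - f[i-1]` is a separate question.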

EDIT: on second thought no, it should be the variance of the mean function.

EDIT2: actually I have no idea, forget I said anything.


I think adding that `noise` parameter overfits:

Instead of using a normal, I think it is better to make a categorical variable using y. I’ve made a noisy measure of the “hidden state” by taking the difference in y and getting the sign.

``````stan
functions {
  int sign(real x) {
    if (x < 0.0)
      return 0;
    else
      return 1;
  }
}
data {
  int n;
  vector[n] y;
}
transformed data {
  vector[n - 1] y_diff;

  for (i_n in 2:n)
    y_diff[i_n - 1] = sign(y[i_n] - y[i_n - 1]);
}
``````

In the parameters block I just condensed `pos_slope` and `neg_slope` into one parameter, `alpha`:

``````...
real alpha ; // slope state
...
``````

The rest is taken from this blog where there’s a transformation of the transition probabilities using a 0.5 threshold. These are marginalized over the y_diff states.

``````stan
transformed parameters {
  matrix[2, 2] tps = transpose(append_col(tp_pn, tp_np));
  matrix[2, n - 1] state_ll;
  vector<lower=0, upper=1>[2] theta; // emission probabilities
  theta[1] = 0.5 + (1 - 0.5) * inv_logit(alpha); // greater than 50%
  theta[2] = 0.5 * inv_logit(alpha);             // less than 50%

  // Compute the log likelihoods in each possible state
  for (i_n in 2:n) {
    for (s in 1:2)
      state_ll[s, i_n - 1] = log(y_diff[i_n - 1] * theta[s]
                                 + (1 - y_diff[i_n - 1]) * (1 - theta[s]));
  }
}
``````
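(Editorial aside: the emission term here is just the Bernoulli log-pmf written out by hand. For a binary observation `d` and success probability `t`, `log(d * t + (1 - d) * (1 - t))` reduces to `log(t)` when `d = 1` and `log(1 - t)` when `d = 0`. A quick standalone Python check of that identity, with arbitrary values:)

```python
import numpy as np

def emission_ll(d, t):
    # the hand-written emission term used in the model
    return np.log(d * t + (1 - d) * (1 - t))

# Equivalent to the Bernoulli log-pmf for every binary d:
for t in (0.2, 0.5, 0.9):
    assert np.isclose(emission_ll(1, t), np.log(t))
    assert np.isclose(emission_ll(0, t), np.log(1.0 - t))
```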

The final graphs:

The entire Stan program:

``````stan
functions {
  int sign(real x) {
    if (x < 0.0)
      return 0;
    else
      return 1;
  }
}
data {
  int n;
  vector[n] y;
}
transformed data {
  vector[n - 1] y_diff;

  for (i_n in 2:n)
    y_diff[i_n - 1] = sign(y[i_n] - y[i_n - 1]);
}
parameters {
  simplex[2] p0;    // initial state probabilities
  simplex[2] tp_pn; // transition probabilities out of the positive state
  simplex[2] tp_np; // transition probabilities out of the negative state
  real alpha;       // slope state
  real<lower=0> noise; // measurement noise
  vector[n] f;
}
transformed parameters {
  matrix[2, 2] tps = transpose(append_col(tp_pn, tp_np));
  matrix[2, n - 1] state_ll;
  vector<lower=0, upper=1>[2] theta; // emission probabilities
  theta[1] = 0.5 + (1 - 0.5) * inv_logit(alpha); // greater than 50%
  theta[2] = 0.5 * inv_logit(alpha);             // less than 50%

  // Compute the log likelihoods in each possible state
  for (i_n in 2:n) {
    for (s in 1:2)
      state_ll[s, i_n - 1] = log(y_diff[i_n - 1] * theta[s]
                                 + (1 - y_diff[i_n - 1]) * (1 - theta[s]));
  }
}
model {
  noise ~ std_normal();
  alpha ~ std_normal();
  f ~ std_normal();
  y ~ normal(f, noise);
  target += hmm_marginal(state_ll, tps, p0);
}
generated quantities {
  // get the probability of a positive slope at each timepoint
  row_vector[n - 1] p_pos = hmm_hidden_state_prob(state_ll, tps, p0)[1];
}
``````

Ha, I tried precisely the same thing last night. Or at least with `y ~ normal(f, noise)` then `dy ~ hmm(..., sqrt(2*pow(noise,2)))`, but that ended up functionally making `f` mostly just track `y` rather than being a smooth latent function.

Thanks! Diving into your solution now…

Hm, I think with your formulation, inference on `f` is indistinguishable from not having any of the HMM structure in there at all. Indeed, commenting out all that structure and re-running yields pretty much identical posteriors on `f`. But your trick of getting the sign of the difference and doing the HMM on that is inspiring. Unfortunately I’ve consumed my work-allotted time for chasing this idea, but I’ll hopefully make time to return to it soon.


Yeah, I ran it both ways and noticed the output was the same. I know that you were just making a contrived example. Does having the HMM plus what I’ve done sample better on your actual problem, where not using the HMM doesn’t sample well?

Unfortunately my real data is much noisier, hence the desire to be working with the latent variable `f` in the HMM part.

I actually came up with an alternative idea that I’ll post as a separate thread in a bit and link back here in case you’re interested. Still only 10% worked out, but “feels” like a less kludgey reformulation of inference for periodic latents than this one.