Hidden Markov modelling the derivative of a latent function

I’ve been struggling with modelling simple periodic functions, discovering that the likelihood topography is massively multimodal in very unintuitive ways. But recently I had the idea of reframing things with a hidden Markov model applied to the derivative (slope) of the function. I’ve had some seeming success with the model below, but I’m having trouble working out how to take the successful model of the derivative and integrate it back to the original function.

In fact, ultimately I’ll be looking to do a whole lot more modelling of the observations, including things like the observations being the additive combination of a known number of periodic functions with well-separated frequencies, a covariate that additionally influences the observations, etc. So I’m thinking that maybe I’ll have to abandon hmm_marginal anyway, and perhaps that’s also what’s needed here to do inference on the integrated function?

Here’s an R script to play with. It’ll produce data that looks like this (dots are data, line is the true function):

And it’ll show that the model does a great job of classifying the slopes:

But my hamfisted attempts to work back to the original function are clearly not working:

And here’s the model itself:

// derivative_hmm.stan
data{
	int n ;
	vector[n] dy ; // derivative (lag-1 difference) of y
	real y0 ; // zeroth y observation (for attempting to align f)
}
parameters{
	real<lower=0> noise ; //measurement noise
	real<lower=0> pos_slope ; // slope for positive slope state
	real<upper=0> neg_slope ; // slope for negative slope state
	simplex[2] p0 ; // initial state probabilities
	simplex[2] tp_pn ; // transition probabilities when in the positive-slope state
	simplex[2] tp_np ; // transition probabilities when in the negative-slope state
}
transformed parameters {
	matrix[2, 2] tps = transpose(append_col(tp_pn,tp_np));
	matrix[2, n] state_ll ;
	// Compute the log likelihoods in each possible state
	for(i_n in 1:n) {
		state_ll[1, i_n] = normal_lpdf(dy[i_n] | pos_slope, noise);
		state_ll[2, i_n] = normal_lpdf(dy[i_n] | neg_slope, noise);
	}
}
model{
	noise ~ std_normal() ;
	pos_slope ~ std_normal() ;
	neg_slope ~ std_normal() ;
	target += hmm_marginal(state_ll, tps, p0);
}
generated quantities{
	// get the probability of a positive slope at each timepoint
	row_vector[n] p_pos = hmm_hidden_state_prob(state_ll, tps, p0)[1,];
	// compute the most probable slope at each timepoint
	// vector[n] df = p_pos > .5 ? pos_slope : neg_slope ; // doesn't work; conditionals aren't vectorized
	vector[n] df ;
	for(i_n in 1:n){
		if(p_pos[i_n]>.5){
			df[i_n] = pos_slope ;
		}else{
			df[i_n] = neg_slope ;
		}
	}
	// compute the forward and backward integrals
	vector[n] fi ;
	vector[n] bi ;
	for(i_n in 1:n){
		fi[i_n] = sum(df[1:i_n]);
		bi[n-i_n+1] = sum(-df[(n-i_n+1):n]);
	}
	// average
	vector[n] f = (fi+bi)/2 + y0 ;
}
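(As an aside, here’s a sketch of one alternative to thresholding the marginal state probabilities, reusing the names from the model above: sample a latent state path per posterior draw with hmm_latent_rng and integrate it with cumulative_sum. Untested, just the shape of the idea; the generated quantities block could end with something like:)

	// sample one hidden-state path per draw (state 1 = positive slope, state 2 = negative slope)
	int z[n] = hmm_latent_rng(state_ll, tps, p0) ;
	vector[n] df_draw ;
	for(i_n in 1:n){
		df_draw[i_n] = (z[i_n] == 1) ? pos_slope : neg_slope ;
	}
	// forward integral of the sampled slopes, anchored at the zeroth observation
	vector[n] f_draw = y0 + cumulative_sum(df_draw) ;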

Solved it! I just had to make f a parameter and express its derivative in the HMM part:

data{
	int n ;
	vector[n] y ;
}
parameters{
	simplex[2] p0 ; // initial state probabilities
	simplex[2] tp_pn ; // transition probabilities when in the positive-slope state
	simplex[2] tp_np ; // transition probabilities when in the negative-slope state
	real<lower=0> pos_slope ; // slope for positive slope state
	real<upper=0> neg_slope ; // slope for negative slope state
	real<lower=0> noise ; //measurement noise
	// real intercept ;
	vector[n] f ;
}
transformed parameters {
	matrix[2, 2] tps = transpose(append_col(tp_pn,tp_np)) ;
	matrix[2, n-1] state_ll ;
	// Compute the log likelihoods in each possible state
	for(i_n in 2:n) {
		state_ll[1,i_n-1] = normal_lpdf((f[i_n]-f[i_n-1]) | pos_slope, noise) ;
		state_ll[2,i_n-1] = normal_lpdf((f[i_n]-f[i_n-1]) | neg_slope, noise) ;
	}
}
model{
	noise ~ std_normal() ;
	pos_slope ~ std_normal() ;
	neg_slope ~ std_normal() ;
	// intercept ~ std_normal() ;
	f ~ std_normal() ;
	y ~ normal(f,noise) ;
	target += hmm_marginal(state_ll, tps, p0) ;
}
generated quantities{
	// get the probability of a positive slope at each timepoint
	row_vector[n-1] p_pos = hmm_hidden_state_prob(state_ll, tps, p0)[1,] ;
}

Oh wait, noise probably shouldn’t be involved in this part, but I don’t know what should be there:

for(i_n in 2:n) {
  state_ll[1,i_n-1] = normal_lpdf((f[i_n]-f[i_n-1]) | pos_slope, noise) ;
  state_ll[2,i_n-1] = normal_lpdf((f[i_n]-f[i_n-1]) | neg_slope, noise) ;
}

Maybe sqrt(square(noise) + square(noise))?

EDIT: on second thought, no; it should be the variance of the mean function.

EDIT2: actually I have no idea, forget I said anything.
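(For reference, if each observation is the latent value plus independent measurement noise, y[i] = f[i] + e[i] with e[i] ~ normal(0, noise), then Var(y[i] - y[i-1]) = Var(f[i] - f[i-1]) + 2 * noise^2. The sqrt(square(noise) + square(noise)) guess is the measurement-noise part of that, and the “variance of the mean function” guess is the other term. In the second model above the state_ll terms are written on the latent f differences directly, so arguably only the step-to-step variability of f belongs there, not the measurement noise.)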


I think adding that noise parameter overfits:

Instead of using a normal emission, I think it is better to make a categorical variable from y. I’ve made a noisy measure of the “hidden state” by taking the lag-1 difference of y and keeping only its sign.

functions {
	// indicator: returns 1 if x is non-negative, 0 otherwise
	int sign(real x) {
		if (x < 0.0)
			return 0;
		else
			return 1;
	}
}
data{
	int n ;
	vector[n] y ;
}
transformed data {
	// 0/1 indicator of whether y increased from the previous timepoint
	vector[n - 1] y_diff;
	for(i_n in 2:n)
		y_diff[i_n - 1] = sign( y[i_n] - y[i_n - 1] );
}

In the parameters block I just condensed pos_slope and neg_slope into a single parameter, alpha:

... 
real alpha[2] ; // slope state
...

The rest is taken from this blog, where the emission probabilities are transformed using a 0.5 threshold. The hidden states are then marginalized over, with y_diff as the emissions.

transformed parameters {
	matrix[2, 2] tps = transpose(append_col(tp_pn,tp_np)) ;
	matrix[2, n-1] state_ll ;
	real<lower=0, upper=1> theta[2]; // Emission prob
	theta[1] = 0.5 + (1 - 0.5) * inv_logit(alpha[1]); // Greater than 50%
	theta[2] = 0.5 * inv_logit(alpha[2]); // Less than 50%
	// Compute the log likelihoods in each possible state
	for(i_n in 2:n) {
		for (s in 1:2)
			state_ll[s, i_n-1] = log(y_diff[i_n - 1] * theta[s] + (1 - y_diff[i_n - 1]) * (1 - theta[s]));
	}
}
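(Since y_diff is just 0/1, that expression is the Bernoulli log-pmf written out by hand, so an equivalent way to fill state_ll, assuming y_diff were instead declared as an integer array, say int y_diff_int[n - 1], would be:)

	// hypothetical equivalent: y_diff_int holds the same 0/1 values as y_diff, but as ints
	for(i_n in 2:n) {
		for (s in 1:2)
			state_ll[s, i_n-1] = bernoulli_lpmf(y_diff_int[i_n - 1] | theta[s]) ;
	}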

The final graphs


The entire Stan program:

functions {
	// indicator: returns 1 if x is non-negative, 0 otherwise
	int sign(real x) {
		if (x < 0.0)
			return 0;
		else
			return 1;
	}
}
data{
	int n ;
	vector[n] y ;
}
transformed data {
	// 0/1 indicator of whether y increased from the previous timepoint
	vector[n - 1] y_diff;
	for(i_n in 2:n)
		y_diff[i_n - 1] = sign( y[i_n] - y[i_n - 1] );
}
parameters{
	simplex[2] p0 ; // initial state probabilities
	simplex[2] tp_pn ; // transition probabilities when in the positive-slope state
	simplex[2] tp_np ; // transition probabilities when in the negative-slope state
	real alpha[2] ; // slope state (transformed into the emission probabilities below)
	real<lower=0> noise ; //measurement noise
	vector[n] f ;
}
transformed parameters {
	matrix[2, 2] tps = transpose(append_col(tp_pn,tp_np)) ;
	matrix[2, n-1] state_ll ;
	real<lower=0, upper=1> theta[2]; // Emission prob
	theta[1] = 0.5 + (1 - 0.5) * inv_logit(alpha[1]); // Greater than 50%
	theta[2] = 0.5 * inv_logit(alpha[2]); // Less than 50%
	// Compute the log likelihoods in each possible state
	for(i_n in 2:n) {
		for (s in 1:2)
			state_ll[s, i_n-1] = log(y_diff[i_n - 1] * theta[s] + (1 - y_diff[i_n - 1]) * (1 - theta[s]));
	}
}
model{
	noise ~ std_normal() ;
	alpha ~ std_normal() ;
	f ~ std_normal() ;
	y ~ normal(f, noise) ;
	target += hmm_marginal(state_ll, tps, p0) ;
}
generated quantities{
	// get the probability of a positive slope at each timepoint
	row_vector[n-1] p_pos = hmm_hidden_state_prob(state_ll, tps, p0)[1,] ;
}

Ha, I tried precisely the same thing last night. Or at least y ~ normal(f, noise) with dy ~ hmm(..., sqrt(2 * pow(noise, 2))), but that ended up functionally making f mostly just track y rather than being a smooth latent function.

Thanks! Diving into your solution now…

Hm, I think with your formulation, inference on f is indistinguishable from not having any of the HMM structure in there at all, since state_ll is built from y_diff (which is data) rather than from f, so the HMM term never touches f. Indeed, commenting out all that structure and re-running yields pretty much identical posteriors on f. But your trick of taking the sign of the difference and doing the HMM on that is inspiring. Unfortunately I’ve consumed my work-allotted time for chasing this idea, but I’ll hopefully make time to return to it soon.


Yeah, I ran it both ways and noticed the output was the same. I know that you were just making a contrived example. Does having the HMM together with what I’ve done sample better on your actual problem, where not using the HMM doesn’t sample well?

Unfortunately my real data is much noisier, hence the desire to work with the latent variable f in the HMM part.

I actually came up with an alternative idea that I’ll post as a separate thread in a bit and link back here in case you’re interested. It’s still only 10% worked out, but it “feels” like a less kludgey reformulation of inference for periodic latents than this one.

Related post here: Mathy-folks: thoughts on the asymptotic representation of this k-HMM model for periodic signals?