Hidden Markov modelling the derivative of a latent function

I’ve been struggling with modelling simple periodic functions, discovering that the likelihood topography is massively multimodal in very unintuitive ways. But recently I had the idea of reframing things with a hidden Markov model applied to the derivative (slope) of the function. I’ve had some seeming success with the model below, but I’m having trouble working out how to take the successful model of the derivative and integrate it back to the original function.

In fact, ultimately I’ll be looking to do a whole lot more modelling of the observations, including things like the observations being the additive combination of a known number of periodic functions with well-separated frequencies, a covariate that additionally influences the observations, etc. So I’m thinking that maybe I’ll have to abandon hmm_marginal anyway, and perhaps that’s also what’s needed here to do inference on the integrated function?

Here’s an R script to play with. It’ll produce data that looks like this (dots are data, line is the true function):

And it’ll show that the model does a great job of classifying the slopes:

But my hamfisted attempts to work back to the original function are clearly not working:

And here’s the model itself:

// derivative_hmm.stan
data{
	int n ;
	vector[n] dy ; // derivative (lag-1 difference) of y
	real y0 ; // zeroth y observation (for attempting to align f)
}
parameters{
	real<lower=0> noise ; //measurement noise
	real<lower=0> pos_slope ; // slope for positive slope state
	real<upper=0> neg_slope ; // slope for negative slope state
	simplex[2] p0 ; // initial state probabilities
	simplex[2] tp_pn ; // transition probabilities when in the positive-slope state
	simplex[2] tp_np ; // transition probabilities when in the negative-slope state
}
transformed parameters {
	matrix[2, 2] tps = transpose(append_col(tp_pn,tp_np));
	matrix[2, n] state_ll ;
	// Compute the log likelihoods in each possible state
	for(i_n in 1:n) {
		state_ll[1, i_n] = normal_lpdf(dy[i_n] | pos_slope, noise);
		state_ll[2, i_n] = normal_lpdf(dy[i_n] | neg_slope, noise);
	}
}
model{
	noise ~ std_normal() ;
	pos_slope ~ std_normal() ;
	neg_slope ~ std_normal() ;
	target += hmm_marginal(state_ll, tps, p0);
}
generated quantities{
	// get the probability of a positive slope at each timepoint
	row_vector[n] p_pos = hmm_hidden_state_prob(state_ll, tps, p0)[1,];
	// compute the most probable slope at each timepoint
	// vector[n] df = p_pos > .5 ? pos_slope : neg_slope ; // doesn't work; conditionals aren't vectorized
	vector[n] df ;
	for(i_n in 1:n){
		if(p_pos[i_n]>.5){
			df[i_n] = pos_slope ;
		}else{
			df[i_n] = neg_slope ;
		}
	}
	// compute the forward and backward integrals
	vector[n] fi ;
	vector[n] bi ;
	for(i_n in 1:n){
		fi[i_n] = sum(df[1:i_n]);
		bi[n-i_n+1] = sum(-df[(n-i_n+1):n]);
	}
	// average
	vector[n] f = (fi+bi)/2 + y0 ;
}
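(As an aside, here’s a sketch of one alternative to thresholding the marginal state probabilities, reusing the names from the model above: sample a latent state path per posterior draw with hmm_latent_rng and integrate it with cumulative_sum. Untested, just the shape of the idea; the generated quantities block could end with something like:)

	// sample one hidden-state path per draw (state 1 = positive slope, state 2 = negative slope)
	int z[n] = hmm_latent_rng(state_ll, tps, p0) ;
	vector[n] df_draw ;
	for(i_n in 1:n){
		df_draw[i_n] = (z[i_n] == 1) ? pos_slope : neg_slope ;
	}
	// forward integral of the sampled slopes, anchored at the zeroth observation
	vector[n] f_draw = y0 + cumulative_sum(df_draw) ;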

Solved it! I just had to make f a parameter and express its derivative in the HMM part:

data{
	int n ;
	vector[n] y ;
}
parameters{
	simplex[2] p0 ; // initial state probabilities
	simplex[2] tp_pn ; // transition probabilities when in the positive-slope state
	simplex[2] tp_np ; // transition probabilities when in the negative-slope state
	real<lower=0> pos_slope ; // slope for positive slope state
	real<upper=0> neg_slope ; // slope for negative slope state
	real<lower=0> noise ; //measurement noise
	// real intercept ;
	vector[n] f ;
}
transformed parameters {
	matrix[2, 2] tps = transpose(append_col(tp_pn,tp_np)) ;
	matrix[2, n-1] state_ll ;
	// Compute the log likelihoods in each possible state
	for(i_n in 2:n) {
		state_ll[1,i_n-1] = normal_lpdf((f[i_n]-f[i_n-1]) | pos_slope, noise) ;
		state_ll[2,i_n-1] = normal_lpdf((f[i_n]-f[i_n-1]) | neg_slope, noise) ;
	}
}
model{
	noise ~ std_normal() ;
	pos_slope ~ std_normal() ;
	neg_slope ~ std_normal() ;
	// intercept ~ std_normal() ;
	f ~ std_normal() ;
	y ~ normal(f,noise) ;
	target += hmm_marginal(state_ll, tps, p0) ;
}
generated quantities{
	// get the probability of a positive slope at each timepoint
	row_vector[n-1] p_pos = hmm_hidden_state_prob(state_ll, tps, p0)[1,] ;
}

Oh wait, noise probably shouldn’t be involved in this part, but I don’t know what should be there:

for(i_n in 2:n) {
  state_ll[1,i_n-1] = normal_lpdf((f[i_n]-f[i_n-1]) | pos_slope, noise) ;
  state_ll[2,i_n-1] = normal_lpdf((f[i_n]-f[i_n-1]) | neg_slope, noise) ;
}

Maybe sqrt(square(noise) + square(noise))?

EDIT: on second thought, no; it should be the variance of the mean function.

EDIT2: actually I have no idea, forget I said anything.
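(For reference, if each observation is the latent value plus independent measurement noise, y[i] = f[i] + e[i] with e[i] ~ normal(0, noise), then Var(y[i] - y[i-1]) = Var(f[i] - f[i-1]) + 2 * noise^2. The sqrt(square(noise) + square(noise)) guess is the measurement-noise part of that, and the “variance of the mean function” guess is the other term. In the second model above the state_ll terms are written on the latent f differences directly, so arguably only the step-to-step variability of f belongs there, not the measurement noise.)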


I think adding that noise parameter overfits:

Instead of using a normal emission, I think it is better to make a categorical variable from y. I’ve made a noisy measure of the “hidden state” by taking the lag-1 difference of y and keeping only its sign.

functions {
	// indicator: returns 1 if x is non-negative, 0 otherwise
	int sign(real x) {
		if (x < 0.0)
			return 0;
		else
			return 1;
	}
}
data{
	int n ;
	vector[n] y ;
}
transformed data {
	// 0/1 indicator of whether y increased from the previous timepoint
	vector[n - 1] y_diff;
	for(i_n in 2:n)
		y_diff[i_n - 1] = sign( y[i_n] - y[i_n - 1] );
}

In the parameters block I just condensed pos_slope and neg_slope into a single parameter, alpha:

... 
real alpha[2] ; // slope state
...

The rest is taken from this blog, where the emission probabilities are transformed using a 0.5 threshold. The hidden states are then marginalized over, with y_diff as the emissions.

transformed parameters {
	matrix[2, 2] tps = transpose(append_col(tp_pn,tp_np)) ;
	matrix[2, n-1] state_ll ;
	real<lower=0, upper=1> theta[2]; // Emission prob
	theta[1] = 0.5 + (1 - 0.5) * inv_logit(alpha[1]); // Greater than 50%
	theta[2] = 0.5 * inv_logit(alpha[2]); // Less than 50%
	// Compute the log likelihoods in each possible state
	for(i_n in 2:n) {
		for (s in 1:2)
			state_ll[s, i_n-1] = log(y_diff[i_n - 1] * theta[s] + (1 - y_diff[i_n - 1]) * (1 - theta[s]));
	}
}
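(Since y_diff is just 0/1, that expression is the Bernoulli log-pmf written out by hand, so an equivalent way to fill state_ll, assuming y_diff were instead declared as an integer array, say int y_diff_int[n - 1], would be:)

	// hypothetical equivalent: y_diff_int holds the same 0/1 values as y_diff, but as ints
	for(i_n in 2:n) {
		for (s in 1:2)
			state_ll[s, i_n-1] = bernoulli_lpmf(y_diff_int[i_n - 1] | theta[s]) ;
	}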

The final graphs


The entire Stan program:

functions {
	// indicator: returns 1 if x is non-negative, 0 otherwise
	int sign(real x) {
		if (x < 0.0)
			return 0;
		else
			return 1;
	}
}
data{
	int n ;
	vector[n] y ;
}
transformed data {
	// 0/1 indicator of whether y increased from the previous timepoint
	vector[n - 1] y_diff;
	for(i_n in 2:n)
		y_diff[i_n - 1] = sign( y[i_n] - y[i_n - 1] );
}
parameters{
	simplex[2] p0 ; // initial state probabilities
	simplex[2] tp_pn ; // transition probabilities when in the positive-slope state
	simplex[2] tp_np ; // transition probabilities when in the negative-slope state
	real alpha[2] ; // slope state (transformed into the emission probabilities below)
	real<lower=0> noise ; //measurement noise
	vector[n] f ;
}
transformed parameters {
	matrix[2, 2] tps = transpose(append_col(tp_pn,tp_np)) ;
	matrix[2, n-1] state_ll ;
	real<lower=0, upper=1> theta[2]; // Emission prob
	theta[1] = 0.5 + (1 - 0.5) * inv_logit(alpha[1]); // Greater than 50%
	theta[2] = 0.5 * inv_logit(alpha[2]); // Less than 50%
	// Compute the log likelihoods in each possible state
	for(i_n in 2:n) {
		for (s in 1:2)
			state_ll[s, i_n-1] = log(y_diff[i_n - 1] * theta[s] + (1 - y_diff[i_n - 1]) * (1 - theta[s]));
	}
}
model{
	noise ~ std_normal() ;
	alpha ~ std_normal() ;
	f ~ std_normal() ;
	y ~ normal(f, noise) ;
	target += hmm_marginal(state_ll, tps, p0) ;
}
generated quantities{
	// get the probability of a positive slope at each timepoint
	row_vector[n-1] p_pos = hmm_hidden_state_prob(state_ll, tps, p0)[1,] ;
}

Ha, I tried precisely the same thing last night. Or at least y ~ normal(f, noise) with dy ~ hmm(..., sqrt(2 * pow(noise, 2))), but that ended up functionally making f mostly just track y rather than being a smooth latent function.

Thanks! Diving into your solution now…

Hm, I think with your formulation, inference on f is indistinguishable from not having any of the HMM structure in there at all, since state_ll is built from y_diff (which is data) rather than from f, so the HMM term never touches f. Indeed, commenting out all that structure and re-running yields pretty much identical posteriors on f. But your trick of taking the sign of the difference and doing the HMM on that is inspiring. Unfortunately I’ve consumed my work-allotted time for chasing this idea, but I’ll hopefully make time to return to it soon.


Yeah, I ran it both ways and noticed the output was the same. I know that you were just making a contrived example. Does having the HMM together with what I’ve done sample better on your actual problem, where not using the HMM doesn’t sample well?

Unfortunately my real data is much noisier, hence the desire to work with the latent variable f in the HMM part.

I actually came up with an alternative idea that I’ll post as a separate thread in a bit and link back here in case you’re interested. It’s still only 10% worked out, but it “feels” like a less kludgey reformulation of inference for periodic latents than this one.

Related post here: Mathy-folks: thoughts on the asymptotic representation of this k-HMM model for periodic signals?