Vectorising multiple change point model

Hi! I am trying to implement a hierarchical change-point model in stan. At the heart of the stan implementation is the linear-time example given in the Stan reference (https://mc-stan.org/docs/2_18/stan-users-guide/change-point-section.html)

I have a collection of N time-series of length T which are divided into two classes. I have a list of ints called classes which allows me to label each of the N time-series as an element of one of the two classes. The two classes have different breakpoints but they have the same early and late rates. My transformed parameters block:

matrix[T, k] lp;                                                                                
matrix[T + 1, k] lp_e;                                                                          
matrix[T + 1, k] lp_l;                                                                          
matrix[T + 1, k] prior_s;                                                                       
                                                                                                
for (i in 1:k)                                                                                  
{                                                                                               
    lp_e[1, i] = 0;                                                                             
    lp_l[1, i] = 0;                                                                             
    prior_s[1, i] = normal_lpdf(1 | mu, sigma);                                                 
}                                                                                               
                                                                                                
for (t in 1:T) {                                                                                
    for (i in 1:k) {                                                                            
        prior_s[t + 1, i]  =  normal_lpdf(t | mu, sigma);                                       
        lp_e[t+1, i]       =  lp_e[t, i];                                                       
        lp_l[t+1, i]       =  lp_l[t, i];                                                       
    }                                                                                           
    for (n in 1:N) {                                                                            
        lp_e[t+1, classes[n]] += normal_lpdf(D[n, t] | e, sigma_data);
        lp_l[t+1, classes[n]] += normal_lpdf(D[n, t] | l, sigma_data);
    }                                                                                           
}                                                                                               
                                                                                                
for (i in 1:k) {                                                                                
    lp[:,i] = rep_vector(lp_l[T + 1, i], T )                                                    
                + head(lp_e[:, i], T) - head(lp_l[:, i], T) + prior_s[1:T, i];
}

I suspect the 1…N loop is slowing things down and I’d like to vectorise it. I will eventually have many more than two classes and N will be in the tens of thousands (at least). Any ideas on how to write an autodiff-friendly implementation? Many thanks!

I’m going to move this to the modeling category.

There’s no a good way to vectorize that (n in 1:N) loop, I’m afraid, as it indexes by n into the left-hand side. Our normal_lpdf vectorizations on the other hand reduce by summing, which is the usual use case for densities.

On the other hand, you should be able to vetorize the lp_e and lp_l assignments in the first loop,

for (t in 1:T) {
  lp_e[t + 1] = lp_e[t];
  lp_l[t + 1] = lp_l[t];
}

Is normal_lpdf(1 | mu, sigma); really supposed to have a constant 1 data for everything?

Given that mu and sigma don’t chane in this loop,

for (i in 1:k)                                                                                  
{                                                                                               
    lp_e[1, i] = 0;                                                                             
    lp_l[1, i] = 0;                                                                             
    prior_s[1, i] = normal_lpdf(1 | mu, sigma);                                                 
}  

This could just be:

transformed data {
  vector[k] k_zeros = rep_vector(0, k);
...
  lp_e[1] = k_zeros;
  lp_l[1] = k_zeros;
  prior_s[1] = rep_vector(normal_lpdf(1 | mu, sigma), k);

This is helpful as it avoids evaluating normal_lpdf k times. the same thing can happen in the other loop:

for (t in 1:T) {
  prior_s[t + 1] = rep_vector(normal_lpdf(t | mu, sigma), k);
  lp_e[t + 1] = lp_e[t];
  lp_l[t + 1] = lp_l[t];
}

This will be more efficient if rather than a matrix, you use an array of vectors, as the memory locality will be better (matrices are organized in memory by column).

Then the head operations can be simplified from head(lp_e[:, i], T) to lp_e[1:T, i], which will save a whole bunch of copying.

Thank you for the hints, even though the problem as stated is insoluble (it seems). I’ll incorporate those into my code and see how much it changes things - or try a different approach.