Custom log-likelihood function speed up

Hello stan forum,

I’ve manage to make my own loglikelihood function within stan, it works (at least it can be sampled from), but it is super slow (Gradient evaluation takes 0.122 seconds) - managed to bring it down from 15 seconds so its an improvement !

The model which I have/ trying to implement has the following log-likelihood :
-0.5\cdot (2a+N)\cdot log(2b+y^T(I_{N}+D \Sigma D^T)^{-1}y)

  • Where a and b are set as very small values (10^{-5})
  • y is the observed data
  • I_{N} is the identity matrix of N points
  • \Sigma is a matrix defined as \Sigma=g(D^{T}D)^{-1}, where g is set at a high number 10^5
  • D is an [N\times 2\cdot K] matrix where every odd column is given as cos(f_ {j}\cdot N_{i})\cdot exp(-\alpha_j\cdot t_i) and every even column is given as sin(f_ {j}\cdot t_{i})\cdot exp(-\alpha_j\cdot t_i) with t being time, f being a frequency parameter and \alpha being a decay rate parameter. The goal is to estimate f and \alpha from the log-likelihood. As of right now j is only equal to 3 - but should in the future be able to handle a lot more.

Does anyone have an idea of how to make the code faster ?:)

The stan code for the estimation of f is shown below

functions {
  real freq_dist_log(vector y,vector time, vector f,vector alpha, int k,int N, 
  real delta, real noise_prior_alpha, real noise_prior_beta){

// set up local values  
  matrix[N,2*k] D_damp; // D-matrix with dampning
  matrix[2*k,2*k] D_damp_ch; // cholesky D-matrix with dampning
  matrix[2*k,N] Sigma; // Sigma cov-matrix
  matrix[N,N] I_mat; // identity matrix
  matrix[N,N] mat1; // matrix made for easier code reading  
  matrix[N,N] L;
  vector[N] A;
  real prob; // log-liklihood 
  
  for (j in 1:k){
    for (i in 1:N){
       D[i,j*2-1]=cos(f[j]*time[i])*exp(-alpha[j]*time[i]);
       D[i,j*2]=sin(f[j]*time[i])*exp(-alpha[j]*time[i]);
    }
  } 
  
// compute Sigma from cholesky decompostion of D
   D_ch=cholesky_decompose(crossprod(D));
   Sigma=delta*D_ch\(D');
  
// set up likelihood function from Sigma 
   
   I_mat=diag_matrix(rep_vector(1,N)); // identity matrix
   mat1=(I_mat+crossprod(Sigma)); // corresponds to (I_N+D*Sigma*D^T)
   L=cholesky_decompose(mat1); // Cholesky decomp of mat1
   A=((L))\y; // solve for A
   
   
  // log-likelihood 
   prob=-0.5*(2*noise_prior_alpha+N)*log(2*noise_prior_beta+A'*A); // return the loglik
 
  return prob;
   
  }
}

data {
  int<lower=0> N; // number of points
  vector[N] y; // observations
  vector[N] time; // time observations
  int <lower=0> k; // no of components
  // priors 
  real<lower=0> noise_prior_alpha; // 
  real<lower=0> noise_prior_beta; // 
 vector[k] alpha_prior; 
  real<lower=0> delta;
}

parameters {
    positive_ordered [k] f  ;
    vector <lower=0> [k] alpha;
}

model {
  // priors
  f~uniform(0,pi());
 
  y~freq_dist(time,f,alpha,k,N,delta,noise_prior_alpha,noise_prior_beta);
}

The R code to generate the data is given below, note that the stan model to call is just called stan_model.

# data and model params
N=500 # points
dt=0.1 # dwell time
t_max=(N-1)

# time points
j_idx=seq(0,t_max,length.out = N)
time=j_idx

# amplitudes & phase angles
beta1=c(1/sqrt(2),0.5,3)
beta2=c(1/sqrt(2),2,10)

A_1=sqrt(beta1[1]^2+beta2[1]^2) # real amp
A_2=sqrt(beta1[2]^2+beta2[2]^2) # real amp
A_3=sqrt(beta1[3]^2+beta2[3]^2) # real amp

A_vect=c(A_1,A_2,A_3)

phi_1=atan2(-beta2[1],beta1[1]) # phase angle
phi_2=atan2(-beta2[2],beta1[2]) # phase angle
phi_3=atan2(-beta2[3],beta1[3]) # phase angle

# frequcenies and decay rates
omega_vect=c(0.2,0.8,2)
alpha_vect=c(0.001,0.03,0.06)

# simulate data from parameters 
k=length(omega_vect) # number of components

y_l=list()

for (i in 1:k){
  y_l[i]=list(exp(-alpha_vect[i]*time)*A_vect[i]*cos(omega_vect[i]*time-phi_vect[i]))
}
y_data.frame=do.call("cbind",y_l)
y_sum=rowSums(y_data.frame)

# add noise to generated data
noise=0.01 
noise_mean=0 

y_noise=y_sum+rnorm(length(y_sum),noise_mean,sqrt(noise))

# priors of hyper params
exp_prior=rep(2,k) # log-decay prior from exp
noise_prior.alpha=10^-5 # shape noise from inv.gam
noise_prior.beta=10^-5 # scale noise from inv.gam

# initial values of all parameters
init_fun=function(){
  list(  
    f= c(0.199,0.800,0.999 ) ,
    alpha=c(rep(0.001,k)
)
}

# collect to a list of data input
dat=list(N = length(y_noise),
         time=c(t(time)),
         k=k,
         delta=c(t(10^5)),
         alpha_prior=c(t(exp_prior)),
         noise_prior_alpha=c(t(noise_prior.alpha)),
         noise_prior_beta=c(t(noise_prior.beta)),
         y = c(t((y_noise))))

# Run the stan model 
fit1=stan(file = 'stan_model.stan',init =init_fun, algorithm = "NUTS", chains = 1, data = dat,iter=5000)

1 Like

Would you mind running your code in a fresh R session? There are a number of errors when I do so that prevent me from understanding what you’re trying to achieve and advise.

1 Like

Thanks for looking into it mike-lawrence, really appriate the help:). yeah you are right something was missing in the R-code - most likely due to a non fresh session - I have update the code below now:)

Also the model which the likelihood is applied to can be described as a mixture of sinusoid decays with K components

y_t=\sum^{K}_{k=1}A_{k} \cdot cos(f\cdot t-\phi_{k})\cdot exp(-\alpha\cdot t)

Hopefully it would be able to handle 15-20 components in the end :)

# read in packages
library(loo)
library(StanHeaders)
library(rstan)
library(RcppEigen)
library(bayesplot)
library(gridExtra)
library(rstanarm)

# set own wd or paste in stan model

# data and model params
N=500 # points
dt=0.1 # dwell time
t_max=(N-1)

# time points
j_idx=seq(0,t_max,length.out = N)
time=j_idx

# amplitudes & phase angles
beta1=c(1/sqrt(2),0.5,3)
beta2=c(1/sqrt(2),2,10)

A_1=sqrt(beta1[1]^2+beta2[1]^2) # real amp
A_2=sqrt(beta1[2]^2+beta2[2]^2) # real amp
A_3=sqrt(beta1[3]^2+beta2[3]^2) # real amp

A_vect=c(A_1,A_2,A_3)

phi_1=atan2(-beta2[1],beta1[1]) # phase angle
phi_2=atan2(-beta2[2],beta1[2]) # phase angle
phi_3=atan2(-beta2[3],beta1[3]) # phase angle

phi_vect=c(phi_1,phi_2,phi_3)

# frequcenies and decay rates
omega_vect=c(0.2,0.8,2)
alpha_vect=c(0.001,0.03,0.06)

# simulate data from parameters 
k=length(omega_vect) # number of components

y_l=list()

for (i in 1:k){
  y_l[i]=list(exp(-alpha_vect[i]*time)*A_vect[i]*cos(omega_vect[i]*time-phi_vect[i]))
}
y_data.frame=do.call("cbind",y_l)
y_sum=rowSums(y_data.frame)

# add noise to generated data
noise=0.01 
noise_mean=0 

y_noise=y_sum+rnorm(length(y_sum),noise_mean,sqrt(noise))

# priors of hyper params
exp_prior=rep(2,k) # log-decay prior from exp
noise_prior.alpha=10^-5 # shape noise from inv.gam
noise_prior.beta=10^-5 # scale noise from inv.gam

# initial values of all parameters
init_fun=function(){
  list(  
    f= c(0.199,0.800,0.999 ) ,
    alpha=c(rep(0.001,k))
  )
}

# collect to a list of data input
dat=list(N = length(y_noise),
         time=c(t(time)),
         k=k,
         delta=c(t(10^5)),
         alpha_prior=c(t(exp_prior)),
         noise_prior_alpha=c(t(noise_prior.alpha)),
         noise_prior_beta=c(t(noise_prior.beta)),
         y = c(t((y_noise))))

# Run the stan model 
fit1=stan(file = 'stan_model.stan',init =init_fun, algorithm = "NUTS", chains = 1, data = dat,iter=5000)


I also have a sligth chance for the priors of \alpha being an exp-priror, hence in the model block in the stan code the following is added:

alpha~exponential(alpha_prior);
1 Like

andrieu_doucet_ieeesp_spectralanalysis (2).pdf (286.0 KB)

some additional information is that the posterior stems from the attached paper , specifically equation 12 without the reversible jump part:)

Ah, I’ve actually been working on this myself recently! At least, for the case of stationary signals (yours seem to start with a high-noise regime that then decays to a low-noise regime?). Check out the discussion over in the Stan Slack Random channel. There’s a problem with sinusoidal models generally where they have crazy multimodality in the likelihood. I think I’m close to understanding why this happens, but not really close to a solution I’m happy with.

3 Likes

glad I’m not the only one with problems with this subject - been at it for a few months now :). My model is basically a mixture of decaying sinusoids - going from high SNR to low SNR.

Yeah, I had the same multimodial likelihood problem as you described. I overcame it somewhat by using good initial “guess” via something I discovered almost by accident called espirte frequency estimation:

https://en.wikipedia.org/wiki/Estimation_of_signal_parameters_via_rotational_invariance_techniques.

The only problem is that it does not handle aliasing very well though :(

Also my custom function did work - got the correct parameters with nice R-hat values and good sample size - only it took almost 2 days for it to complete, with merely N=500 and K=3. Now I’m running with K=10 and N=500, and its not very fast - running with 16 cores on an i9 processer and it does not seem to do a lot

1 Like