Optimal Likelihood Calculation for a Hawkes Process?

Hi all,

I am trying to write a likelihood for a Hawkes process. I've implemented a very naive likelihood function that is very slow, and I was wondering if anyone has creative ideas for how to speed it up.

Likelihood

A Hawkes process is a self-exciting point process (a Poisson process whose rate depends on past events/rates). In general the process looks like this (from Wikipedia):

\lambda(t) = \mu(t) + \int_{-\infty}^{t} \nu(t - s)\, dN_s = \mu(t) + \sum_{T_k < t} \nu(t - T_k)

where \nu : \mathbb{R}^+ \to \mathbb{R}^+ is a kernel function which expresses the positive influence of past events T_i on the current value of the intensity \lambda(t), \mu(t) is a possibly non-stationary function representing the expected, predictable, or deterministic part of the intensity, and T_i (with T_i < T_{i+1}) is the time of occurrence of the i-th event of the process.

Essentially, there is a baseline intensity (the first term) and an additive intensity driven by past events (the second term). In my case I am modeling a citation process where both the baseline and the excitation components decay exponentially. My intensity looks like this:
\lambda(t) = \alpha \exp\big(-(\delta + \eta I)\, t\big) + \beta \sum_{c \,:\, t_c < t} \exp\big(-(\gamma + \zeta I)(t - t_c)\big)

where t_c is the time of citation c and I is an indicator of membership in a group. \alpha, \beta, \delta, \gamma, \eta, and \zeta each get a \text{Gamma}(1, 10) prior (\eta and \zeta appear as mu and sigma in the Stan code below).
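For concreteness, here is a small R sketch of this intensity. The function name is hypothetical, and the parameter values in the example call are simply the ones used in the simulation code further down:

# Minimal R sketch of the intensity above (hypothetical function name).
# Parameters follow the formula; the Stan model below replaces the count of
# past events with their expected rates, as explained under Implementation.
hawkes_intensity <- function(t, t_c, alpha, beta, delta, gamma, eta, zeta, I) {
  baseline   <- alpha * exp(-(delta + eta * I) * t)
  past       <- t_c[t_c < t]                                    # citation times before t
  excitation <- beta * sum(exp(-(gamma + zeta * I) * (t - past)))
  baseline + excitation
}

# e.g. the rate at year 5 given citations at years 1, 2 and 4, for a paper in the group (I = 1)
hawkes_intensity(5, c(1, 2, 4), alpha = 10, beta = 2, delta = 0.4,
                 gamma = 1.2, eta = 3, zeta = 5, I = 1)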

and the standard Poisson process likelihood is:

p(t_i \mid t_{i-1}) = \lambda(t_i) \exp\left(-\int_{t_{i-1}}^{t_i} \lambda(s)\, ds\right)
\approx \lambda(t_i) \exp\left(-\sum_{s = t_{i-1}}^{t_i} \lambda(s)\right)

where i is the current year.
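With yearly bins and yearly citation counts, this works out to the Poisson log-likelihood used in the target statement of the Stan model below (up to the constant log(c!) term); a minimal R sketch, assuming c and lambda hold the counts and rates per year:

# Yearly-binned Poisson log-likelihood: the integral of lambda is approximated
# by the sum of the yearly rates, and the log(c!) constant is dropped.
poisson_loglik <- function(c, lambda) {
  sum(c * log(lambda)) - sum(lambda)
}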


Implementation

Because the citation histories of different papers can have different lengths (ragged data), I store the citation counts for all papers in a single 1-D vector c.

y is the years when these citations occur (starting at time 0).

s gives the group sizes used to partition these ragged vectors into papers.
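For example (made-up numbers), two papers observed for three and two years would be packed like this:

# Hypothetical ragged layout for two papers observed for 3 and 2 years
c <- c(4, 2, 1, 3, 5)   # citation counts, papers concatenated end to end
y <- c(0, 1, 2, 0, 1)   # years since publication, restarting at 0 for each paper
s <- c(3, 2)            # group sizes: paper 1 is c[1:3], paper 2 is c[4:5]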

In the likelihood calculation I iterate over papers in this ragged vector c and then over years within each paper's citation sequence. Note that I can multiply by the yearly count instead of summing over events within a year because every event within a year is equivalent. Lastly, I use the rate for each citation event instead of the event itself to remove stochasticity.
Stan code:

data {
  int<lower=0> N;      // number of observations
  int<lower=0> K;      // number of groups (papers)
  vector[N] c;         // citation counts
  vector[K] comm;      // whether the paper belongs to a citation community
  int s[K];            // group sizes
  vector[N] y;         // years
}
parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  real<lower=0> delta;
  real<lower=0> gamma;
  real<lower=0> mu;
  real<lower=0> sigma;
}
transformed parameters {
}
model {
    int pos = 1;
    
    alpha ~ gamma(1,10);
    beta ~ gamma(1,10);
    delta ~ gamma(1,10);
    gamma ~ gamma(1,10);
    mu ~ gamma(1,10);
    sigma ~ gamma(1,10);

    
    for (k in 1:K) { // for each paper in the ragged vector c
        int G = s[k]; // G is the length of this paper's citation sequence
        real q = comm[k]; // q is the indicator
        vector[G] lambda_vec =  alpha * exp(-(mu*q+delta)*y[pos:pos+G-1]);
        //baseline intensity from times 0...t
        
        for (i in 2:G) {
            // additive intensity: loop across times t from 1...t
            // uses lambda(t) instead of c(t) here to remove stochasticity
            lambda_vec[i] += beta * sum(lambda_vec[1:(i - 1)] .* exp(-(sigma*q + gamma) * (y[pos + i - 1] - y[pos:(pos + i - 2)])));
        }
        target += sum(c[pos:(pos + G - 1)] .* log(lambda_vec)) - sum(lambda_vec); // likelihood
        pos += G; // move to the beginning of the next paper
    }
}

Question

It is extremely computationally expensive to iterate across every data point, and it takes hours even for just 1000 data points. Any ideas for how to speed this up and do better than O(n^2)?


Just wanted to bump this and see if there are any thoughts. Is there a way in Stan to break up the likelihood and only have proposed components recalculated each iteration?

EDIT: I forgot to add this, but here's some simulation code:

data_gen_comm2 <- function(alpha, beta, delta, gamma, len, reps, mu, sigma, comm2) {
  years <- seq(0, len - 1)
  # exponentially decaying baseline intensity for each year
  baseline <- exp(-(mu * comm2 + delta) * years) * alpha
  cites <- rep(0, len) + baseline
  # add the self-exciting component year by year
  for (y in 2:len) {
    temp <- beta * sum(cites[1:(y - 1)] * exp(-(sigma * comm2 + gamma) * (years[y] - years[1:(y - 1)])))
    cites[y] <- cites[y] + temp
  }
  # draw Poisson counts; the length-len rate vector is recycled across the reps papers
  random_cites <- rpois(n = reps * len, lambda = cites)
  #return(rep(cites,reps))
  return(random_cites)
}
mu1=3.
sigma1=5.
alpha1=10
beta1=2
delta1=.4
gamma1=1.2
len1=20
reps1=2000

citations = data_gen_comm2(alpha=alpha1,beta=beta1,delta=delta1,gamma=gamma1,len=len1,reps=reps1/2,mu=mu1,sigma=sigma1,comm2=1)
citations = c(citations,data_gen_comm2(alpha=alpha1,beta=beta1,delta=delta1,gamma=gamma1,len=len1,reps=reps1/2,mu=mu1,sigma=sigma1,comm2=0))
              
input_data_comm2 <- list(N= reps1*len1, K=  reps1, y= rep((seq(len1)-1),reps1), s = as.array(rep(len1,reps1)),
                         c = citations, comm = c(rep(1,reps1/2),rep(0,reps1/2))  )

library(rstan)  # for stan_model(), optimizing(), and sampling()
temp <- stan_model(file = 'stancommandcomm2.stan', model_name = 'test')
optimizing(object = temp, data = input_data_comm2, algorithm = 'LBFGS')
fit_samp <- sampling(temp, data = input_data_comm2, control = list(adapt_delta = 0.80),
                     chains = 3, iter = 1000, cores = 3, diagnostic_file = 'temp.diagnostics.text')

Looks like a nice puzzle, I’ll have a look whenever I have the time :)

Replacing the inner loop with the following should be equivalent (up to floating-point error) and removes the quadratic runtime complexity in the number of years per paper:

        real diff = 0;
        for (i in 2:G) {
            diff = (diff+beta*lambda_vec[i-1]) * exp(
                -(sigma*q+gamma)*(y[pos+i-1]-y[pos+i-2])
            );
            lambda_vec[i] += diff;
        }

Depending on the remaining structure of your data a better runtime constant might be possible.
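For anyone who wants to convince themselves, here is a quick R check (with made-up numbers for the baseline and the decay rate) that the recursion produces the same lambda_vec as the naive quadratic loop:

# Quick R check that the recursion matches the naive O(G^2) loop.
# kappa stands in for sigma*q + gamma; the baseline values are made up.
G     <- 8
y     <- 0:(G - 1)                 # yearly grid, as in the model
beta  <- 2
kappa <- 1.5
base  <- 10 * exp(-0.4 * y)        # baseline intensity

# naive quadratic version (as in the original model block)
naive <- base
for (i in 2:G) {
  naive[i] <- naive[i] + beta * sum(naive[1:(i - 1)] * exp(-kappa * (y[i] - y[1:(i - 1)])))
}

# recursive linear version (as in the snippet above)
recur <- base
diff  <- 0
for (i in 2:G) {
  diff     <- (diff + beta * recur[i - 1]) * exp(-kappa * (y[i] - y[i - 1]))
  recur[i] <- recur[i] + diff
}

max(abs(naive - recur))   # essentially zero, i.e. equal up to floating-point error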

I do not know what you mean by only having proposed components recalculated each iteration. Every time your parameters change you'll have to recompute the whole thing, so there is not much to be saved. Or what do you mean?

In principle you could start warming up/sampling with partial data and then add more data, but there is no public and tested implementation of this that I know of.
