Tips for speeding up model

Isobel · September 24, 2018, 11:38am

Hello, I am currently adapting and extending an algorithm, NetRate (Gomez-Rodriguez et al 2011, described here: http://snap.stanford.edu/class/cs224w-readings/rodriguez11diffusion.pdf) in RStan to estimate likely transmission pathways for infectious diseases using data about locations and times of cases showing symptoms. It works well on smaller datasets (N=100) but I am looking to apply the algorithm to larger datasets (N=10,000) and am keen to improve the speed of the model (perhaps through running in parallel / vectorization). Does anyone have any advice for how best to improve the speed?

The code is below, and I can also provide dummy data for reproducibility in RStan if useful.

//data comprising of 
//n total cases (all infectors)
//q total locally acquired cases (all potential infectees)
//NI is vector of locally acquired cases
//z dimensions of matrix n*q
//dt times between symptom onset of pairs of cases
//dis distance between pairs of cases
data {
  int n;
  int q;
  int NI[q];
  int z;
  vector[z] dt;
  vector[z] dis;
}
//parameters - A  determines connectedness of pairs in network
//epsilon edge - chance of being infected by outside source
//distance_scale - param shaping distance likelihood functon
parameters {
  vector < lower = 0, upper = 0.01 > [z] A;
  vector < lower = 0 > [q] epsilon_edge;
  real < lower = 0, upper = 100 > distance_scale;
}
transformed parameters {}
//model  adapted from NetRate (Gomexz-Rodriguez,2014)
model {
  real H = 0;
  real S = 0;
  int counter = 1;
  //priors
  for (i in 1: z) {
    A[i] ~normal(0.005, 0.001) T[0, 0.01];
  }
  for (i in 1: q) {
    epsilon_edge[i] ~exponential(10);
  }
  distance_scale~exponential(10) T[0, 100];
  counter = 1;
  for (i in 1: q) {
    H = 0;
    //sum of log hazards of infection across all candidate infectees
    for (j in 1: (NI[i])) {
      H = H + A[counter] * fmax(dt[counter], 0) * distance_scale * exp(-dis[counter] * distance_scale); //Rayleigh Hazard is b+at
      counter = counter + 1;
    }
    target += log(H + epsilon_edge[i]);
  }
  counter = 1;
  //sum of log survivals across infectees - chances of reaching time t without being infected by another case
  for (i in 1: q) {
    for (k in 1: (NI[i])) {
      target += -A[counter] * ((fmax(dt[counter], 0) ^ 2) * 0.5); //Rayleigh Survival on log is -a*0.5*t*t 
      counter = counter + 1;
    }
    target += -epsilon_edge[i];
  }

}

Thanks!

Topic		Replies	Views
How to improve model speed as the number of datapoints increase Modeling fitting-issues , performance	7	415	September 17, 2021
Speeding up a stochastic epidemic model with ODEs Modeling fitting-issues , performance	11	2873	June 20, 2017
Code review of slow hierarchical (ode) model Modeling fitting-issues , performance , epidemiology	3	777	November 19, 2020
Improve specification and sampling of SIS compartmental model Modeling rstan , optimization , specification	4	231	April 7, 2024
Survival model in Rstan Modeling	6	1296	December 5, 2018

Tips for speeding up model

Related topics