Any way to speed up a time-varying logistic regression?

fnocera · March 31, 2020, 3:59pm

Hi all,
I am completely new to Stan, and I am trying to fit a logistic regression model with time-varying parameters (and fixed regressors).
The idea is to model the dependent variable as a logistic regression and the model parameters as a random walk.
I have a matrix (N x T, where N is the number of datapoints I have, and T is the number of timesteps) for the response variable containing 0’s and 1’s.
What I noticed is that when the number of N increases (for example N \geq 300) the time to calibrate the model is very large (more than 20hours).

Is there anything I could do to make the model running faster? Is there anything incorrect in the model specification? Any help would be greatly appreciated!

Here’s the code of the model:

data {
  int<lower=0> N;   		   // number of datapoints
  int<lower=0> T;              // number of timesteps
  int<lower=0,upper=1> y[N,T]; // outcome matrix with 0's and 1's
  // Variables
  vector[N] x1;                // regressor 1
  vector[N] x2;                // regressor 2
  vector[N] x3;                // regressor 3
  vector[N] x4;                // regressor 4
  vector[N] x5;                // regressor 5
  vector[N] x6;                // regressor 6
  vector[N] x7;                // regressor 7
}
parameters {
  vector[T]   alpha;         
  vector[T]   b1;
  vector[T]   b2;
  vector[T]   b3;
  vector[T]   b4;
  vector[T]   b5;
  vector[T]   b6;
  vector[T]   b7;
  real<lower=0> Walpha;          
  real<lower=0> W1;
  real<lower=0> W2;
  real<lower=0> W3;
  real<lower=0> W4;
  real<lower=0> W5;
  real<lower=0> W6;
  real<lower=0> W7;
}
model {
alpha[1] ~ normal(0.0,25);
b1[1]  ~ normal(0.0, 25);
b2[1]  ~ normal(0.0, 25);
b3[1]  ~ normal(0.0, 25);
b4[1]  ~ normal(0.0, 25);
b5[1]  ~ normal(0.0, 25);
b6[1]  ~ normal(0.0, 25);
b7[1]  ~ normal(0.0, 25);

Walpha ~ inv_gamma(0.01,0.01);
W1     ~ inv_gamma(0.01,0.01);
W2     ~ inv_gamma(0.01,0.01);
W3     ~ inv_gamma(0.01,0.01);
W4     ~ inv_gamma(0.01,0.01);
W5     ~ inv_gamma(0.01,0.01);
W6     ~ inv_gamma(0.01,0.01);
W7     ~ inv_gamma(0.01,0.01);

alpha[2:T]  ~ normal(alpha[1:(T-1)], Walpha);
b1[2:T]     ~ normal(b1[1:(T-1)], W1);
b2[2:T]     ~ normal(b2[1:(T-1)], W2);
b3[2:T]     ~ normal(b3[1:(T-1)], W3);
b4[2:T]     ~ normal(b4[1:(T-1)], W4);
b5[2:T]     ~ normal(b5[1:(T-1)], W5);
b6[2:T]     ~ normal(b6[1:(T-1)], W6);
b7[2:T]     ~ normal(b7[1:(T-1)], W7);
for (t in 1:T){
  for (n in 1:N){
  	y[n,t] ~ bernoulli_logit(alpha[t] + b1[t]*x1[n] + b2[t]*x2[n]+ b3[t]*x3[n]+ b4[t]*x4[n]+ b5[t]*x5[n]+ b6[t]*x6[n]+ b7[t]*x7[n]);
}
} 
}

fnocera · March 31, 2020, 7:32pm

A few details that I forgot to mention that may be helpful.

Right now, I have T=12 (although I’d like to investigate the effect of different timesteps on the model predictions.)
I am running three chains with 10,000 samples (5,000 are of warmup) in parallel on a 9th Generation i7-9750H. (although the number of samples is pretty large, I noticed that for convergence purposes this is a reasonable number of samples.)

stevebronder · March 31, 2020, 8:34pm

I’m not sure if bernoulli_log_glm is available in rstan (though it is available in cmdstan and through cmdstanr) but I think the following could speed things up a bit

data {
  int<lower=0> N;   		   // number of datapoints
  int<lower=0> T;              // number of timesteps
  int<lower=0,upper=1> y[N,T]; // outcome matrix with 0's and 1's
  // Variables
  matrix[N, 7] X; // regressors
}
parameters {
  matrix[7, T] beta;
  vector[T]   alpha;         
  real<lower=0> Walpha;          
  vector<lower=0>[7] W;  
}
model {
  alpha[1] ~ normal(0.0,25);
  beta[1, ] ~ normal(0, 25);
  Walpha ~ inv_gamma(0.01,0.01);
  W ~ inv_gamma(0.01,0.01);
  for (i in 2:T) {
    alpha[i] ~ normal(alpha[i - 1], Walpha);    
    beta[, i] ~ normal(beta[, i - 1], W);
  }
  for (t in 1:T){
    y[, t] ~ bernoulli_logit_glm(X, alpha[t], beta[, t]);
  } 
}

bernoulli_logit_glm can also utilize the OpenCL backend so if you have a GPU available you can follow the guide here to get the setup and then us ethe opencl arguments in cmdstanr

fnocera · March 31, 2020, 11:02pm

Thank you for the response! So, bernoulli_log_glm is actually available in rstan. However, the model performs in a similar manner as before, and I wasn’t able to see an improvement in the timing.

asael_am · March 31, 2020, 11:25pm

Hi,

For stochastic volatility models, the Stan recommendations here says that an standardization of the parameters, accelerates the convergence needing less iterations. You could try that transformation too and run maybe 5 thousands iterations!

Hope it helps

stevebronder · April 1, 2020, 12:40am

What parameters are you running this with? if you are hitting the max tree depth or have high adapt delta the algorithm could be taking up more time. Also those priors seem very large which could be another factor

plot(density(rnorm(100000,rnorm(1, 0, 25), 1/rgamma(1, 0.01, 0.01))))

fnocera · April 1, 2020, 12:55am

The normal density was to have basically a non-informative prior on the parameters beta, while the inverse gamma is the prior on the W’s. As for the max tree depth and the high adapt delta, I set those values after having the first runs showing the warning message that the default value would be too low for this model.

stevebronder · April 1, 2020, 3:23am

I’d recommend following the prior recommendations here, stan tends to do better with semi informative priors. something like a student_t with low df may be more appropriate.

As for the max tree depth and the high adapt delta, I set those values after having the first runs showing the warning message that the default value would be too low for this model.

What values are they at? A high max tree depth and adapt delta is usually going to be the culprit for very slow models. semi informative priors may help

What I noticed is that when the number of N increases (for example N ≥ 300) the time to calibrate the model is very large (more than 20hours).

Another thing to think about, with T=12 your going to have like 108 parameters. With N = 300 that’s going to be like less than 3 observations a parameter? Setting more informative priors could help with that

stijn · April 1, 2020, 4:03am

Just to add to @stevebronder’s comment. This is not even a non-informative prior if you apply the inverse logit (plogis in R) to the prior, you can think of this as your prior information about the intercept on the probability that your outcome is 1.

plot(density(plogis(rnorm(100000,rnorm(1, 0, 25), 1/rgamma(1, 0.01, 0.01)))))

The priors are basically saying that the model is really sure that you either have mostly 1 or mostly 0 for a given time. With a binomial model, my rule of thumb for my work in social sciences is that a regression coefficient of 10 (when predictors are standardized) is basically the same as infinity. I adjust my priors so that there is little probability mass above 5 and below -5.

fnocera · April 1, 2020, 2:48pm

Sorry, I thought I included these parameters in my previous post. I have adapt_delta=0.99, and max_treedepth=20. I understand that having higher values for these two parameters slows down the calibration. However, for any smaller value of these two parameters, I end up having divergent transitions after the warmup and others that exceed the maximum treedepth.

stevebronder · April 1, 2020, 4:08pm

Those parameters are going to cause the algorithm to eat up a lot more time than the model itself will. A max treedepth of 20 allows NUTs to do 2^20 tree depth searches. usually having to turn up those so high is because of model misspecification or priors that don’t match up well to the posterior

Topic		Replies	Views
Simple stan regression model takes 60 hours to complete. Why? Modeling	3	1123	December 31, 2020
Speed up a multiple linear regression (log scale) Modeling	3	1408	June 17, 2017
My first Stan model - hierarchical logistic regression Modeling cognitive-science	28	8413	June 6, 2017
Stan speed drastically decreases when adding one extra parameter Modeling performance	3	1238	November 27, 2017
Logistic regression Modeling	1	561	November 27, 2017

Any way to speed up a time-varying logistic regression?

Related topics