Any way to speed up a time-varying logistic regression?

Hi all,
I am completely new to Stan, and I am trying to fit a logistic regression model with time-varying parameters (and fixed regressors).
The idea is to model the dependent variable as a logistic regression and the model parameters as a random walk.
I have a matrix (N x T, where N is the number of datapoints I have, and T is the number of timesteps) for the response variable containing 0’s and 1’s.
What I noticed is that when the number of N increases (for example N \geq 300) the time to calibrate the model is very large (more than 20hours).

Is there anything I could do to make the model running faster? Is there anything incorrect in the model specification? Any help would be greatly appreciated!


Here’s the code of the model:

data {
  int<lower=0> N;   		   // number of datapoints
  int<lower=0> T;              // number of timesteps
  int<lower=0,upper=1> y[N,T]; // outcome matrix with 0's and 1's
  // Variables
  vector[N] x1;                // regressor 1
  vector[N] x2;                // regressor 2
  vector[N] x3;                // regressor 3
  vector[N] x4;                // regressor 4
  vector[N] x5;                // regressor 5
  vector[N] x6;                // regressor 6
  vector[N] x7;                // regressor 7
}
parameters {
  vector[T]   alpha;         
  vector[T]   b1;
  vector[T]   b2;
  vector[T]   b3;
  vector[T]   b4;
  vector[T]   b5;
  vector[T]   b6;
  vector[T]   b7;
  real<lower=0> Walpha;          
  real<lower=0> W1;
  real<lower=0> W2;
  real<lower=0> W3;
  real<lower=0> W4;
  real<lower=0> W5;
  real<lower=0> W6;
  real<lower=0> W7;
}
model {
alpha[1] ~ normal(0.0,25);
b1[1]  ~ normal(0.0, 25);
b2[1]  ~ normal(0.0, 25);
b3[1]  ~ normal(0.0, 25);
b4[1]  ~ normal(0.0, 25);
b5[1]  ~ normal(0.0, 25);
b6[1]  ~ normal(0.0, 25);
b7[1]  ~ normal(0.0, 25);

Walpha ~ inv_gamma(0.01,0.01);
W1     ~ inv_gamma(0.01,0.01);
W2     ~ inv_gamma(0.01,0.01);
W3     ~ inv_gamma(0.01,0.01);
W4     ~ inv_gamma(0.01,0.01);
W5     ~ inv_gamma(0.01,0.01);
W6     ~ inv_gamma(0.01,0.01);
W7     ~ inv_gamma(0.01,0.01);

alpha[2:T]  ~ normal(alpha[1:(T-1)], Walpha);
b1[2:T]     ~ normal(b1[1:(T-1)], W1);
b2[2:T]     ~ normal(b2[1:(T-1)], W2);
b3[2:T]     ~ normal(b3[1:(T-1)], W3);
b4[2:T]     ~ normal(b4[1:(T-1)], W4);
b5[2:T]     ~ normal(b5[1:(T-1)], W5);
b6[2:T]     ~ normal(b6[1:(T-1)], W6);
b7[2:T]     ~ normal(b7[1:(T-1)], W7);
for (t in 1:T){
  for (n in 1:N){
  	y[n,t] ~ bernoulli_logit(alpha[t] + b1[t]*x1[n] + b2[t]*x2[n]+ b3[t]*x3[n]+ b4[t]*x4[n]+ b5[t]*x5[n]+ b6[t]*x6[n]+ b7[t]*x7[n]);
}
} 
}

A few details that I forgot to mention that may be helpful.

  1. Right now, I have T=12 (although I’d like to investigate the effect of different timesteps on the model predictions.)
  2. I am running three chains with 10,000 samples (5,000 are of warmup) in parallel on a 9th Generation i7-9750H. (although the number of samples is pretty large, I noticed that for convergence purposes this is a reasonable number of samples.)

I’m not sure if bernoulli_log_glm is available in rstan (though it is available in cmdstan and through cmdstanr) but I think the following could speed things up a bit

data {
  int<lower=0> N;   		   // number of datapoints
  int<lower=0> T;              // number of timesteps
  int<lower=0,upper=1> y[N,T]; // outcome matrix with 0's and 1's
  // Variables
  matrix[N, 7] X; // regressors
}
parameters {
  matrix[7, T] beta;
  vector[T]   alpha;         
  real<lower=0> Walpha;          
  vector<lower=0>[7] W;  
}
model {
  alpha[1] ~ normal(0.0,25);
  beta[1, ] ~ normal(0, 25);
  Walpha ~ inv_gamma(0.01,0.01);
  W ~ inv_gamma(0.01,0.01);
  for (i in 2:T) {
    alpha[i] ~ normal(alpha[i - 1], Walpha);    
    beta[, i] ~ normal(beta[, i - 1], W);
  }
  for (t in 1:T){
    y[, t] ~ bernoulli_logit_glm(X, alpha[t], beta[, t]);
  } 
}

bernoulli_logit_glm can also utilize the OpenCL backend so if you have a GPU available you can follow the guide here to get the setup and then us ethe opencl arguments in cmdstanr

2 Likes

Thank you for the response! So, bernoulli_log_glm is actually available in rstan. However, the model performs in a similar manner as before, and I wasn’t able to see an improvement in the timing.

Hi,

For stochastic volatility models, the Stan recommendations here says that an standardization of the parameters, accelerates the convergence needing less iterations. You could try that transformation too and run maybe 5 thousands iterations!

Hope it helps

1 Like

What parameters are you running this with? if you are hitting the max tree depth or have high adapt delta the algorithm could be taking up more time. Also those priors seem very large which could be another factor

plot(density(rnorm(100000,rnorm(1, 0, 25), 1/rgamma(1, 0.01, 0.01))))

3 Likes

The normal density was to have basically a non-informative prior on the parameters beta, while the inverse gamma is the prior on the W’s. As for the max tree depth and the high adapt delta, I set those values after having the first runs showing the warning message that the default value would be too low for this model.

I’d recommend following the prior recommendations here, stan tends to do better with semi informative priors. something like a student_t with low df may be more appropriate.

As for the max tree depth and the high adapt delta, I set those values after having the first runs showing the warning message that the default value would be too low for this model.

What values are they at? A high max tree depth and adapt delta is usually going to be the culprit for very slow models. semi informative priors may help

What I noticed is that when the number of N increases (for example N ≥ 300) the time to calibrate the model is very large (more than 20hours).

Another thing to think about, with T=12 your going to have like 108 parameters. With N = 300 that’s going to be like less than 3 observations a parameter? Setting more informative priors could help with that

3 Likes

Just to add to @stevebronder’s comment. This is not even a non-informative prior if you apply the inverse logit (plogis in R) to the prior, you can think of this as your prior information about the intercept on the probability that your outcome is 1.

plot(density(plogis(rnorm(100000,rnorm(1, 0, 25), 1/rgamma(1, 0.01, 0.01)))))

The priors are basically saying that the model is really sure that you either have mostly 1 or mostly 0 for a given time. With a binomial model, my rule of thumb for my work in social sciences is that a regression coefficient of 10 (when predictors are standardized) is basically the same as infinity. I adjust my priors so that there is little probability mass above 5 and below -5.

6 Likes

Sorry, I thought I included these parameters in my previous post. I have adapt_delta=0.99, and max_treedepth=20. I understand that having higher values for these two parameters slows down the calibration. However, for any smaller value of these two parameters, I end up having divergent transitions after the warmup and others that exceed the maximum treedepth.

Those parameters are going to cause the algorithm to eat up a lot more time than the model itself will. A max treedepth of 20 allows NUTs to do 2^20 tree depth searches. usually having to turn up those so high is because of model misspecification or priors that don’t match up well to the posterior

2 Likes