I’m experiencing significant performance slowdowns in Stan when fitting logistic regression models to separable or near-separable data.

Performance is important, so we have implemented a few optimizations. We use QR decomposition to reduce the dependence between parameters, and we also exploit sufficient statistics to reduce the time we spend computing the gradient. These optimizations have a noticeable impact on performance, but we’re still struggling in a few corner cases.
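The sufficient-statistics trick above can be sketched in pure Python. This is a minimal illustration, not our production code: `collapse_rows` is a hypothetical helper name, and it assumes binary outcomes with repeated predictor rows.

```python
from collections import defaultdict

def collapse_rows(predictor_rows, outcomes):
    """Aggregate duplicate predictor rows into binomial (successes, count) pairs.

    With Bernoulli outcomes, rows that share a predictor vector contribute a
    single binomial term, so the likelihood needs one logit evaluation per
    unique row rather than one per observation.
    """
    groups = defaultdict(lambda: [0, 0])  # predictor tuple -> [successes, count]
    for row, y in zip(predictor_rows, outcomes):
        groups[tuple(row)][0] += y
        groups[tuple(row)][1] += 1
    rows = [list(key) for key in groups]
    successes = [s for s, _ in groups.values()]
    counts = [c for _, c in groups.values()]
    return rows, successes, counts

# Toy data: three observations, but only two unique predictor rows.
rows, successes, counts = collapse_rows(
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [1, 0, 1],
)
```

The resulting `rows`, `successes`, and `counts` are what we pass to the model as `predictors`, `num_successes`, and `count`.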

Here is our current model.

```stan
data {
  int<lower=0> num_predictors;          // Number of predictors
  int<lower=0> num_original_predictors; // Number of predictors *before* QR decomposition
  int<lower=1> num_rows;                // Number of rows
  matrix[num_rows, num_predictors] predictors;
  array[num_rows] int<lower=0> num_successes;
  array[num_rows] int<lower=1> count;   // Number of repetitions
  matrix[num_original_predictors, num_predictors] r_star_inverse;
}
parameters {
  vector[num_predictors] theta_beta_tilde;
}
transformed parameters {
  vector[num_original_predictors] beta = r_star_inverse * theta_beta_tilde;
}
model {
  theta_beta_tilde ~ student_t(9, 0, 1);
  target += binomial_logit_lupmf(num_successes | count, predictors * theta_beta_tilde);
}
```

However, our model can run 10-100x slower (3 hours vs. 5 minutes) when the data are separable or nearly separable. With separable data, one or more of the maximum-likelihood coefficient estimates diverge to infinity; only the inclusion of a prior keeps the posterior proper.
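A minimal pure-Python demonstration of why separability is pathological: on a perfectly separated toy dataset (made up for illustration), the log-likelihood increases monotonically in the coefficient, so any finite value is improved by a larger one and the unpenalized MLE does not exist.

```python
import math

def log_lik(beta, xs, ys):
    """Bernoulli-logit log-likelihood for a single predictor, no intercept."""
    ll = 0.0
    for x, y in zip(xs, ys):
        z = beta * x if y == 1 else -beta * x
        ll += -math.log1p(math.exp(-z))  # log sigmoid(z), numerically stable here
    return ll

# Perfectly separated: y = 1 exactly when x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# The log-likelihood approaches 0 from below as beta grows without bound.
lls = [log_lik(b, xs, ys) for b in (1.0, 5.0, 10.0)]
```

With a prior added, the posterior is proper, but it retains a long, nearly flat ridge in the direction of the divergence.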

However, even when the posterior is proper, its shape seems to give Stan difficulties: it appears to be funnel-shaped, a geometry well known to cause problems for Hamiltonian Monte Carlo.

I would love to hear approaches for speeding up model fitting when the data are separable.

Here are some ideas that I’ve had:

## Transforming the parameters

My guess as to why model fitting is so slow is that the curvature of the posterior along `theta_beta_tilde[1]` is highly variable. Stan is spending equal time sampling from a small region with high likelihood and a much larger region with lower likelihood. However, I’m not sure what a good transform would be. Further, because the problematic curvature comes from the data rather than the model, we’d have to compute a transform numerically.
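The varying-curvature guess is easy to check numerically. The sketch below (toy near-separable data, invented for illustration) estimates the curvature of the log-likelihood along a single coefficient by finite differences; it differs by orders of magnitude between the mode and the tail, which is exactly the situation where a single globally adapted step size struggles.

```python
import math

def log_lik(beta, xs, ys):
    """Bernoulli-logit log-likelihood for a single predictor."""
    ll = 0.0
    for x, y in zip(xs, ys):
        z = beta * x if y == 1 else -beta * x
        ll += -math.log1p(math.exp(-z))
    return ll

def curvature(beta, xs, ys, h=1e-2):
    """Negative second central difference of the log-likelihood at beta."""
    return -(log_lik(beta + h, xs, ys)
             - 2.0 * log_lik(beta, xs, ys)
             + log_lik(beta - h, xs, ys)) / (h * h)

# Near-separable toy data: one overlapping observation at x = 0.1.
xs = [-2.0, -1.0, 0.1, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]

# Curvature near the bulk of the posterior vs. far out on the flat ridge.
near = curvature(0.5, xs, ys)
far = curvature(8.0, xs, ys)
```

Here `near` exceeds `far` by more than two orders of magnitude, so a step size tuned for one region is badly wrong for the other.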

## Special-casing Separability

Maybe there’s another model we could use when we detect separability? If so, what would it be?
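Any special-casing would first need a detection step. Below is a cheap heuristic sketch: a single predictor column perfectly separates the outcomes if some threshold splits all successes from all failures. Note this only catches the one-column case; detecting complete or quasi-complete separation in general requires solving a linear program over all predictors jointly (the approach taken, for example, by R’s `detectseparation` package).

```python
def column_separates(column, ys):
    """Heuristic: does one predictor column perfectly separate the outcomes?

    Returns True when every success value lies strictly above (or strictly
    below) every failure value, i.e. a threshold on this column alone
    classifies the data perfectly.
    """
    success_vals = [x for x, y in zip(column, ys) if y == 1]
    failure_vals = [x for x, y in zip(column, ys) if y == 0]
    if not success_vals or not failure_vals:
        return True  # all successes or all failures: trivially separated
    return (min(success_vals) > max(failure_vals)
            or min(failure_vals) > max(success_vals))
```

If a column trips the check, one could fall back to a tighter prior on the corresponding coefficient, or report the column rather than letting the sampler grind through the ridge.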

## Adjusting Priors

Are there priors that would produce better-shaped posteriors? I’ve played with various degrees of freedom for the Student t prior (including the normal limit), and while some run faster than others, I haven’t yet found one that was “fast.”
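One way to reason about this: under separability, the prior is the only thing pulling the divergent coefficient back, so its tail behavior sets the length of the ridge the sampler must traverse. The sketch below compares the standard-normal and Student-t(9, 0, 1) log densities in the tail; the t tail decays polynomially rather than quadratically, leaving far more mass at large coefficient values.

```python
import math

def student_t_lpdf(x, nu):
    """Student-t(nu, 0, 1) log density."""
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1.0) / 2.0 * math.log1p(x * x / nu))

def normal_lpdf(x):
    """Standard normal log density."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

# In the tail, the t(9) prior keeps orders of magnitude more log density
# than the normal, so the posterior ridge under separability is much longer.
tail_gaps = [student_t_lpdf(x, 9) - normal_lpdf(x) for x in (2.0, 5.0, 10.0)]
```

This suggests lighter-tailed priors (or smaller scales) should shorten the ridge, at the cost of more shrinkage when the data really do support large coefficients.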

## Adjusting Hamiltonian Monte Carlo fitting

Are there parameters I can pass to Stan that would help it fit this type of data better?
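For reference, the knobs I’m aware of are the sampler controls below. This is a sketch of settings that are commonly suggested for funnel-like geometry, not something I’ve verified helps here; it assumes fitting via CmdStanPy, whose `CmdStanModel.sample()` accepts these keyword arguments (equivalent controls exist in CmdStan’s CLI and the other interfaces).

```python
# Sampler settings often suggested for difficult posterior geometry.
sampler_args = {
    "adapt_delta": 0.99,   # higher target acceptance -> smaller steps, fewer divergences
    "max_treedepth": 12,   # allow longer trajectories along the flat ridge
    "iter_warmup": 2000,   # extra warmup so step-size/metric adaptation can settle
}

# Usage sketch (not run here):
# from cmdstanpy import CmdStanModel
# fit = CmdStanModel(stan_file="logistic.stan").sample(data=stan_data, **sampler_args)
```

The trade-off is that all three settings increase runtime per effective sample even when they do eliminate divergences.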

# How you can help

Eager to hear what you all think of these ideas, or what other ideas might help.