Obvious way to speed up a huge logistic regression?

The following (incredibly simple) logistic regression model is something I’d like to expand into a more complex errors-in-variables model with a misclassified outcome. However, this is only worth doing if I can get the simpler model running fast enough (I’m working with 150k-500k observations). Is there any obvious way to speed up the QR-parameterized logistic model below?

data {
    int<lower=0> N;                     // number of observations
    int<lower=0> P;                     // number of columns of design matrix
    int<lower=0,upper=1> Y[N];          // binary outcome
    matrix[N,P] Q;                      // design matrix Q factor
    matrix[P,P] Rinv;                   // design matrix R factor inverse
}

parameters {
    vector[P] beta_twiddle;             // coefficients alt parameterization
}

transformed parameters {
    vector[P] beta;                     // coefficients standard parameterization
    vector[N] mu;                       // linear predictor
    beta = Rinv * beta_twiddle;
    mu = Q * beta_twiddle;
}

model {
    beta ~ normal(0,10);                // prior on the original-scale coefficients

    Y ~ bernoulli_logit(mu);            // likelihood on the logit scale
}

One immediate thought is to calculate the transformed parameters after the fact (i.e., in generated quantities), which would require putting the prior on beta_twiddle rather than on beta.
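A minimal sketch of that version, assuming a normal(0, 10) prior placed directly on beta_twiddle is acceptable (note this is not equivalent to the same prior on beta):

data {
    int<lower=0> N;                     // number of observations
    int<lower=0> P;                     // number of columns of design matrix
    int<lower=0,upper=1> Y[N];          // binary outcome
    matrix[N,P] Q;                      // design matrix Q factor
    matrix[P,P] Rinv;                   // design matrix R factor inverse
}

parameters {
    vector[P] beta_twiddle;             // coefficients alt parameterization
}

model {
    beta_twiddle ~ normal(0,10);        // prior now on the rotated coefficients
    Y ~ bernoulli_logit(Q * beta_twiddle);
}

generated quantities {
    vector[P] beta;                     // coefficients standard parameterization
    beta = Rinv * beta_twiddle;         // computed once per draw, outside the gradient evaluation
}

This removes the Rinv multiply from the gradient evaluation and avoids saving the N-vector mu with every draw.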

Would the trick in section 28.8 (“Exploiting Sufficient Statistics”) of the manual help here? That is, instead of

Y ~ bernoulli_logit(mu);

you would do

sum(Y) ~ binomial(N, inv_logit(mu));

?

Interesting suggestion, I will have to think about this! The first wrinkle is that for the misclassification model I’ll instead have to consider a distribution over the sufficient statistic, which would slow things down a bit, but it could still be much faster.

Oh, no, my suggestion won’t work; that trick applies when there is a single probability to express as the parameter of the binomial. Is there maybe a smaller set of unique values of mu that you could loop over using this trick?
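To make the idea concrete, a hedged sketch of the grouped version, assuming the N rows collapse into G unique covariate patterns and that the collapsed matrix and counts (Q_unique, n, s; all hypothetical names) are computed outside Stan:

data {
    int<lower=0> P;                     // number of columns of design matrix
    int<lower=1> G;                     // number of unique covariate patterns
    matrix[G,P] Q_unique;               // one row of Q per unique pattern
    int<lower=0> n[G];                  // trials per pattern
    int<lower=0> s[G];                  // successes per pattern
}

parameters {
    vector[P] beta_twiddle;             // coefficients alt parameterization
}

model {
    beta_twiddle ~ normal(0,10);
    s ~ binomial_logit(n, Q_unique * beta_twiddle);   // G binomial terms instead of N Bernoulli terms
}

The savings only materialize when G is much smaller than N, e.g. when all the covariates are discrete.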

Unfortunately the design matrix is quite dense, so I can’t see how to consider only a finite number of values for mu…

In the short term, I’d appreciate a suggestion for an appropriate prior to put on beta_twiddle so I can save the multiplication. Or perhaps there’s a matvec specialized for triangular matrices?

You should try to exploit sparsity of your design matrix if possible.

There is a multiply_lower_tri function in stan-math… so for now you could only use it via C++, and if I recall correctly it is going to run efficiently with the upcoming 2.18 release (maybe I am wrong and it already runs OK in 2.17, but some of these functions were improved).

If Rinv or Q is sparse, you could use the CSR multiply function (csr_matrix_times_vector).
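If Q (or whatever matrix multiplies the parameter vector) really were sparse, a sketch of that route, with the CSR triple (values w, column indices v, row pointers u; hypothetical names) extracted outside Stan or via the csr_extract_* functions:

data {
    int<lower=0> N;                     // number of observations
    int<lower=0> P;                     // number of columns of design matrix
    int<lower=0,upper=1> Y[N];          // binary outcome
    int<lower=0> K;                     // number of nonzeros
    vector[K] w;                        // nonzero values (CSR)
    int<lower=1> v[K];                  // column indices of the nonzeros (CSR)
    int<lower=1> u[N+1];                // row start pointers (CSR)
}

parameters {
    vector[P] beta_twiddle;             // coefficients alt parameterization
}

model {
    beta_twiddle ~ normal(0,10);
    Y ~ bernoulli_logit(csr_matrix_times_vector(N, P, w, v, u, beta_twiddle));
}

Note that the Q factor of a QR decomposition is generally dense even when the original design matrix is sparse, so this helps most if you multiply the sparse design matrix directly.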

You might also want to look into how @bgoodri coded the priors for these kinds of models in rstanarm, which does the QR decomposition behind the scenes.

Good suggestion, I will look into this.

Will definitely look into this, cheers.