The following (deliberately simple) logistic regression model is one I'd like to expand into a more complex errors-in-variables model with a misclassified outcome. However, that is only worth doing if I can get the simpler model running fast enough (I'm working with 150k–500k observations). Is there any obvious way to speed up the QR-parameterized logistic model below?
```stan
data {
  int<lower=0> N;                  // number of observations
  int<lower=0> P;                  // number of columns of design matrix
  int<lower=0,upper=1> Y[N];       // binary outcome
  matrix[N,P] Q;                   // design matrix Q factor
  matrix[P,P] Rinv;                // design matrix R factor inverse
}
parameters {
  vector[P] beta_twiddle;          // coefficients, alternative parameterization
}
transformed parameters {
  vector[P] beta;                  // coefficients, standard parameterization
  vector[N] mu;                    // linear predictor
  beta = Rinv * beta_twiddle;
  mu = Q * beta_twiddle;
}
model {
  beta ~ normal(0, 10);
  Y ~ bernoulli_logit(mu);
}
```
One immediate thought is to compute the transformed parameters after the fact, which would require placing the prior on beta_twiddle rather than beta.
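That suggestion might look like the sketch below. Note that an independent normal(0, 10) prior placed directly on beta_twiddle is a different prior than normal(0, 10) on beta, since the two are related by the linear map Rinv, so this is a pragmatic approximation rather than an exact equivalent. Recovering beta in generated quantities keeps the Rinv multiplication off the autodiff stack:

```stan
data {
  int<lower=0> N;                  // number of observations
  int<lower=0> P;                  // number of columns of design matrix
  int<lower=0,upper=1> Y[N];       // binary outcome
  matrix[N,P] Q;                   // design matrix Q factor
  matrix[P,P] Rinv;                // design matrix R factor inverse
}
parameters {
  vector[P] beta_twiddle;          // coefficients, alternative parameterization
}
model {
  // assumption: prior placed directly on beta_twiddle, NOT equivalent to
  // normal(0, 10) on beta = Rinv * beta_twiddle
  beta_twiddle ~ normal(0, 10);
  Y ~ bernoulli_logit(Q * beta_twiddle);
}
generated quantities {
  vector[P] beta;                  // recovered once per draw, outside autodiff
  beta = Rinv * beta_twiddle;
}
```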
Interesting suggestion; I will have to think about this! The first wrinkle is that for the misclassification model I'll instead have to consider a distribution over the sufficient statistic, which would slow things down a bit, but it could still be much faster.
Oh, no, my suggestion won't work; that trick applies when there is a single probability to express as the parameter of a binomial. Is there perhaps a smaller set of unique values of mu that you could loop over using this trick?
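For concreteness, that trick generalizes slightly when the design matrix has many repeated rows (e.g., all-categorical predictors): group observations by unique covariate pattern and model the per-group success counts with a binomial. A sketch, where G, n, s, and X_u are hypothetical names for grouped data precomputed outside Stan; it only pays off when G is much smaller than N:

```stan
data {
  int<lower=0> G;                  // number of unique covariate patterns (hypothetical)
  int<lower=0> P;                  // number of predictors
  int<lower=0> n[G];               // trials per unique pattern
  int<lower=0> s[G];               // successes per unique pattern
  matrix[G,P] X_u;                 // unique design rows
}
parameters {
  vector[P] beta;
}
model {
  beta ~ normal(0, 10);
  // one binomial likelihood term per unique pattern instead of N Bernoulli terms
  s ~ binomial_logit(n, X_u * beta);
}
```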
In the short term, I'd appreciate a suggestion for an appropriate prior to put on beta_twiddle so I can skip the multiplication. Or perhaps there's a matvec specialized for triangular matrices?
You should try to exploit sparsity of your design matrix if possible.
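One caveat: the Q factor of a QR decomposition is generally dense even when the original design matrix X is sparse, so exploiting sparsity usually means dropping the QR trick and working with X directly via Stan's compressed-row-storage functions. A sketch, assuming the CSR components w, v, u of X are precomputed and passed in as data:

```stan
data {
  int<lower=0> N;                  // number of observations
  int<lower=0> P;                  // number of predictors
  int<lower=0,upper=1> Y[N];       // binary outcome
  int<lower=0> K;                  // number of nonzeros in X
  vector[K] w;                     // nonzero values of X (CSR)
  int v[K];                        // column index of each nonzero
  int u[N + 1];                    // row start pointers
}
parameters {
  vector[P] beta;
}
model {
  beta ~ normal(0, 10);
  // sparse matrix-vector product: only the K nonzeros are touched
  Y ~ bernoulli_logit(csr_matrix_times_vector(N, P, w, v, u, beta));
}
```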
There is a multiply_lower_tri function in stan-math, but you could only use it via C++, and if I recall correctly it will run efficiently in the upcoming 2.18 release (maybe I am wrong and it already runs fine in 2.17, but some of these functions were recently improved).