I have defined the logistic regression model below. It takes in some real and synthetic data and applies a weight w to the log-likelihood of the synthetic data. In generated quantities I compute the log pmf of some unseen test data, so that I can later calculate a log score from it, along with predicted probabilities on the test data for calculating ROC AUC and similar metrics. I am working with datasets of a few thousand points, and execution is quite slow: around 5000 iterations takes over 10 minutes. Can anyone see a reason for this?
data {
  int<lower=0> f;                      // number of features
  int<lower=0> a;                      // number of real observations
  matrix[a, f] X_real;
  int<lower=0, upper=1> y_real[a];
  int<lower=0> b;                      // number of synthetic observations
  matrix[b, f] X_synth;
  int<lower=0, upper=1> y_synth[b];
  int<lower=0> c;                      // number of test observations
  matrix[c, f] X_test;
  int<lower=0, upper=1> y_test[c];
  real<lower=0> w;                     // weight on the synthetic log-likelihood
}
parameters {
  vector[f] coefs;
  real alpha;
}
model {
  target += bernoulli_logit_glm_lpmf(y_real | X_real, alpha, coefs);
  target += w * bernoulli_logit_glm_lpmf(y_synth | X_synth, alpha, coefs);
}
generated quantities {
  // total log pmf of the test labels, summed over all c test points
  real log_likes_test;
  // predicted probability for each test point
  vector[c] probabilities_test;
  log_likes_test = bernoulli_logit_glm_lpmf(y_test | X_test, alpha, coefs);
  probabilities_test = inv_logit(alpha + X_test * coefs);
}
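For comparison, here is a sketch of a variant with explicit priors. The model above sets no priors, so Stan treats alpha and coefs as having improper flat priors; whether that contributes to the slow sampling is not established here. The normal(0, 2.5) priors are an assumption (sensible only if the predictors are roughly standardized) and are not part of the original model. The test-set quantities are left out to keep the sketch short.

```stan
data {
  int<lower=0> f;                      // number of features
  int<lower=0> a;                      // number of real observations
  matrix[a, f] X_real;
  int<lower=0, upper=1> y_real[a];
  int<lower=0> b;                      // number of synthetic observations
  matrix[b, f] X_synth;
  int<lower=0, upper=1> y_synth[b];
  real<lower=0> w;                     // weight on the synthetic log-likelihood
}
parameters {
  vector[f] coefs;
  real alpha;
}
model {
  // Assumed weakly informative priors (not in the original model)
  alpha ~ normal(0, 2.5);
  coefs ~ normal(0, 2.5);
  // Same weighted likelihood as the original model
  target += bernoulli_logit_glm_lpmf(y_real | X_real, alpha, coefs);
  target += w * bernoulli_logit_glm_lpmf(y_synth | X_synth, alpha, coefs);
}
```

The declarations keep the same array syntax as the model above; recent Stan releases expect the form array[a] int<lower=0, upper=1> y_real; instead.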
Thanks!