Increasing performance of this code for logistic regression

So I have defined a logistic regression model (below) that takes in some real and synthetic data and applies a weighting to the synthetic data's contribution to the likelihood. In generated quantities I calculate the log PMF of some unseen data, so that I can later compute a log score from it, and also predicted probabilities on the test data to calculate ROC AUC etc. I am working with datasets of a few thousand points and execution is quite slow: around 5000 iterations takes over 10 minutes. Can anyone see a reason for this to be the case?

data {
    int<lower=0> f;
    int<lower=0> a;
    matrix[a, f] X_real;
    int<lower=0, upper=1> y_real[a];
    int<lower=0> b;
    matrix[b, f] X_synth;
    int<lower=0, upper=1> y_synth[b];
    int<lower=0> c;
    matrix[c, f] X_test;
    int<lower=0, upper=1> y_test[c];
    real<lower=0> w;
}

parameters {
    vector[f] coefs;
    real alpha;
}

model {
    target += bernoulli_logit_glm_lpmf(y_real | X_real, alpha, coefs);
    target += w * bernoulli_logit_glm_lpmf(y_synth | X_synth, alpha, coefs);
}

generated quantities {

    real log_likes_test;
    vector[c] probabilities_test;
    log_likes_test = bernoulli_logit_glm_lpmf(y_test | X_test, alpha, coefs);
    probabilities_test = inv_logit(alpha + X_test * coefs);
}

Generally, what's the Rhat on your parameters once the model does converge?

Presumably it’s convergence that’s taking up all the time and I’m curious to know what contributes to it. HMC is sensitive to the “geometry” of the likelihood function. If you leave out the weighting term w, how does it affect the speed of convergence?

If w ends up being the culprit maybe we can work out an alternative way to do the weighting?

Worst-case scenario, map_rect can give you near-linear speedups if you've got cores to spare.
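As an aside, if you're on a recent Stan (2.23+), reduce_sum is usually easier to set up than map_rect for this kind of model. A minimal sketch under that assumption (the function name `partial_ll` and `grainsize = 1` are illustrative choices):

```stan
functions {
  // Log-likelihood of a slice of the outcome vector; reduce_sum
  // splits y into slices and sums the partial results in parallel.
  real partial_ll(int[] y_slice, int start, int end,
                  matrix X, real alpha, vector coefs) {
    return bernoulli_logit_glm_lpmf(y_slice | X[start:end], alpha, coefs);
  }
}
model {
  int grainsize = 1;  // 1 lets the scheduler pick slice sizes
  target += reduce_sum(partial_ll, y_real, grainsize, X_real, alpha, coefs);
  target += w * reduce_sum(partial_ll, y_synth, grainsize, X_synth, alpha, coefs);
}
```

You'd also need to compile with threading enabled (e.g. `STAN_THREADS=true` in cmdstan) for this to actually run in parallel.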

You have not set any prior on coefs, which means the prior is improper; combined with a bernoulli_logit likelihood the posterior can also be improper, and then MCMC can't produce valid posterior draws even in infinite time. Add proper priors and report the result here.
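A minimal sketch of adding weakly informative priors to the model block above (the scales 2.5 and 5 are placeholder choices, not recommendations; they assume roughly unit-scale predictors, so rescale to your data):

```stan
model {
    // weakly informative priors; scales here are illustrative only
    coefs ~ normal(0, 2.5);
    alpha ~ normal(0, 5);
    target += bernoulli_logit_glm_lpmf(y_real | X_real, alpha, coefs);
    target += w * bernoulli_logit_glm_lpmf(y_synth | X_synth, alpha, coefs);
}
```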