Increasing performance of this code for logistic regression

hwilde · February 28, 2020, 6:56pm

I have defined the logistic regression model below. It takes in some real and synthetic data and applies a weighting to the synthetic data's likelihood. In generated quantities I calculate the log pmf of some unseen data, so that I can later calculate the log score from it, along with predicted probabilities on the test data for ROC AUC and similar metrics. I am working with datasets of a few thousand points, and execution is quite slow: around 5000 iterations take over 10 minutes. Can anyone see a reason for this?

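data {
    
    int<lower=0> f;                      // number of features
    int<lower=0> a;                      // number of real observations
    matrix[a, f] X_real;
    int<lower=0, upper=1> y_real[a];
    int<lower=0> b;                      // number of synthetic observations
    matrix[b, f] X_synth;
    int<lower=0, upper=1> y_synth[b];
    int<lower=0> c;                      // number of test observations
    matrix[c, f] X_test;
    int<lower=0, upper=1> y_test[c];
    real<lower=0> w;                     // weight on the synthetic-data likelihood

}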

parameters {
    
    vector[f] coefs;
    real alpha;

}

model {
    
    target += bernoulli_logit_glm_lpmf(y_real | X_real, alpha, coefs);
    target += w * bernoulli_logit_glm_lpmf(y_synth | X_synth, alpha, coefs);

}

generated quantities {

    real log_likes_test;                 // log pmf of the test labels, summed over all c points
    vector[c] probabilities_test;        // predicted P(y = 1) for each test point
    log_likes_test = bernoulli_logit_glm_lpmf(y_test | X_test, alpha, coefs);
    probabilities_test = inv_logit(alpha + X_test * coefs);

}

Thanks!

emiruz · March 1, 2020, 3:33pm

Generally, what’s the Rhat on your parameters once the model does converge?

Presumably it’s convergence that’s taking up all the time and I’m curious to know what contributes to it. HMC is sensitive to the “geometry” of the likelihood function. If you leave out the weighting term w, how does it affect the speed of convergence?

If w ends up being the culprit, maybe we can work out an alternative way to do the weighting?

Worst-case scenario, map_rect can give you linear speedups if you’ve got cores to spare.
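
A minimal sketch of the map_rect idea, under several assumptions that are not in the original post: the data are pre-split into equal-sized shards, the names shard_ll, n_shards and n_per are made up for illustration, the priors are placeholders, and the array syntax needs Stan 2.26 or later. The synthetic-data weighting is left out to keep it short. It only runs in parallel if Stan is built with threading (STAN_THREADS) or MPI.

```stan
functions {
  // Log-likelihood of one shard.
  // phi = [alpha, coefs...]; theta is unused (no shard-specific parameters).
  vector shard_ll(vector phi, vector theta,
                  data array[] real x_r, data array[] int x_i) {
    int n = x_i[1];                          // rows in this shard
    int f = x_i[2];                          // number of features
    array[n] int y = x_i[3:(n + 2)];         // labels for this shard
    matrix[n, f] X = to_matrix(x_r, n, f);   // unpack, column-major
    return [bernoulli_logit_glm_lpmf(y | X, phi[1], phi[2:(f + 1)])]';
  }
}
data {
  int<lower=1> f;
  int<lower=1> n_shards;                     // hypothetical: number of shards
  int<lower=1> n_per;                        // hypothetical: rows per shard
  array[n_shards] matrix[n_per, f] X;
  array[n_shards, n_per] int<lower=0, upper=1> y;
}
transformed data {
  // Pack each shard into the flat real/int arrays that map_rect expects.
  array[n_shards, n_per * f] real x_r;
  array[n_shards, n_per + 2] int x_i;
  array[n_shards] vector[0] theta;           // empty per-shard parameters
  for (s in 1:n_shards) {
    x_r[s] = to_array_1d(X[s]);              // column-major, matches to_matrix
    x_i[s, 1] = n_per;
    x_i[s, 2] = f;
    x_i[s, 3:(n_per + 2)] = y[s];
  }
}
parameters {
  real alpha;
  vector[f] coefs;
}
model {
  // Placeholder priors (an assumption, not from the thread).
  alpha ~ normal(0, 5);
  coefs ~ normal(0, 2.5);
  target += sum(map_rect(shard_ll, append_row(alpha, coefs), theta, x_r, x_i));
}
```

The weighted synthetic term could presumably be handled with a second map_rect call over synthetic shards, multiplied by w. Whether any of this beats a single bernoulli_logit_glm_lpmf call for a few thousand rows would need to be measured.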

avehtari · March 1, 2020, 5:46pm

You have not set any prior on coefs, which means the prior is improper. Combined with bernoulli_logit, the posterior can then be improper too, in which case MCMC can’t produce valid posterior draws even in infinite time. Add proper priors and report the result here.
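
For example, a sketch of the model with proper priors added. The normal(0, 5) and normal(0, 2.5) scales are assumptions, chosen as weakly informative defaults that presume roughly standardized predictors, so adjust them to your data. The array declarations use the newer syntax (Stan 2.26 or later) rather than the older form in the original post, and the test-set parts are omitted for brevity.

```stan
data {
  int<lower=0> f;                            // number of features
  int<lower=0> a;                            // number of real observations
  matrix[a, f] X_real;
  array[a] int<lower=0, upper=1> y_real;
  int<lower=0> b;                            // number of synthetic observations
  matrix[b, f] X_synth;
  array[b] int<lower=0, upper=1> y_synth;
  real<lower=0> w;                           // weight on the synthetic likelihood
}
parameters {
  vector[f] coefs;
  real alpha;
}
model {
  // Assumed weakly informative priors; these make the prior proper.
  alpha ~ normal(0, 5);
  coefs ~ normal(0, 2.5);

  // Likelihood terms unchanged from the original model.
  target += bernoulli_logit_glm_lpmf(y_real | X_real, alpha, coefs);
  target += w * bernoulli_logit_glm_lpmf(y_synth | X_synth, alpha, coefs);
}
```

With proper priors the posterior is proper regardless of the data (for example under complete separation), so slow sampling can then be investigated with the usual diagnostics.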
