Weighted logistic regression

If you just want to reduce the number of calls to the likelihood, sufficient statistics is a different and probably also the best way to go.

If you have following data

data {
  int<lower=0> N_unique;   // number of unique rows in x
  int<lower=0> d;
  matrix[N_unique, d] x;
  int<lower=0> U[N];       // number of cases in each row of x
  int<lower=0> Y[N];       // number of cases in each row of x with value 1
}

You should be able to use the binomial distribution for your likelihood:

model {
  theta0 ~ normal(0, 1);
  theta ~ normal(0, 1);
  target += binomial_logit_lpmf(Y | U, theta0 + x*theta)
}

No need for a loop here, because binomial_logit_lpmf is vectorized. Here is the Stan documentation for the binomial_logit_lpmf: https://mc-stan.org/docs/2_22/functions-reference/binomial-distribution-logit-parameterization.html.

Also check the Stan documentation for something like “exploiting sufficient statistics”.

1 Like