Weighted logistic regression

Dionysis_Manousakas · May 22, 2020, 2:38pm

I’d like to implement a logistic regression model (with a normal prior) that accepts inputs together with corresponding non-negative weights w_n (i.e., multiplicities of each point’s log-likelihood). Is the following implementation correct?

weighted_logistic_code = """
data {
  int<lower=0> N; // number of observations
  int<lower=0> d; // dimensionality of x
  matrix[N,d] x; // inputs
  int<lower=0,upper=1> y[N]; // outputs in {0, 1}
  vector<lower=0>[N] w; // non-negative weights
}
parameters {
  real theta0; // intercept
  vector[d] theta; // regression coefficients
}
model {
  theta0 ~ normal(0, 1);
  theta ~ normal(0, 1);
  for(n in 1:N){
    target += w[n]*bernoulli_logit_lpmf(y[n]| theta0 + x[n]*theta);
  }
}
"""
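For reference, here is a minimal Python sketch of the quantity the model block accumulates, assuming the standard Bernoulli-logit likelihood. The helper names (`log1p_exp`, `bernoulli_logit_lpmf`, `weighted_log_density`) are hypothetical re-implementations for illustration, not Stan's own code. Stan's `~` statements also drop additive constants, so Stan's `target` would differ from this value by a constant that does not affect sampling. With integer weights, each `w[n]` acts exactly like repeating row n `w[n]` times.

```python
import math

def log1p_exp(z):
    # numerically stable log(1 + exp(z))
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

def bernoulli_logit_lpmf(y, eta):
    # log P(y | eta) for y in {0, 1}, with P(y = 1) = inv_logit(eta)
    return y * eta - log1p_exp(eta)

def normal_lpdf(v, mu=0.0, sigma=1.0):
    # full normal log density, constants included
    return -0.5 * ((v - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def weighted_log_density(theta0, theta, x, y, w):
    # priors: theta0 ~ normal(0, 1), theta ~ normal(0, 1)
    lp = normal_lpdf(theta0) + sum(normal_lpdf(t) for t in theta)
    # each observation's log-likelihood is multiplied by its weight w[n]
    for x_n, y_n, w_n in zip(x, y, w):
        eta = theta0 + sum(a * b for a, b in zip(x_n, theta))
        lp += w_n * bernoulli_logit_lpmf(y_n, eta)
    return lp
```

For example, weights `[2, 1, 0]` give the same value as listing the first row twice and dropping the third. Non-integer or rescaled weights still run, but the model then behaves as if there were `sum(w)` observations.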

Meow · May 22, 2020, 5:29pm

Per my understanding, the point estimates will be correct, but the standard errors will be wrong (much smaller than the true values).
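One rough way to see this point is a Laplace (normal) approximation to the posterior of an intercept-only version of the model. Multiplying every weight by k multiplies the likelihood's curvature by k, so once the likelihood dominates the normal(0, 1) prior the approximate posterior sd shrinks roughly like 1/sqrt(k), while the mode stays near the logit of the observed proportion. This is a sketch on made-up data; `laplace_intercept` is a hypothetical helper, and whether the smaller sd is "wrong" depends on whether the weights count real observations.

```python
import math

def inv_logit(a):
    return 1.0 / (1.0 + math.exp(-a))

def laplace_intercept(y, w, iters=50):
    """Posterior mode and Laplace-approximate sd of an intercept-only
    weighted logistic model with a normal(0, 1) prior on the intercept."""
    a = 0.0
    for _ in range(iters):
        p = inv_logit(a)
        # gradient of log prior + weighted log-likelihood
        grad = -a + sum(w_n * (y_n - p) for y_n, w_n in zip(y, w))
        # negative Hessian: prior precision + weighted Fisher information
        info = 1.0 + sum(w_n * p * (1.0 - p) for w_n in w)
        a += grad / info  # Newton step
    p = inv_logit(a)
    info = 1.0 + sum(w_n * p * (1.0 - p) for w_n in w)
    return a, 1.0 / math.sqrt(info)
```

With 10 observations (6 ones), weights of 1 give an approximate sd of about 0.54, while weights of 10 give about 0.2, although the data content is unchanged.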

Dionysis_Manousakas · May 22, 2020, 5:31pm

I’m trying to approximate a larger dataset, so inflating the likelihoods is part of my solution.

maxbiostat · May 23, 2020, 6:35am

@lauren knows a lot about complex surveys and might be able to chime in.

Guido_Biele · May 23, 2020, 11:04am

I think your solution is correct if you are indeed doing this just to avoid recalculating the log-likelihood for duplicate rows in your data set.

You can check this by fitting the two versions of your model (with and without weighting) to a smaller data set and comparing the estimated parameters.
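A cheap complement to refitting is to compare log-likelihoods directly: collapse duplicate (x, y) rows into unique rows with their counts as weights, and check that the weighted log-likelihood on the collapsed data equals the unweighted one on the full data. Since the equality holds for any parameter values, the two posteriors coincide. A minimal Python sketch; `collapse_rows` and `log_lik` are hypothetical helpers written for illustration.

```python
import math
from collections import Counter

def log1p_exp(z):
    # numerically stable log(1 + exp(z))
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def bernoulli_logit_lpmf(y, eta):
    # log P(y | eta) for y in {0, 1}, with P(y = 1) = inv_logit(eta)
    return y * eta - log1p_exp(eta)

def log_lik(theta0, theta, x, y, w=None):
    # weighted sum of per-row Bernoulli-logit log-likelihoods (w defaults to all ones)
    if w is None:
        w = [1] * len(y)
    total = 0.0
    for x_n, y_n, w_n in zip(x, y, w):
        eta = theta0 + sum(a * b for a, b in zip(x_n, theta))
        total += w_n * bernoulli_logit_lpmf(y_n, eta)
    return total

def collapse_rows(x, y):
    # count identical (x_n, y_n) pairs; the counts become the weights
    counts = Counter((tuple(x_n), y_n) for x_n, y_n in zip(x, y))
    x_u = [list(key[0]) for key in counts]
    y_u = [key[1] for key in counts]
    w_u = list(counts.values())
    return x_u, y_u, w_u
```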

Guido_Biele · May 23, 2020, 7:18pm

If you just want to reduce the number of likelihood evaluations, using sufficient statistics is a different and probably the best way to go.

If you have the following data:

data {
  int<lower=0> N_unique;          // number of unique rows in x
  int<lower=0> d;                 // dimensionality of x
  matrix[N_unique, d] x;          // unique input rows
  int<lower=0> U[N_unique];       // number of observations sharing each row of x
  int<lower=0> Y[N_unique];       // number of those observations with y = 1
}
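The aggregated inputs for this data block can be built from the raw rows by grouping on unique rows of x (not on (x, y) pairs, since the binomial counts both outcomes per row). A minimal Python sketch, assuming x is a list of rows and y a list of 0/1 outcomes; `to_binomial_data` is a hypothetical helper whose keys follow the variable names in the data block. Passing the dict to a Stan interface such as PyStan or CmdStanPy is not shown.

```python
def to_binomial_data(x, y):
    """Group raw rows by unique x and count trials (U) and successes (Y)."""
    groups = {}  # unique x row -> [trials, successes], in first-seen order
    for x_n, y_n in zip(x, y):
        key = tuple(x_n)
        if key not in groups:
            groups[key] = [0, 0]
        groups[key][0] += 1    # one more observation with this x row
        groups[key][1] += y_n  # y_n is 0 or 1
    return {
        "N_unique": len(groups),
        "d": len(x[0]) if x else 0,
        "x": [list(k) for k in groups],
        "U": [g[0] for g in groups.values()],
        "Y": [g[1] for g in groups.values()],
    }
```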

You should be able to use the binomial distribution for your likelihood:

parameters {
  real theta0;      // intercept
  vector[d] theta;  // regression coefficients
}
model {
  theta0 ~ normal(0, 1);
  theta ~ normal(0, 1);
  target += binomial_logit_lpmf(Y | U, theta0 + x * theta);
}

No need for a loop here, because binomial_logit_lpmf is vectorized. Here is the Stan documentation for the binomial_logit_lpmf: https://mc-stan.org/docs/2_22/functions-reference/binomial-distribution-logit-parameterization.html.
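Why this gives the same posterior as the row-by-row Bernoulli model: the binomial log-pmf for a row with U trials and Y successes is log C(U, Y) + Y*eta - U*log(1 + exp(eta)), while the sum of its U Bernoulli-logit terms is Y*eta - U*log(1 + exp(eta)). They differ only by log C(U, Y), which does not depend on theta0 or theta. The Python functions below are hypothetical re-implementations of the formulas, written to check this identity, not Stan's source.

```python
import math

def log1p_exp(z):
    # numerically stable log(1 + exp(z))
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def bernoulli_logit_lpmf(y, eta):
    # log P(y | eta) for y in {0, 1}
    return y * eta - log1p_exp(eta)

def binomial_logit_lpmf(k, n, eta):
    # log C(n, k) + k*eta - n*log(1 + exp(eta))
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * eta - n * log1p_exp(eta)
```

For a row with 5 trials and 2 successes, the binomial term minus the five Bernoulli terms equals log C(5, 2) = log 10 at every value of eta.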

Also check the Stan documentation for something like “exploiting sufficient statistics”.

Related topics

Topic	Category	Tags	Replies	Views	Activity
Error when fitting a Bernoulli logit model with weights	Modeling	rstan, techniques, fitting-issues, specification	1	525	November 5, 2020
Bayesian parallels of weighted regression	Modeling		12	3901	July 29, 2021
Weighted Beta-Binomial Bayesian model	Modeling	rstan, specification	8	1017	September 11, 2023
Problem with modelling Logistic Regression in Rstan	Modeling	specification	4	505	March 9, 2021
Error in Stan code when modelling a weighted logistic regression model	Modeling	rstan, fitting-issues, specification	3	494	April 6, 2021