# Weighted logistic regression

I’d like to implement a logistic regression model (with a normal prior) that accepts inputs together with corresponding non-negative weights w_n (i.e. the multiplicity of each point’s log-likelihood). Is the following implementation correct?

``````python
weighted_logistic_code = """
data {
  int<lower=0> N;                    // number of observations
  int<lower=0> d;                    // dimensionality of x
  matrix[N, d] x;                    // inputs
  array[N] int<lower=0, upper=1> y;  // outputs in {0, 1}
  vector<lower=0>[N] w;              // non-negative weights (multiplicities)
}
parameters {
  real theta0;      // intercept
  vector[d] theta;  // regression coefficients
}
model {
  theta0 ~ normal(0, 1);
  theta ~ normal(0, 1);
  for (n in 1:N) {
    // observation n's log-likelihood is counted w[n] times
    target += w[n] * bernoulli_logit_lpmf(y[n] | theta0 + x[n] * theta);
  }
}
"""
``````
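To see what the weighted `target +=` statement computes, here is a plain-Python sketch of the same log density: normal(0, 1) priors plus `w[n]` times each Bernoulli-logit term. It is not Stan code and the function names are hypothetical; it assumes `x` is given as a list of rows. With integer weights it should return the same value as the unweighted log density evaluated on a data set in which row n is repeated `w[n]` times.

``````python
import math


def log1p_exp(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def bernoulli_logit_lpmf(y, eta):
    """log Bernoulli(y | inv_logit(eta)) for a single outcome y in {0, 1}."""
    return y * eta - log1p_exp(eta)


def normal_lpdf(v, mu=0.0, sigma=1.0):
    """Full normal log density, including the constant that Stan's ~ drops."""
    return -0.5 * ((v - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def weighted_log_posterior(theta0, theta, x, y, w):
    """Priors plus sum_n w[n] * bernoulli_logit_lpmf(y[n] | theta0 + x[n] * theta)."""
    lp = normal_lpdf(theta0) + sum(normal_lpdf(t) for t in theta)
    for x_n, y_n, w_n in zip(x, y, w):
        eta = theta0 + sum(xi * ti for xi, ti in zip(x_n, theta))
        lp += w_n * bernoulli_logit_lpmf(y_n, eta)
    return lp


if __name__ == "__main__":
    x = [[0.5], [-1.0], [2.0]]
    y = [1, 0, 1]
    w = [3, 1, 2]
    # Expand each row according to its (integer) weight.
    x_rep = [row for row, k in zip(x, w) for _ in range(k)]
    y_rep = [yn for yn, k in zip(y, w) for _ in range(k)]
    weighted = weighted_log_posterior(0.2, [0.7], x, y, w)
    expanded = weighted_log_posterior(0.2, [0.7], x_rep, y_rep, [1] * len(y_rep))
    print(weighted, expanded)  # equal up to floating-point rounding
``````

Because the priors are added once in both cases, only the likelihood part depends on how the rows are stored.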

My understanding is that the point estimates will be correct, but the standard errors will be wrong (much smaller than the true values).

I’m trying to approximate a larger dataset, so inflating the likelihoods is part of my solution.
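The effect of the weights on the uncertainty can be made concrete in the simplest case: an intercept-only model, ignoring the prior, where the weighted maximum-likelihood estimate is theta0 = logit(p) with p = sum(w * y) / sum(w), and the observed information is sum(w) * p * (1 - p). The sketch below (hypothetical helper name, prior ignored, requires 0 < p < 1) shows that multiplying every weight by c leaves the estimate unchanged and divides the standard error by sqrt(c). For genuine counts of repeated rows this matches the fit to the expanded data, so whether the smaller standard errors are "wrong" depends on what the weights represent.

``````python
import math


def intercept_only_fit(y, w):
    """Weighted MLE of theta0 and its Wald standard error (no covariates, no prior).

    Setting the derivative of sum_n w[n] * (y[n] * theta0 - log(1 + exp(theta0)))
    to zero gives inv_logit(theta0) = p = sum(w * y) / sum(w); the negative second
    derivative (observed information) there is sum(w) * p * (1 - p).
    Assumes 0 < p < 1.
    """
    total_w = sum(w)
    p = sum(wn * yn for wn, yn in zip(w, y)) / total_w
    theta0_hat = math.log(p / (1.0 - p))
    se = 1.0 / math.sqrt(total_w * p * (1.0 - p))
    return theta0_hat, se


if __name__ == "__main__":
    y = [1, 0, 1, 1, 0]
    print(intercept_only_fit(y, [1] * 5))    # all weights equal to 1
    print(intercept_only_fit(y, [100] * 5))  # same estimate, standard error 10x smaller
``````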

@lauren knows a lot about complex surveys and might be able to chime in.

I think your solution is correct if indeed you are just doing this to avoid re-calculating the log likelihood for non-unique rows in your data set.

You can check this by fitting the two versions of your model (weighted on the unique rows, and unweighted on the full data with the repeated rows) to a smaller data set and comparing the estimated parameters.
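One way to run that check without a sampler is to compare posterior modes. The sketch below is a hypothetical Newton-Raphson MAP fit for a single covariate (d = 1), using the same normal(0, 1) priors and weighted Bernoulli-logit likelihood as the model in the question; fitting the weighted unique rows and the expanded data should give the same estimates up to rounding. Comparing two Stan fits instead, the posterior summaries would agree only up to Monte Carlo error.

``````python
import math


def inv_logit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def map_fit(x, y, w, iters=50, tol=1e-12):
    """Newton-Raphson MAP for y ~ bernoulli_logit(theta0 + theta1 * x),
    theta0, theta1 ~ normal(0, 1), with log-likelihood terms multiplied by w.
    Single covariate only (d = 1)."""
    t0, t1 = 0.0, 0.0
    for _ in range(iters):
        g0, g1 = -t0, -t1                # gradient of the normal(0, 1) log prior
        h00, h01, h11 = -1.0, 0.0, -1.0  # Hessian of the log prior
        for xn, yn, wn in zip(x, y, w):
            p = inv_logit(t0 + t1 * xn)
            r = wn * (yn - p)
            v = wn * p * (1.0 - p)
            g0 += r
            g1 += r * xn
            h00 -= v
            h01 -= v * xn
            h11 -= v * xn * xn
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * step = g using the explicit 2x2 inverse.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (-h01 * g0 + h00 * g1) / det
        t0 -= s0
        t1 -= s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return t0, t1


if __name__ == "__main__":
    x, y, w = [0.5, -1.0, 2.0, 0.0], [1, 0, 1, 0], [3, 1, 2, 4]
    x_rep = [xn for xn, k in zip(x, w) for _ in range(k)]
    y_rep = [yn for yn, k in zip(y, w) for _ in range(k)]
    print(map_fit(x, y, w))                          # weighted, unique rows
    print(map_fit(x_rep, y_rep, [1] * len(y_rep)))   # unweighted, expanded rows
``````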

If you just want to reduce the number of calls to the likelihood, using sufficient statistics is a different, and probably the best, way to go.

If you have the following data:

``````stan
data {
  int<lower=0> N_unique;           // number of unique rows in x
  int<lower=0> d;                  // dimensionality of x
  matrix[N_unique, d] x;           // unique input rows
  array[N_unique] int<lower=0> U;  // number of cases in each row of x
  array[N_unique] int<lower=0> Y;  // number of cases in each row of x with value 1
}
``````
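The `U` and `Y` arrays can be built from the raw data by grouping identical rows of `x`. A minimal Python sketch with a hypothetical helper name; it assumes rows are compared for exact equality (sensible only for discrete or already-rounded covariates) and returns a dictionary keyed by the names in the data block above.

``````python
def collapse_to_sufficient_stats(x_rows, y):
    """Group identical rows of x; count cases (U) and cases with y = 1 (Y) per group."""
    groups = {}  # row tuple -> [U, Y]; dicts keep insertion order in Python 3.7+
    for row, yn in zip(x_rows, y):
        counts = groups.setdefault(tuple(row), [0, 0])
        counts[0] += 1
        counts[1] += yn
    x_unique = [list(key) for key in groups]
    return {
        "N_unique": len(x_unique),
        "d": len(x_unique[0]) if x_unique else 0,
        "x": x_unique,
        "U": [u for u, _ in groups.values()],
        "Y": [s for _, s in groups.values()],
    }


if __name__ == "__main__":
    x_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    y = [1, 0, 0, 1]
    print(collapse_to_sufficient_stats(x_rows, y))
``````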

You should be able to use the binomial distribution for your likelihood:

``````stan
model {
  theta0 ~ normal(0, 1);
  theta ~ normal(0, 1);
  target += binomial_logit_lpmf(Y | U, theta0 + x * theta);
}
``````

No need for a loop here, because `binomial_logit_lpmf` is vectorized. Here is the Stan documentation for `binomial_logit_lpmf`: https://mc-stan.org/docs/2_22/functions-reference/binomial-distribution-logit-parameterization.html.
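To confirm that switching to the binomial does not change the posterior, the plain-Python sketch below (hypothetical function names) evaluates both log-likelihoods for the same linear predictor. The full binomial log pmf includes the log binomial coefficient log C(U_i, Y_i); since that term does not depend on `theta0` or `theta`, the two versions differ only by a constant.

``````python
import math


def log1p_exp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def bernoulli_logit_loglik(eta, y):
    """Sum over raw rows of log Bernoulli(y[n] | inv_logit(eta[n]))."""
    return sum(yn * e - log1p_exp(e) for e, yn in zip(eta, y))


def binomial_logit_loglik(eta, U, Y):
    """Sum over unique rows of log Binomial(Y[i] | U[i], inv_logit(eta[i]))."""
    total = 0.0
    for e, u, s in zip(eta, U, Y):
        log_choose = math.lgamma(u + 1) - math.lgamma(s + 1) - math.lgamma(u - s + 1)
        # s * log(p) + (u - s) * log(1 - p) simplifies to s * eta - u * log(1 + exp(eta))
        total += log_choose + s * e - u * log1p_exp(e)
    return total


if __name__ == "__main__":
    theta0, theta1 = 0.3, -0.8
    x_raw, y_raw = [0.5, 0.5, -1.0, 0.5], [1, 0, 0, 1]
    x_unique, U, Y = [0.5, -1.0], [3, 1], [2, 0]
    raw = bernoulli_logit_loglik([theta0 + theta1 * xn for xn in x_raw], y_raw)
    collapsed = binomial_logit_loglik([theta0 + theta1 * xi for xi in x_unique], U, Y)
    # The gap is log C(3, 2) + log C(1, 0) = log(3), whatever theta0 and theta1 are.
    print(collapsed - raw, math.log(3))
``````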

Also check the Stan documentation for something like “exploiting sufficient statistics”.
