Different results from logistic regression with noninformative priors and MLE

Hi,
I’ve noticed some strange behavior from logistic regression in Stan. When the priors for \beta are improper uniform priors, I’d expect the fitting to return the maximum likelihood estimates as posterior means. However, this is not the case when n is comparable to p: the coefficients appear to be a constant multiple of the ML estimates. For example, in the program included below, with n = 80 and p = 15, the MLE \beta coefficients are approximately 0.6 times the Stan posterior means, with a very nearly linear relationship between the two. This behavior persists for various choices of n and p, with the results converging when n >> p.

What is happening? Is there a mistake in my thinking that the results should be equal? Is the Stan model wrong? Or where is the problem?

Thanks in advance for any replies. I’ve included the Stan model and the Python code used for the simulation below.

data {
	int<lower=1> n;
	int<lower=1> p;
	matrix[n, p] X;
	int<lower=0,upper=1> y[n];
}
parameters {
	vector[p] beta;
	real beta_0;
}
model {
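	// no explicit priors on beta or beta_0, so both implicitly get improper flat (uniform) priors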
	y ~ bernoulli_logit(X * beta + beta_0);
}
import numpy as np

from scipy.stats import multivariate_normal as mvn
from scipy.stats import random_correlation
from scipy.special import expit

import pystan

import statsmodels.api as sm

lr_model = pystan.StanModel(file = "log_reg.stan")

# Setup of the problem
n = 80
p = 15

np.random.seed(140)

# Generating a random correlation matrix
top_bottom_ratio = p*p # larger means more correlation
unnormalized_range = np.exp(np.linspace(0.0, np.log(top_bottom_ratio), p))
eigenvals = unnormalized_range / sum(unnormalized_range) * p
X_cor = random_correlation.rvs(eigenvals)

mvn_gen = mvn(cov = X_cor, seed = 42)
X = mvn_gen.rvs(size = n)

beta = np.random.normal(size = p)
beta_0 = np.random.normal(size = 1)

y = np.random.binomial(1, expit(np.matmul(X, beta) + beta_0))

# Stan sampling
data_in = {'X':X, 'y':y, 'n':n, 'p':p}
fit = lr_model.sampling(data = data_in, iter = 2000, chains = 4)

# The same model evaluated using MLE
glm = sm.GLM(y, sm.add_constant(X, prepend = False), family = sm.families.Binomial())
res = glm.fit()

# Posterior means of beta and beta_0 from the Stan draws
means_stan = fit.to_dataframe().iloc[:, 3:(3+p+1)].mean()
means_glm = res.params

# Slope of a no-intercept linear regression of means_glm on means_stan ... should be 1.0 but isn't
coef = np.sum(means_stan * means_glm) / np.sum(means_stan**2)
means_stan_corr = coef * means_stan

# Verification that the Stan posterior mean does not maximize the likelihood
p_stan = expit(np.matmul(X, means_stan.iloc[:-1]) + means_stan.iloc[-1])
p_glm = expit(np.matmul(X, means_glm[:-1]) + means_glm[-1])
p_stan_corr = expit(np.matmul(X, means_stan_corr.iloc[:-1]) + means_stan_corr.iloc[-1])

print(np.sum(y*np.log(p_stan) + (1-y)*np.log(1-p_stan)))            # log-likelihood at the Stan posterior mean
print(np.sum(y*np.log(p_glm) + (1-y)*np.log(1-p_glm)))              # log-likelihood at the MLE (the largest of the three)
print(np.sum(y*np.log(p_stan_corr) + (1-y)*np.log(1-p_stan_corr)))  # log-likelihood at the rescaled Stan mean
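One further check (a sketch, assuming lr_model, data_in and means_glm from the script above are still in scope): with these flat priors the posterior mode should coincide with the MLE even though the posterior mean does not, and pystan's optimizing() maximizes the log posterior with L-BFGS.

# Sketch: the MAP estimate under the flat priors should reproduce the statsmodels MLE
map_est = lr_model.optimizing(data = data_in)
print(np.asarray(map_est['beta']) - np.asarray(means_glm)[:-1])   # approximately zero
print(float(map_est['beta_0']) - np.asarray(means_glm)[-1])       # approximately zero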

I’d expect this for a linear model with improper priors, but I wouldn’t expect it for anything else unless I had math saying it should happen. I guess what this is saying is that the posterior mode equals the mean, which isn’t always going to happen.

As n >> p here, the posterior looks more and more like a normal, so it makes sense that the mode and mean coincide in that limit.
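For intuition (a toy example, not from the model above): any skewed distribution already separates the mean from the mode. A Gamma(2, 1) density, for instance, peaks at 1 but has mean 2.

import numpy as np
from scipy.stats import gamma

a = 2.0                                # Gamma shape, scale = 1
dist = gamma(a)
print(dist.mean())                     # 2.0, the mean
xs = np.linspace(0.01, 10, 2000)
print(xs[np.argmax(dist.pdf(xs))])     # ~ 1.0, the mode (a - 1)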


Hm, I guess you’re right. I had mistaken the asymptotic results for something that should hold in general. And indeed, I’ve evaluated the log-likelihood as a function of the scale of \beta, and it drops off quite slowly at larger values, which, in combination with the rapid growth of the relevant volume element for \beta, makes the results plausible. Thanks for your reply.
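For reference, a sketch of one way to do that check (it reuses X, y, expit and means_glm from the original script and scans the likelihood along the ray c * beta_MLE):

# Log-likelihood as a function of the scale c applied to the MLE coefficients
def loglik_at_scale(c):
    b = c * np.asarray(means_glm)[:-1]
    b0 = c * np.asarray(means_glm)[-1]
    pr = expit(np.matmul(X, b) + b0)
    return np.sum(y * np.log(pr) + (1 - y) * np.log(1 - pr))

for c in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(c, loglik_at_scale(c))   # peaks at c = 1 and falls off only slowly for c > 1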
