Model Definition for Bayesian Logistic Regression

Hi everyone. I want to apply the ADVI in some Bayesian Logistic Regression models. I am facing some problems in my Second Model (see below) which includes some hyper-priors.

Fisrt Model:

{y_i} \sim Bernoulli(\frac{e^{x_i \boldsymbol{\beta}}}{1+e^{x_i \boldsymbol{\beta}}}), i \in \{1,\dots,n\}, n \in \mathbb{N}, \boldsymbol{\beta} \sim \mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma}), \boldsymbol{\beta}\in \mathbb{R}^p,
where \boldsymbol{\mu} and \boldsymbol{\Sigma} are given. So, the model in stan file is

data {
int<lower=0> n; // number of observations
int<lower=0> p; // number of parameters
array[n] int<lower=0, upper=1> y; // binary outcome variable
matrix[n, p] X; // matrix of predictor variables
vector[p] mu; // prior mu vector
matrix[p, p] Sigma; // prior Sigma arrray
}

parameters {
vector[p] beta; // coefficients for each predictor variable
}

model {
beta ~ multi_normal(mu, Sigma); // beta prior

y ~ bernoulli_logit(X * beta);  // likelihood
}



Second Model: {y_i} \sim Bernoulli(\frac{e^{x_i \boldsymbol{\beta}}}{1+e^{x_i \boldsymbol{\beta}}}), i \in \{1,\dots,n\}, n \in \mathbb{N},
\boldsymbol{\beta} \sim \mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma}), \boldsymbol{\beta}\in \mathbb{R}^p, where \boldsymbol{\mu} is given and \boldsymbol{\Sigma} =A^{-1} where A=diag(\alpha_j), \alpha_j \sim Gamma(a_0,b_0), j \in \{1,\dots,p\}. So, the model in stan file is

data {
int<lower=0> n; // number of observations
int<lower=0> p; // number of parameters
array[n] int<lower=0, upper=1> y; // binary outcome variable
matrix[n, p] X; // matrix of predictor variables
vector[p] mu; // prior mu vector
real<lower=0> a0; // shape of(hyper)prior
real<lower=0> b0; // rate of(hyper)prior
}

parameters {
vector[p] beta; // coefficients for each predictor variable
vector[p] diag_array; // diagonal array parameter
}

transformed parameters {
matrix[p, p] Sigma;
Sigma = inverse(diag_matrix(diag_array)); // convert diag_array to diagonal matrix
}

model {
for (j in 1:p) {
diag_array[j] ~ gamma(a0, b0);
}

beta ~ multi_normal(mu, Sigma); // beta prior

y ~ bernoulli_logit(X * beta);  // likelihood
}


Unfortunately, I receive this error Chain 1 stan::variational::advi::calc_ELBO: The number of dropped evaluations has reached its maximum amount (100). Your model may be either severely ill-conditioned or misspecified. Probably my model is not properly defined.

Do you have any idea how to define my Second Model? I am new in stan and donâ€™t have much experience.

Thank you in advance!

A lot of your original text didnâ€™t parse, so Iâ€™m not entirely sure what youâ€™re after.

For your first model, I would suggest declaring Sigma as a cov_matrix to allow Stan to check for positive definiteness, then I would suggest Cholesky factoring it in a transformed_data block and then using multi_normal_cholesky. The reason is that multi_normal is cubic, whereas the Cholesky form is only quadratic in complexity. Itâ€™s unusual that people would know this prior for a regressionâ€”itâ€™s much more usual to see independent priors or hierarchical priors.

For the second model, there is a more efficient way to compute. First, get rid of the transformed parameters and define as follow. Iâ€™m using inv_sigma instead of diag_array because itâ€™s an inverse scale parameterâ€”I would not suggest naming parameters solely after their shapes. inv_scale would also work, but thatâ€™s the name of the parameter type, not the name of a parameter, so I went with â€śinv_sigmaâ€ť instead. I would suggest just having the user input the scale directly (sigma rather than inverse of sigma). The inv(x) function just computes 1 / x element wise more efficiently than 1 / x does itself.

sigma ~ gamma(a0, b0);
beta ~ normal(mu, inv(inv_sigma));  // equivalent to beta ~ multi_normal(mu, inverse(diag_matrix(sigma)));


I would check that you really want to invert sigma here if youâ€™re having issues.

1 Like

@Bob_Carpenter Thank you so much for the advices! They are very helpful.

For the second model, I removed the transformed parameters section, and I also modified my model as you suggested. The new model is as follows

data {
int<lower=0> n; // number of observations
int<lower=0> p; // number of parameters
array[n] int<lower=0, upper=1> y; // binary outcome variable
matrix[n, p] X; // matrix of predictor variables
vector[p] mu; // prior mu vector
real<lower=0> a0; // shape of(hyper)prior
real<lower=0> b0; // rate of(hyper)prior
}

parameters {
vector[p] beta; // coefficients for each predictor variable
vector[p] inv_sigma; // vector for diagonal values of inverse Sigma array
}

model {
for (j in 1:p) {
inv_sigma[j] ~ gamma(a0, b0);
beta[j] ~ normal(mu[j], inv(inv_sigma[j]));  // beta prior
}

y ~ bernoulli_logit(X * beta);  // observations
}


I also defined my model with an alternative way using the inv_gamma function:

parameters {
vector[p] beta; // coefficients for each predictor variable
vector[p] sigma; // vector for diagonal values of Sigma array
}

model {
for (j in 1:p) {
sigma[j] ~ inv_gamma(a0, b0);
beta[j] ~ normal(mu[j], sigma[j]);
}

y ~ bernoulli_logit(X * beta);  // likelihood
}


However, based on some trials, the model is working when the shape and rate (prior gamma distribution) are >= 7, but this is not always the case. I still get the same EROR: Chain 1 stan::variational::advi::calc_ELBO: The number of dropped evaluations has reached its maximum amount (100). Your model may be either severely ill-conditioned or misspecified.
Warning: Fitting finished unexpectedly! Use the \$output() method for more information.

Error: Fitting failed. Unable to retrieve the draws..

I guess I need to figure out what is the problem with the smaller values.