Model Definition for Bayesian Logistic Regression

Hi everyone. I want to apply the ADVI in some Bayesian Logistic Regression models. I am facing some problems in my Second Model (see below) which includes some hyper-priors.


Fisrt Model:

{y_i} \sim Bernoulli(\frac{e^{x_i \boldsymbol{\beta}}}{1+e^{x_i \boldsymbol{\beta}}}), i \in \{1,\dots,n\}, n \in \mathbb{N}, \boldsymbol{\beta} \sim \mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma}), \boldsymbol{\beta}\in \mathbb{R}^p,
where \boldsymbol{\mu} and \boldsymbol{\Sigma} are given. So, the model in stan file is

data {
  int<lower=0> n; // number of observations
  int<lower=0> p; // number of parameters
  array[n] int<lower=0, upper=1> y; // binary outcome variable
  matrix[n, p] X; // matrix of predictor variables
  vector[p] mu; // prior mu vector
  matrix[p, p] Sigma; // prior Sigma arrray
}

parameters {
  vector[p] beta; // coefficients for each predictor variable
}

model {
  beta ~ multi_normal(mu, Sigma); // beta prior
  
  y ~ bernoulli_logit(X * beta);  // likelihood
}


Second Model: {y_i} \sim Bernoulli(\frac{e^{x_i \boldsymbol{\beta}}}{1+e^{x_i \boldsymbol{\beta}}}), i \in \{1,\dots,n\}, n \in \mathbb{N},
\boldsymbol{\beta} \sim \mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma}), \boldsymbol{\beta}\in \mathbb{R}^p, where \boldsymbol{\mu} is given and \boldsymbol{\Sigma} =A^{-1} where A=diag(\alpha_j), \alpha_j \sim Gamma(a_0,b_0), j \in \{1,\dots,p\}. So, the model in stan file is

data {
  int<lower=0> n; // number of observations
  int<lower=0> p; // number of parameters
  array[n] int<lower=0, upper=1> y; // binary outcome variable
  matrix[n, p] X; // matrix of predictor variables
  vector[p] mu; // prior mu vector
  real<lower=0> a0; // shape of(hyper)prior
  real<lower=0> b0; // rate of(hyper)prior
}

parameters {
  vector[p] beta; // coefficients for each predictor variable
  vector[p] diag_array; // diagonal array parameter
}

transformed parameters {
  matrix[p, p] Sigma;
  Sigma = inverse(diag_matrix(diag_array)); // convert diag_array to diagonal matrix
}


model {
  for (j in 1:p) {
    diag_array[j] ~ gamma(a0, b0);
  }
  
  beta ~ multi_normal(mu, Sigma); // beta prior
  
  y ~ bernoulli_logit(X * beta);  // likelihood
}

Unfortunately, I receive this error Chain 1 stan::variational::advi::calc_ELBO: The number of dropped evaluations has reached its maximum amount (100). Your model may be either severely ill-conditioned or misspecified. Probably my model is not properly defined.

Do you have any idea how to define my Second Model? I am new in stan and don’t have much experience.

Thank you in advance!

A lot of your original text didn’t parse, so I’m not entirely sure what you’re after.

For your first model, I would suggest declaring Sigma as a cov_matrix to allow Stan to check for positive definiteness, then I would suggest Cholesky factoring it in a transformed_data block and then using multi_normal_cholesky. The reason is that multi_normal is cubic, whereas the Cholesky form is only quadratic in complexity. It’s unusual that people would know this prior for a regression—it’s much more usual to see independent priors or hierarchical priors.

For the second model, there is a more efficient way to compute. First, get rid of the transformed parameters and define as follow. I’m using inv_sigma instead of diag_array because it’s an inverse scale parameter—I would not suggest naming parameters solely after their shapes. inv_scale would also work, but that’s the name of the parameter type, not the name of a parameter, so I went with “inv_sigma” instead. I would suggest just having the user input the scale directly (sigma rather than inverse of sigma). The inv(x) function just computes 1 / x element wise more efficiently than 1 / x does itself.

sigma ~ gamma(a0, b0);  
beta ~ normal(mu, inv(inv_sigma));  // equivalent to beta ~ multi_normal(mu, inverse(diag_matrix(sigma)));

I would check that you really want to invert sigma here if you’re having issues.

1 Like

@Bob_Carpenter Thank you so much for the advices! They are very helpful.

For the second model, I removed the transformed parameters section, and I also modified my model as you suggested. The new model is as follows

data {
  int<lower=0> n; // number of observations
  int<lower=0> p; // number of parameters
  array[n] int<lower=0, upper=1> y; // binary outcome variable
  matrix[n, p] X; // matrix of predictor variables
  vector[p] mu; // prior mu vector
  real<lower=0> a0; // shape of(hyper)prior
  real<lower=0> b0; // rate of(hyper)prior
}


parameters {
  vector[p] beta; // coefficients for each predictor variable
  vector[p] inv_sigma; // vector for diagonal values of inverse Sigma array
}

model {
  for (j in 1:p) {
    inv_sigma[j] ~ gamma(a0, b0);
    beta[j] ~ normal(mu[j], inv(inv_sigma[j]));  // beta prior
  }

  y ~ bernoulli_logit(X * beta);  // observations
}

I also defined my model with an alternative way using the inv_gamma function:

parameters {
  vector[p] beta; // coefficients for each predictor variable
  vector[p] sigma; // vector for diagonal values of Sigma array
}

model {
  for (j in 1:p) {
    sigma[j] ~ inv_gamma(a0, b0);
    beta[j] ~ normal(mu[j], sigma[j]);  
  }
  
  y ~ bernoulli_logit(X * beta);  // likelihood
}

However, based on some trials, the model is working when the shape and rate (prior gamma distribution) are >= 7, but this is not always the case. I still get the same EROR: Chain 1 stan::variational::advi::calc_ELBO: The number of dropped evaluations has reached its maximum amount (100). Your model may be either severely ill-conditioned or misspecified.
Warning: Fitting finished unexpectedly! Use the $output() method for more information.

Error: Fitting failed. Unable to retrieve the draws..

I guess I need to figure out what is the problem with the smaller values.

Thank you again for your reply.