Hierarchical logistic regression non-centered reparameterization

Hi all,

I am performing a hierarchical logistic regression on a relatively sparse dataset: 278 observations grouped into 18 events (a few events have fewer than 5 data points). Per recommendations on this forum I have implemented the "Matt's trick" non-centered reparameterization. My code is below (it should be correct; the model runs fine and produces posterior distributions close to what I expect).


data {
  int<lower=1> N;                    // number of data points
  int<lower=1> E;                    // number of events
  int<lower=1, upper=E> event[N];    // event index for each observation
  real x1[N];                        // predictor 1
  real x2[N];                        // predictor 2
  int<lower=0, upper=1> y[N];        // binary outcome
}
parameters {
  vector[E] b0_raw;          // event-level adjustments (raw, non-centered scale)
  vector[E] b1_raw;
  vector[E] b2_raw;
  real<lower=0> sigma_b0;    // event-level sds
  real<lower=0> sigma_b1;
  real<lower=0> sigma_b2;
  real mu_b0;                // population means
  real mu_b1;
  real mu_b2;
}
transformed parameters{
  vector[E] b0;
  vector[E] b1;
  vector[E] b2;
  b0 = mu_b0 + sigma_b0 * b0_raw;
  b1 = mu_b1 + sigma_b1 * b1_raw;
  b2 = mu_b2 + sigma_b2 * b2_raw;
}
model {
  mu_b0 ~ normal(0, 100);    // priors for population-level coefficients
  mu_b1 ~ normal(0, 100);    // diffuse normals
  mu_b2 ~ normal(0, 100);
  sigma_b0 ~ normal(0, 25);  // half-normals via the <lower=0> constraints
  sigma_b1 ~ normal(0, 25);
  sigma_b2 ~ normal(0, 25);
  b0_raw ~ normal(0, 1);     // standard normals on the raw (non-centered) scale
  b1_raw ~ normal(0, 1);
  b2_raw ~ normal(0, 1);
  for (i in 1:N) {
    y[i] ~ bernoulli_logit(b0[event[i]] + b1[event[i]] * x1[i] + b2[event[i]] * x2[i]);
  }
}

My understanding from the Betancourt and Girolami paper is that the reparameterization follows from the equivalence of these two statements:

y_i \sim N(\theta_i, \sigma_i) with \theta_i \sim N(\mu, \tau)
and
y_i \sim N(\theta^\star_i \tau + \mu, \sigma_i) with \theta^\star_i \sim N(0, 1).

My questions are as follows: what would the equivalent statements be for a hierarchical logistic regression, and why are they valid?

If they aren’t, have I made some faulty assumption and coded my model incorrectly?

Thanks!

I haven't looked at your Stan program yet, but if you have a varying parameter with a level for each observation (e.g., the eight_schools example, just not Gaussian), then mathematically you can write something like y_i \sim {\rm Bernoulli}({\rm logit}^{-1}(\theta_i^\star \tau + \mu)).

They are valid for the same reason as when you are using a Gaussian likelihood. The relevant Gaussian for the non-centered parameterization is the Gaussian prior on \theta. In other words, it’s always true that

\theta \sim N(\mu, \tau)

is equivalent to

\theta = \theta^\star * \tau + \mu, \theta^\star \sim N(0,1)

regardless of the distribution of y. Hope that helps, and welcome to the Stan forums.
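To make that concrete, here is a minimal eight_schools-style sketch with a Bernoulli likelihood and a non-centered group effect. It is only illustrative: the names (J, group, theta_raw) and the prior scales are placeholders, not anything specific to your model.

data {
  int<lower=1> N;                   // observations
  int<lower=1> J;                   // groups
  int<lower=1, upper=J> group[N];   // group index for each observation
  int<lower=0, upper=1> y[N];       // binary outcome
}
parameters {
  real mu;                          // population mean (logit scale)
  real<lower=0> tau;                // population sd
  vector[J] theta_raw;              // non-centered group effects
}
transformed parameters {
  vector[J] theta = mu + tau * theta_raw;   // implies theta ~ normal(mu, tau)
}
model {
  mu ~ normal(0, 5);                        // placeholder prior scales
  tau ~ normal(0, 2.5);
  theta_raw ~ normal(0, 1);                 // standard normal on the raw scale
  y ~ bernoulli_logit(theta[group]);        // likelihood sees theta, not theta_raw
}

The only Gaussian left in the program is the prior on theta_raw, which is why the trick carries over unchanged no matter what the likelihood is.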

I didn’t go through it thoroughly but here are a few things you can tweak to make your Stan program more concise and also run faster:

data {
...
  // vectors not arrays so you can do elementwise vector multiplication in model block 
  vector[N] x1; 
  vector[N] x2; 
...
}
transformed parameters{
  // same but declare and define on a single line (just cosmetic)
  vector[E] b0 = mu_b0 + sigma_b0 * b0_raw;
  vector[E] b1 = mu_b1 + sigma_b1 * b1_raw;
  vector[E] b2 = mu_b2 + sigma_b2 * b2_raw;
}
model {
  // vectorize the log likelihood (will be faster than loop)
  y ~ bernoulli_logit(b0[event] + b1[event] .* x1 + b2[event] .* x2);
}

I would also suggest tighter priors than normal(0, 100) (that's an sd of 100, i.e., a variance of 10,000) unless values like -250 and 250 really are plausible. This is especially important to think about when your data are sparse.
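For example, assuming your predictors are on roughly unit scale, something in this spirit would still be weakly informative; the particular scales (5 and 2.5) are placeholders to illustrate the idea, not a recommendation for your data:

model {
  mu_b0 ~ normal(0, 5);        // intercept on the logit scale
  mu_b1 ~ normal(0, 2.5);      // slopes, assuming x1 and x2 are roughly unit scale
  mu_b2 ~ normal(0, 2.5);
  sigma_b0 ~ normal(0, 2.5);   // half-normals via the <lower=0> constraints
  sigma_b1 ~ normal(0, 2.5);
  sigma_b2 ~ normal(0, 2.5);
  ...
}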

Ok, thanks for your advice! I've actually been struggling a bit with the priors, mostly because they're going to be something the peer reviewers (who aren't Bayesian modelers) will be picky about. I've been trying to reason through how best to determine an anticipated scale, similar to how Gelman et al. (2008) talk about creating a prior that assigns low probability to anything greater than a change of 10 on the logistic scale for a unit change in a predictor variable. Do you have any suggestions for this?

Maybe this has something

https://mc-stan.org/users/documentation/case-studies/weakly_informative_shapes.html

(and it’s a good read even if it doesn’t!)

I also suggest simulating data like yours in relevant ways and then fitting the model to that data using priors of different strengths, so you can get a sense for what effect that has.
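One way to do the simulation step entirely in Stan is a prior-predictive program: drop the model block, draw the parameters from a candidate prior in generated quantities, and run it with the fixed_param algorithm to see what kinds of outcomes each prior implies. A rough sketch, where the prior scales are placeholders you would vary:

data {
  int<lower=1> N;
  int<lower=1> E;
  int<lower=1, upper=E> event[N];
  vector[N] x1;
  vector[N] x2;
}
generated quantities {
  real mu_b0 = normal_rng(0, 5);              // candidate prior scales to compare
  real mu_b1 = normal_rng(0, 2.5);
  real mu_b2 = normal_rng(0, 2.5);
  real sigma_b0 = fabs(normal_rng(0, 2.5));   // half-normal draws
  real sigma_b1 = fabs(normal_rng(0, 2.5));
  real sigma_b2 = fabs(normal_rng(0, 2.5));
  vector[E] b0;
  vector[E] b1;
  vector[E] b2;
  int y_sim[N];
  for (e in 1:E) {
    b0[e] = normal_rng(mu_b0, sigma_b0);
    b1[e] = normal_rng(mu_b1, sigma_b1);
    b2[e] = normal_rng(mu_b2, sigma_b2);
  }
  for (i in 1:N) {
    y_sim[i] = bernoulli_logit_rng(b0[event[i]] + b1[event[i]] * x1[i] + b2[event[i]] * x2[i]);
  }
}

Comparing the implied event-level outcome patterns under normal(0, 100) and under something tighter usually makes the case for the tighter prior fairly quickly, and it gives you something concrete to show the reviewers.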