Probit regression with varying block effect on the slope

I am trying to fit a probit model where each observation is made at a specific dose \{x_{1} \ldots x_m\} and in a specific block \{b_1 \ldots b_K\}. We also assume that each block has block-specific noise, \{blockNoise_1 \ldots blockNoise_K\}, that affects the slope, where:

blockNoise_i \sim \mathcal{N}(0,blockSD)

The probit model would be simulated as follows:

y_{i,j} \sim \text{Bernoulli}(\Phi(x_{dose_i}(1+blockNoise_{block_j})))

where \Phi is the standard normal cumulative distribution function.

Given the \{y_{i,j}\}, I am trying to infer blockSD.
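For reference, this is roughly the data-generating process in Python (a sketch: the function and variable names, the block assignment scheme, and the specific dose grid are my own, not from the original code):

```python
import numpy as np
from scipy.stats import norm


def simulate(doses, n_blocks, n_reps, block_sd, rng):
    """Simulate probit data: each block's slope 1 is perturbed by N(0, block_sd) noise."""
    block_noise = rng.normal(0.0, block_sd, size=n_blocks)
    x, b, y = [], [], []
    for j in range(n_blocks):
        for dose in doses:
            # P(y = 1) = Phi(dose * (1 + blockNoise_j))
            p = norm.cdf(dose * (1.0 + block_noise[j]))
            y.extend(rng.binomial(1, p, size=n_reps))
            x.extend([dose] * n_reps)
            b.extend([j] * n_reps)
    return np.array(x), np.array(b), np.array(y)


rng = np.random.default_rng(0)
doses = np.linspace(-2, 2, 5)
x, b, y = simulate(doses, n_blocks=8, n_reps=20, block_sd=0.2, rng=rng)
```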

However, I have been really struggling to get this to work: computationally everything runs fine, but I usually just recover the prior mean of 0.1 as the answer:

data {
  int<lower=0> N;
  int<lower=0> nBlocks;
  array[N] int blocks;
  array[N] int<lower=0, upper=1> y;
  array[N] real doses;
}

parameters {
  real<lower=0.01,upper=0.21> blockSD;
  vector[nBlocks] blockSDs;
}

transformed parameters {
  array[N] real probits;
  vector[N] p;
  for (i in 1:N) {
    probits[i] = doses[i]*(blockSDs[blocks[i]] + 1);
    p[i] = Phi(probits[i]);
  }
}

model {

  blockSD ~ normal(0.1,0.05);
  blockSDs ~ normal(0,blockSD);
  y ~ bernoulli(p);
}

(for the model to sample reliably the parameters need to be non-centered, but I kept the centered version here for simplicity)

I have wondered whether the problem is the data I am generating - that there is just not enough information to infer anything. Below is an example (although I use more replicates and blocks, and the result is similar). In the figure below, the top panel has blockSD=0.001 (i.e. the slope barely changes between blocks), and the bottom panel has a higher value (0.2):

and I also plot the probits:

(obviously there is a difference in the variation of the slopes).

What am I missing? I think the problem is in the way I am specifying the model, but I am not sure what the issue is - it seems like it's not sampling correctly.

It’s hard to fit a hierarchical model over scale parameters because it adds a second degree of freedom in accounting for noise (you can either adjust the mean or the standard deviation to better match the errors).

There are a few problems with the model as written.

  1. The variable blockSDs needs to be declared with the constraint vector<lower=0>[nBlocks]. But maybe not, because they don’t look like standard deviations in the model—they look like regression coefficients.

  2. You probably want a non-centered parameterization of the hierarchical model, which is harder. You can do that by using unconstrained representations for the log scale parameters (conventionally the parameter is called “scale” and for a normal, it determines the standard deviation).

  3. I would suggest eliminating the transformed parameters block (to avoid saving all the intermediate values) and just defining the expected values directly within the model block. You’ll need to declare doses as a vector[N] to support matrix/vector operations.

model {
  ...
  y ~ bernoulli(Phi(doses .* (blockSDs[blocks] + 1)));
}

It’s even more efficient to work with logistic regression, where this is just

y ~ bernoulli_logit(doses .* (blockSDs[blocks] + 1));
  4. I would suggest replacing the constraint on blockSD with just <lower=0>. Range constraints rarely do what people intend in Bayesian inference—if the posterior mean is within the interval, it can still cause substantial bias with respect to a wider interval prior. Again, you can diagnose by seeing if any of the values bunch up near the constraint boundary—that will also cause computation to struggle.
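One caveat worth noting about the bernoulli_logit swap (my note, not part of the original suggestion): it replaces the probit link with a logit link, which is a slightly different model, not just a faster one. The two links are close up to a rescaling of the coefficients, which a quick numerical check illustrates:

```python
import numpy as np
from scipy.special import expit  # logistic sigmoid, i.e. inverse logit
from scipy.stats import norm

# The logistic CDF with slope ~1.702 closely approximates the standard normal CDF,
# so probit-scale coefficients roughly correspond to logit-scale coefficients
# multiplied by about 1.7.
x = np.linspace(-4, 4, 401)
max_gap = np.max(np.abs(norm.cdf(x) - expit(1.702 * x)))  # known to be under 0.01
```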

I didn’t understand why you’re adding 1 to the blockSDs—why not just let that be free so that the blockSD adjusts? You can diagnose whether this is a problem by checking if the blockSDs components bunch near zero. But then I’m not sure these should really be called SDs. In the text, you called them blockNoise. Did you mean to generate the block noise from a centered normal distribution with the block standard deviations?


Hi @Bob_Carpenter, thank you so much for your response. There are a couple of very good efficiency/convergence points: the non-centered parameterization (which I had done before) and the logistic regression, which I will implement. For the purposes of showing the model I just wanted to keep it simple, in the spirit of having people understand it quickly. To your other points:

The variable blockSDs needs to be declared with the constraint vector<lower=0>[nBlocks]. But maybe not, because they don’t look like standard deviations in the model—they look like regression coefficients…I didn’t understand why you’re adding 1 to the blockSDs—why not just let that be free so that the blockSD adjusts.

This is because I intend the model to have a slope of 1, and the blockSDs to be block-specific deviations from that slope (I called them blockNoise in the text), so it’s fine if they are negative: in some blocks the slope would be less than one, in others greater than one, and this variation is governed by blockSD. I could also have specified this by changing the name from blockSDs to blockDeviation:

blockDeviation \sim \mathcal{N}(1,blockSD)

where

blockSD \sim \mathcal{N}(0.1,0.05)

And this is the plot of blockSD vs blockDeviation[1] (i.e. the deviation from the slope of 1 in the first block):

(isn’t this pretty!!)

And it makes sense, as the blockSD gets smaller, the block deviations from the slope would also get smaller.

I still feel like I am missing something obvious that is making the inference go wrong - it just goes towards the prior, no matter how much data there is! Is there something unidentifiable here?

Just reimplemented your suggestions here (runs at least twice as fast now!!):

data {
  int<lower=0> N;
  int<lower=0> nBlocks;
  array[N] int blocks;
  array[N] int<lower=0, upper=1> y;
  vector[N] doses;
}

parameters {
  real<lower=(0.01-0.1)/0.05> blockSD_raw; // keeps blockSD = 0.1 + 0.05*blockSD_raw above 0.01
  vector[nBlocks] blockDeviation_raw;
}

transformed parameters {
  real<lower=0> blockSD;
  vector[nBlocks] blockDeviation;

  blockSD = 0.1+blockSD_raw*0.05;

  blockDeviation = blockDeviation_raw*blockSD;
}

model {
  blockSD_raw ~ normal(0,1);
  blockDeviation_raw ~ normal(0,1);
  y ~ bernoulli_logit(doses .* (1+blockDeviation[blocks]));
}
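As a quick sanity check on the non-centered parameterization (a simulation of my own, not part of the model): pushing standard-normal draws through the same transforms as the Stan code should recover blockSD ~ N(0.1, 0.05) truncated below at 0.01, and blockDeviation ~ N(0, blockSD) conditionally on blockSD:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
lower = (0.01 - 0.1) / 0.05  # lower bound on the raw scale, as in the Stan code

# Draw blockSD_raw ~ N(0,1) truncated at `lower` by rejection (cheap here, since
# the bound sits far in the left tail), then apply the model's transforms.
raw = rng.normal(size=2 * n)
raw = raw[raw > lower][:n]
block_sd = 0.1 + 0.05 * raw          # implies blockSD ~ N(0.1, 0.05), truncated at 0.01
dev = rng.normal(size=n) * block_sd  # implies blockDeviation ~ N(0, blockSD) given blockSD
```

The sample mean and standard deviation of `block_sd` come out close to 0.1 and 0.05 (slightly shifted by the truncation), and `dev` is centered at zero, matching the intended priors.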

Again, thank you very much for your insight and suggestions!

Solved my own question: there was an issue in the data-generating process that I had missed. Thank you @Bob_Carpenter for all your help!