Principled Bayesian Workflow problems

Hi there,
I’m working on a mixed model for risky decision-making and came across this life-saving (blog)[Towards A Principled Bayesian Workflow] by @betanalpha. I decided to implement the steps, and the result of step 10 (algorithm calibration) and step 11 (inferential calibration) using simulated data from the prior distribution seems very problematic. I couldn’t quite figure out where’s the problem, any help / insight would be greatly appreciated!


In each trial the subject is asked to choose between a lottery with known probability + payout magnitude and a guaranteed sure-bet. The model takes the probability of lottery P_L, the payout magnitude of lottery V_L, the magnitude of surebet V_{SB} from each trial, and estimates the subject risk preference \rho, internal noise \sigma_v and the mixing proportion of 3 agents: rational, pure lottery, pure sure-bet (\omega_1, \omega_3, \omega_3). Trial-history effects are also considered (\omega_{win}, \omega_{lose}).
\mu_{L} = P_L V_L^{\rho} \\ \mu_{SB} = V_{SB}^{\rho}\\ \hat{V}_L = \mathrm{normal}(\mu_L,\sigma_v ^ {\rho})\\ \hat{V}_{SB} = \mathrm{normal}(\mu_{SB},\sigma_v ^ {\rho})\\ P_{rational} = P(\hat V _L > \hat V _{SB}) \\ \vec{P}_{rational, lottery, surebet} = [P_{rational}, 1, 0] \\ P(\mathrm{Choose\ L}) = \begin{cases} \text{if win}, \vec{\omega}_{win} \cdot \vec{P} \\ \text{if lose}, \vec{\omega}_{lose} \cdot \vec{P} \\ \text{if surebet}, \vec{\omega} \cdot \vec{P} \\ \end{cases}


Here is the actual posterior model:

functions {
  real choiceprob(real rho, real sigma_v, real lottery_mag, real lottery_prob, real sb_mag, real rew_multi) {
    real y;
    real u1;	// Lottery utility
    real u2;	// Surebet utility
    
    u1 = lottery_prob * (rew_multi * lottery_mag) ^ rho;
    u2 = (rew_multi * sb_mag) ^ rho;
    y = 1 - normal_cdf(0, u1 - u2, sqrt(2) * sigma_v ^ rho);
    return y;
  }
}
data {
  int<lower=0> T; // Number of trials we have
  vector[T] lottery_mag; // Lottery value for that choice
  vector[T] lottery_prob; // Lottery probabilities for that choice
  vector[T] sb_mag; // Surebet values for that choice
  vector[T] total_rew_multi; // total reward multiplier = base_reward * rew_multi
  vector[T] prev_outcome; // -1 = lose, 1 = win, 0 = surebet
  int n_chose_lott[T]; // number chose lottery for this trial type
  int n_trials[T]; // total number of trials for this trial type
}
parameters {
  real rho_raw; // risk preference
  real sigmav_raw; // internal noise
  vector[3] omega_raw; // baseline fractions of being rational, lottery and surebet agent
  vector[3] post_win_raw; // changes to omega baseline after a win
  vector[3] post_lose_raw; // changes to omega baseline after a lose
}
transformed parameters{
  real<lower=0> rho;
  real<lower=0> sigmav;
  vector[3] omega;
  vector[3] omega_win;
  vector[3] omega_lose;
  rho = exp(rho_raw);
  sigmav = exp(sigmav_raw);
  omega = softmax(omega_raw);
  omega_win = softmax(omega_raw + post_win_raw*0.1);
  omega_lose = softmax(omega_raw + post_lose_raw*0.1);
}
model {
  vector[3] agent; // initialize an agent vector 
  real p_rational; 
  real p_chose_lott; 
  
  rho_raw ~ normal(log(0.8), 0.15);
  sigmav_raw ~ normal(log(3), 0.5);
  omega_raw[1] ~ normal(1, 1);
  omega_raw[2] ~ normal(0, 1);
  omega_raw[3] ~ normal(0, 1);
  post_win_raw ~ std_normal();
  post_lose_raw ~ std_normal();
  
  for(t in 1:T){
    p_rational = choiceprob(rho, sigmav, lottery_mag[t], lottery_prob[t], sb_mag[t], total_rew_multi[t]);
    agent[1] = p_rational; // rational agent
    agent[2] = 1; // lottery agent
    agent[3] = 0; // surebet agent
    if (prev_outcome[t] == 1){ // lottery win
    p_chose_lott = dot_product(agent, omega_win);
    } else if (prev_outcome[t] == -1){ // lottery lose
    p_chose_lott = dot_product(agent, omega_lose);
    }  else if (prev_outcome[t] == 0){ // surebet
    p_chose_lott = dot_product(agent, omega);
    }
    n_chose_lott[t] ~ binomial(n_trials[t], p_chose_lott);
  }
}

Here is the Stan script used for ensemble simulation from the priors:

functions {
  real choiceprob(real rho, real sigma_v, real lottery_mag, real lottery_prob, real sb_mag, real rew_multi) {
    real y;
    real u1;	// Lottery utility
    real u2;	// Surebet utility
    
    u1 = lottery_prob * (rew_multi * lottery_mag) ^ rho;
    u2 = (rew_multi * sb_mag) ^ rho;
    y = 1 - normal_cdf(0, u1 - u2, sqrt(2) * sigma_v ^ rho);
    return y;
  }
}
data {
  int<lower=0> T; // Number of trials we have
  vector[T] lottery_mag; // Lottery value for that choice
  vector[T] lottery_prob; // Lottery probabilities for that choice
  vector[T] sb_mag; // Surebet values for that choice
  vector[T] total_rew_multi; // total reward multiplier = base_reward * rew_multi
  vector[T] prev_outcome; // -1 = lose, 1 = win, 0 = surebet
  int n_trials[T]; // total number of trials for this trial type
}
generated quantities {
  // Simulate model configuration from prior model
  // declare outcome
  vector[T] n_chose_lott;
  // declare parameters
  real<lower=0> rho;
  real<lower=0> sigmav;
  vector[3] omega_raw;
  vector[3] post_win_raw;
  vector[3] post_lose_raw;
  // declare transform parameters
  vector[3] omega;
  vector[3] omega_win;
  vector[3] omega_lose;
  // declare temporary variables
  vector[3] agent; // initialize an agent vector 
  real p_rational; 
  real p_chose_lott; 
  
  // simulate hyperpriors 
  rho = exp(normal_rng(log(0.8), 0.15));
  sigmav = exp(normal_rng(log(3), 0.5));
  omega_raw[1] = normal_rng(1, 1);
  omega_raw[2] = normal_rng(0, 1);
  omega_raw[3] = normal_rng(0, 1);
  for (i in 1:3){
    post_win_raw[i] = normal_rng(0, 1);
    post_lose_raw[i] = normal_rng(0, 1);
  }
  // transform simulated omega hyperpriors
  omega = softmax(omega_raw);
  omega_win = softmax(omega_raw + post_win_raw*0.1);
  omega_lose = softmax(omega_raw + post_lose_raw*0.1);
  // Simulate data from observational model
  for(t in 1:T){
    p_rational = choiceprob(rho, sigmav, lottery_mag[t], lottery_prob[t], sb_mag[t], total_rew_multi[t]);
    agent[1] = p_rational; // rational agent
    agent[2] = 1; // lottery agent
    agent[3] = 0; // surebet agent
    if (prev_outcome[t] == 1){ // lottery win
    p_chose_lott = dot_product(agent, omega_win);
    } else if (prev_outcome[t] == -1){ // lottery lose
    p_chose_lott = dot_product(agent, omega_lose);
    }  else if (prev_outcome[t] == 0){ // surebet
    p_chose_lott = dot_product(agent, omega);
    }
    n_chose_lott[t] = binomial_rng(n_trials[t], p_chose_lott);
  }
}

Here’s the diagnostic plots from the Principled Bayesian Workflow, I only fitted the posterior model 100 times but the overall problem is clear: The prior ranks show that the sampling is underdispersed, and the posterior sensitivity plot shows that posterior z-score is really bad. I tried changing priors wider and narrower, it didn’t seem to help much. @betanalpha If you have any thoughts on how to improve the model (eventually we want to fit a HBM, but now even the simple one is problematic…)… any idea would be great! Many thanks!

4 Likes

Great, you are following the best practice!

One thing that looks immediately suspicious is the way you code omega - this is supposed to be a simplex, i.e. there are only two degrees of freedom (the probabilities have to sum to 1). But you code it with 3 degrees of freedom, making the model unidentified (whenever softmax(omega_raw) == x, then for any real number b you also get softmax(omeg_raw + b) == x, so you have infinitely many combinations of omega_raw that result in the same value of omega).

Using simplex[3] omega should help. Alternatively, you could have vector[2] omega_raw; and then have vector[3] omega = softmax(append_row(0, omega_raw)); i.e. fix the first element to 0 - this would give you a more direct way to code for the changes after win/loss (this approach is similar to how categorical regression is typically coded - look it up if you are not familiar with it).

Also the model is quite big to debug fully (at least for me). One thing we argue for in the Bayesian workflow preprint (which has huge overlap with and is inspired in a lot of aspects by Mike’s work) is that when you encounter problems, you simplify the model (e.g. drop the changes after win/loss; first try to fit a model with only the rational component and avoiding the mixture part; then fit a model that only has the mixture between lottery and surebet and only when you can make both of those work fit a model that combines both).

Best of luck with the model.

6 Likes

Those horns in the rank histograms probably indicate substantial autocorrelation in the samples. Can you check the diagnostics in those runs?

1 Like

Yes these are great advice! The constructing-from-base-models-and-up approach is sometimes forgotten during actual practice. Thanks again.

As @maxbiostat notes the rank horns could be from autocorrelation. You’ll need to figure out the right thinning for your chains which could be very different from the thinning I used in my workflow example. It will also help to follow up on any diagnostic problems reported by Stan in any of those fits as they can more directly identify the source of the problem. If you have enough observations then you might need to consider centering some of the hierarchal parameters – see Hierarchical Modeling for much more.

The posterior z-scores are definitely worrisome, especially given that they occur only when the posterior contraction pushes up against 1. This implies that many of the recovered posteriors are strongly concentrating around random points; such behavior would be troublesome if we trusted our computation but in this case I’m guessing that it’s actually a computation problem that has not yet been resolved.

Investigate the problematic fits and see if the Markov chains are freezing at a single point because of poor adaptation during warmup (you can also confirm by looking at the sampler parameters to see what the stepwise and inverse metric elements are). Stuck chains would also feature extremely high autocorrelations which would explain the SBC results.

4 Likes

Thanks for the reply! It turns out the model was actually correct but there was a bug in my Algorithm calibration step, where I drew samples from simulation ensemble randomly twice, and compare the fitted result using the first draw against the random second draw…(this explains extremely large posterior z-scores). Here’s an updated version of the workflow plots:

6 Likes