 # Principled Bayesian Workflow problems

Hi there,
I’m working on a mixture model for risky decision-making and came across the life-saving blog post *Towards A Principled Bayesian Workflow* by @betanalpha. I decided to implement the steps, and the results of step 10 (algorithm calibration) and step 11 (inferential calibration) using data simulated from the prior distribution look very problematic. I couldn’t quite figure out where the problem is; any help or insight would be greatly appreciated!

In each trial the subject is asked to choose between a lottery with a known probability and payout magnitude and a guaranteed surebet. The model takes the lottery probability P_L, the lottery payout magnitude V_L, and the surebet magnitude V_{SB} from each trial, and estimates the subject's risk preference \rho, internal noise \sigma_v, and the mixing proportions of 3 agents: rational, pure lottery, and pure surebet (\omega_1, \omega_2, \omega_3). Trial-history effects are also considered (\omega_{win}, \omega_{lose}).
$$
\begin{aligned}
\mu_{L} &= P_L V_L^{\rho} \\
\mu_{SB} &= V_{SB}^{\rho} \\
\hat{V}_L &\sim \mathrm{normal}(\mu_L, \sigma_v^{\rho}) \\
\hat{V}_{SB} &\sim \mathrm{normal}(\mu_{SB}, \sigma_v^{\rho}) \\
P_{rational} &= P(\hat{V}_L > \hat{V}_{SB}) \\
\vec{P}_{rational, lottery, surebet} &= [P_{rational}, 1, 0] \\
P(\mathrm{Choose\ L}) &= \begin{cases} \vec{\omega}_{win} \cdot \vec{P} & \text{if win} \\ \vec{\omega}_{lose} \cdot \vec{P} & \text{if lose} \\ \vec{\omega} \cdot \vec{P} & \text{if surebet} \end{cases}
\end{aligned}
$$
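To make the likelihood concrete, here is a minimal NumPy/SciPy sketch of the choice probability defined by the equations above (the parameter and stimulus values at the end are made up for illustration; the reward multiplier from the Stan code below is omitted since it does not appear in the equations):

```python
import numpy as np
from scipy.stats import norm

def p_rational(rho, sigma_v, p_l, v_l, v_sb):
    """P(choose lottery) for the rational agent.

    V_hat_L and V_hat_SB are independent normals with common sd
    sigma_v**rho, so their difference has sd sqrt(2) * sigma_v**rho.
    """
    mu_l = p_l * v_l ** rho
    mu_sb = v_sb ** rho
    return 1.0 - norm.cdf(0.0, loc=mu_l - mu_sb,
                          scale=np.sqrt(2) * sigma_v ** rho)

def p_choose_lottery(omega, rho, sigma_v, p_l, v_l, v_sb):
    """Mixture over the rational / pure-lottery / pure-surebet agents."""
    p_vec = np.array([p_rational(rho, sigma_v, p_l, v_l, v_sb), 1.0, 0.0])
    return float(np.dot(omega, p_vec))

# e.g. a mostly rational subject facing a 50% lottery of 8 vs a surebet of 4
p = p_choose_lottery(np.array([0.8, 0.1, 0.1]), rho=0.8, sigma_v=3.0,
                     p_l=0.5, v_l=8.0, v_sb=4.0)
```

A pure-lottery agent (omega = [0, 1, 0]) always chooses the lottery and a pure-surebet agent (omega = [0, 0, 1]) never does, which is a quick sanity check on the mixture.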

Here is the actual posterior model:

```stan
functions {
  real choiceprob(real rho, real sigma_v, real lottery_mag, real lottery_prob,
                  real sb_mag, real rew_multi) {
    real u1; // lottery utility
    real u2; // surebet utility

    u1 = lottery_prob * (rew_multi * lottery_mag) ^ rho;
    u2 = (rew_multi * sb_mag) ^ rho;
    return 1 - normal_cdf(0, u1 - u2, sqrt(2) * sigma_v ^ rho);
  }
}
data {
  int<lower=0> T;            // number of trial types
  vector[T] lottery_mag;     // lottery magnitude for that trial type
  vector[T] lottery_prob;    // lottery probability for that trial type
  vector[T] sb_mag;          // surebet magnitude for that trial type
  vector[T] total_rew_multi; // total reward multiplier = base_reward * rew_multi
  vector[T] prev_outcome;    // -1 = lose, 1 = win, 0 = surebet
  int n_chose_lott[T];       // number of lottery choices for this trial type
  int n_trials[T];           // total number of trials for this trial type
}
parameters {
  real rho_raw;              // risk preference (log scale)
  real sigmav_raw;           // internal noise (log scale)
  vector[3] omega_raw;       // baseline fractions of rational, lottery, surebet agents
  vector[3] post_win_raw;    // changes to the omega baseline after a win
  vector[3] post_lose_raw;   // changes to the omega baseline after a loss
}
transformed parameters {
  real<lower=0> rho;
  real<lower=0> sigmav;
  vector[3] omega;
  vector[3] omega_win;
  vector[3] omega_lose;
  rho = exp(rho_raw);
  sigmav = exp(sigmav_raw);
  omega = softmax(omega_raw);
  omega_win = softmax(omega_raw + post_win_raw * 0.1);
  omega_lose = softmax(omega_raw + post_lose_raw * 0.1);
}
model {
  vector[3] agent; // per-agent lottery-choice probabilities
  real p_rational;
  real p_chose_lott;

  rho_raw ~ normal(log(0.8), 0.15);
  sigmav_raw ~ normal(log(3), 0.5);
  omega_raw[1] ~ normal(1, 1);
  omega_raw[2] ~ normal(0, 1);
  omega_raw[3] ~ normal(0, 1);
  post_win_raw ~ std_normal();
  post_lose_raw ~ std_normal();

  for (t in 1:T) {
    p_rational = choiceprob(rho, sigmav, lottery_mag[t], lottery_prob[t],
                            sb_mag[t], total_rew_multi[t]);
    agent[1] = p_rational; // rational agent
    agent[2] = 1;          // lottery agent
    agent[3] = 0;          // surebet agent
    if (prev_outcome[t] == 1) {          // lottery win
      p_chose_lott = dot_product(agent, omega_win);
    } else if (prev_outcome[t] == -1) {  // lottery loss
      p_chose_lott = dot_product(agent, omega_lose);
    } else {                             // surebet
      p_chose_lott = dot_product(agent, omega);
    }
    n_chose_lott[t] ~ binomial(n_trials[t], p_chose_lott);
  }
}
```


Here is the Stan script used for ensemble simulation from the priors:

```stan
functions {
  real choiceprob(real rho, real sigma_v, real lottery_mag, real lottery_prob,
                  real sb_mag, real rew_multi) {
    real u1; // lottery utility
    real u2; // surebet utility

    u1 = lottery_prob * (rew_multi * lottery_mag) ^ rho;
    u2 = (rew_multi * sb_mag) ^ rho;
    return 1 - normal_cdf(0, u1 - u2, sqrt(2) * sigma_v ^ rho);
  }
}
data {
  int<lower=0> T;            // number of trial types
  vector[T] lottery_mag;     // lottery magnitude for that trial type
  vector[T] lottery_prob;    // lottery probability for that trial type
  vector[T] sb_mag;          // surebet magnitude for that trial type
  vector[T] total_rew_multi; // total reward multiplier = base_reward * rew_multi
  vector[T] prev_outcome;    // -1 = lose, 1 = win, 0 = surebet
  int n_trials[T];           // total number of trials for this trial type
}
generated quantities {
  // simulate a model configuration from the prior model
  // declare outcome
  vector[T] n_chose_lott;
  // declare parameters
  real<lower=0> rho;
  real<lower=0> sigmav;
  vector[3] omega_raw;
  vector[3] post_win_raw;
  vector[3] post_lose_raw;
  // declare transformed parameters
  vector[3] omega;
  vector[3] omega_win;
  vector[3] omega_lose;
  // declare temporary variables
  vector[3] agent; // per-agent lottery-choice probabilities
  real p_rational;
  real p_chose_lott;

  // simulate from the priors
  rho = exp(normal_rng(log(0.8), 0.15));
  sigmav = exp(normal_rng(log(3), 0.5));
  omega_raw[1] = normal_rng(1, 1);
  omega_raw[2] = normal_rng(0, 1);
  omega_raw[3] = normal_rng(0, 1);
  for (i in 1:3) {
    post_win_raw[i] = normal_rng(0, 1);
    post_lose_raw[i] = normal_rng(0, 1);
  }
  // transform the simulated omega draws
  omega = softmax(omega_raw);
  omega_win = softmax(omega_raw + post_win_raw * 0.1);
  omega_lose = softmax(omega_raw + post_lose_raw * 0.1);
  // simulate data from the observational model
  for (t in 1:T) {
    p_rational = choiceprob(rho, sigmav, lottery_mag[t], lottery_prob[t],
                            sb_mag[t], total_rew_multi[t]);
    agent[1] = p_rational; // rational agent
    agent[2] = 1;          // lottery agent
    agent[3] = 0;          // surebet agent
    if (prev_outcome[t] == 1) {          // lottery win
      p_chose_lott = dot_product(agent, omega_win);
    } else if (prev_outcome[t] == -1) {  // lottery loss
      p_chose_lott = dot_product(agent, omega_lose);
    } else {                             // surebet
      p_chose_lott = dot_product(agent, omega);
    }
    n_chose_lott[t] = binomial_rng(n_trials[t], p_chose_lott);
  }
}
```


Here are the diagnostic plots from the Principled Bayesian Workflow. I only fitted the posterior model 100 times, but the overall problem is clear: the prior rank histograms show that the sampling is underdispersed, and the posterior sensitivity plot shows that the posterior z-scores are really bad. I tried making the priors both wider and narrower; it didn’t seem to help much. @betanalpha, if you have any thoughts on how to improve the model (eventually we want to fit a HBM, but for now even the simple one is problematic…), any idea would be great! Many thanks!
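For anyone unfamiliar with the rank histograms in question, here is a minimal sketch of how a single simulation-based-calibration (SBC) rank is computed for one parameter; the "posterior draws" here are fabricated stand-ins for an actual Stan fit:

```python
import numpy as np

rng = np.random.default_rng(0)

theta_true = rng.normal(0.0, 1.0)  # one draw from the prior

# Stand-in for posterior draws from a fit to data simulated under theta_true;
# in the real workflow these come from Stan, not from this toy distribution.
posterior_draws = rng.normal(theta_true, 0.5, size=100)

# The SBC rank is the number of posterior draws below the prior draw
# (an integer between 0 and 100 inclusive). Over many replications the
# ranks should be uniform; horns or humps in their histogram signal
# miscalibration or autocorrelated draws.
rank = int(np.sum(posterior_draws < theta_true))
```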


Great, you are following best practice!

One thing that looks immediately suspicious is the way you code omega: this is supposed to be a simplex, i.e. there are only two degrees of freedom (the probabilities have to sum to 1). But you code it with 3 degrees of freedom, making the model non-identified (whenever softmax(omega_raw) == x, then for any real number b you also get softmax(omega_raw + b) == x, so you have infinitely many values of omega_raw that result in the same omega).
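The shift invariance is easy to check numerically; a tiny NumPy demonstration:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))  # subtract max for numerical stability
    return z / z.sum()

x = np.array([1.0, -0.5, 0.3])

# Adding any constant b to every logit leaves the simplex unchanged,
# which is exactly why the 3-parameter omega_raw is non-identified.
for b in (-10.0, 0.0, 7.5):
    assert np.allclose(softmax(x), softmax(x + b))
```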

Using a `simplex[3] omega` should help. Alternatively, you could declare `vector[2] omega_raw;` and then `vector[3] omega = softmax(append_row(0, omega_raw));`, i.e. fix the first element to 0. This would give you a more direct way to code the changes after a win/loss (this approach is similar to how categorical regression is typically coded; look it up if you are not familiar with it).
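In Python terms, the pinned-first-element trick looks like this (a sketch of the parameterization, not of any Stan internals):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))  # subtract max for numerical stability
    return z / z.sum()

def constrained_omega(omega_raw):
    """omega_raw has K-1 free entries; pinning the first logit to 0
    yields a unique K-simplex for each omega_raw."""
    return softmax(np.concatenate(([0.0], omega_raw)))

# 2 free parameters -> a 3-simplex, with no redundant direction left over
omega = constrained_omega(np.array([0.4, -1.2]))
```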

Also, the model is quite big to debug fully (at least for me). One thing we argue for in the Bayesian workflow preprint (which overlaps heavily with, and is in many respects inspired by, Mike’s work) is that when you encounter problems, you simplify the model: e.g. drop the changes after a win/loss; first fit a model with only the rational component, avoiding the mixture part; then fit a model with only the mixture between lottery and surebet; and only when you can make both of those work, fit a model that combines them.

Best of luck with the model.


Those horns in the rank histograms probably indicate substantial autocorrelation in the samples. Can you check the diagnostics in those runs?
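One quick way to eyeball this is the lag-1 autocorrelation of a chain; here a strongly correlated AR(1) series stands in for real MCMC output:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) chain with coefficient 0.95 as a stand-in for a
# poorly mixing MCMC chain.
chain = np.empty(5000)
chain[0] = 0.0
for t in range(1, chain.size):
    chain[t] = 0.95 * chain[t - 1] + rng.normal()

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rho1 = autocorr(chain, 1)  # close to 0.95 for this chain
```

With autocorrelation this high, the effective sample size is a small fraction of the nominal one, which is exactly the kind of thing the horned rank histograms pick up.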

Yes, this is great advice! The build-up-from-simpler-base-models approach is easy to forget in actual practice. Thanks again.