Hi,
The challenge: divergent transitions for nearly all posterior simulations.
Two main questions:
- Is this less of a concern for my purposes?
I am estimating parameters for individual experimental subjects, and I am then comparing these parameters (e.g., including them in regression models). I am more interested in the difference between peoples parameters rather than that the posterior of their parameters needing to be precisely simulated.
- In my code/model below, can you see any obvious issues I’ve missed that can be addressed to reduce divergent transitions?
I have already attempted changing the hierarchical model (e.g., the type of distributions, priors), achieving one that makes sense conceptually and has a Rhat near 1 running 4 chains (which I’m happy about). I’ve also done a few tries at varying iterations, varying adapt_delta, varying init_r, and a simulation (further systematic testing TBC), however, divergence persists to date. Otherwise, no errors msg for bulk ESS but too high tail ESS.
Further background
I am simulating posteriors for parameters that feature in a choice-model: I fit an equation that predicts the probability of choosing option A over option B (i.e. binary choice).
My RStan model has three main parameters for each individual and six parameters that describe the group-level priors (mean and sd).
The choices are for
- A: a lottery or gamble that pays $0 (low) or $X (high) with some probability, or
- B: a sure $amount.
The three main parameters are: Predi, RewExp and noise. The first two are of main interest as they form the key parameters in my choice model:
utility(option A) =
p($0) * ($0^Predi / ($0^Predi + RewExp^Predi) ) +
p($X) * ($X^Predi / ($X^Predi + RewExp^Predi) )
utility(option B) =
p($sure) * ($sure^Predi / ($sure^Predi + RewExp^Predi) )
As a result, there are a high number of parameters: 6 group level + 3 for each subject - we have 443 subjects so the total is 6 + 3 * 443 = 1336.
Start Stan code/model
// The variables below are created and filled by the data as lists
data {
int N; // number of trials total (across participants), integer
int nsubj; // number of subjects,
int choices[N]; // the binary choice vector
real <lower=0> x1a[N];
real <lower=0> x2a[N];
real <lower=0> x1b[N];
real <lower=0> p1a[N];
real <lower=0> p2a[N];
real <lower=0> p1b[N];
int sid[N];
}
// There are three main parameters: noise, Predi, and RewExp
parameters {
// group-level/priors
real<lower=0> meannoise;
real<lower=0> sdnoise;
real<lower=0.1> meanPredi;
real<lower=0> sdPredi;
real<lower=0.01> meanRewExp;
real<lower=0> sdRewExp;
// individual-level parameters
real<lower=0.1> noise[nsubj];
real<lower=0.1> Predi[nsubj];
real<lower=0.01> RewExp[nsubj];
}
model {
real ua; // utility of the option a
real ub; // utility of the option b
// Group-level parameter distributions
meannoise ~ uniform(-2.30,1.61);
sdnoise ~ uniform(0,1.13);
meanPredi ~ uniform(-2.30,2.3);
sdPredi ~ uniform(0,2.86);
meanRewExp ~ uniform(-2.30,2.3);
// Hierarchy - this links the group/prior parameters to the individual ones.
noise ~ lognormal(meannoise, sdnoise);
Predi ~ lognormal(meanPredi, sdPredi);
RewExp ~ lognormal(meanRewExp, sdRewExp);
for (i in 1:N) {
// calcualte the utility of the gamble and the alternative
// then estimate the likelihood of choosing option A over B in each trial
int t = sid[i];
ua=0.0;
ua += p1a[i]*((x1a[i]^Predi[t])/((x1a[i]^Predi[t])+(RewExp[t]^Predi[t])));
ua += p2a[i]*((x2a[i]^Predi[t])/((x2a[i]^Predi[t])+(RewExp[t]^Predi[t])));
ub=0.0;
ub += p1b[i]*((x1b[i]^Predi[t])/((x1b[i]^Predi[t])+(RewExp[t]^Predi[t])));
choices[i] ~ bernoulli_logit((ua-ub)/noise[t]); // this is logistic step
}
}
End Stan code
// note
- x1a and p1a are the low outcome and associated probability for option A;
- x2a and p2a the high outcome and probability for option A; and
- x1b and p1b the sure outcomes of option B.
Below I also include the R code I use for executing the Stan model above (called ‘ESVT_AS_2023_v7.stan’):
Start of R code
library(readr)
library(readxl)
library(tidyr)
library(ggplot2)
library(rmarkdown)
library(tidyverse)
library(writexl)
library(rstan)
library(bayesplot)
sample <- read_xlsx("<my excel data set>")
setwd("<my folder>")
### extract the variables from the "sample"
N=nrow(sample)
nsubj=max(sample['sid'])
subjs=unlist(sample['sid'])
choices=unlist(sample['chosea'])
x1a=unlist(sample['x1a'])
x2a=unlist(sample['x2a'])
x1b=unlist(sample['x1b'])
p1a=unlist(sample['p1a'])
p2a=unlist(sample['p2a'])
p1b=unlist(sample['p1b'])
sid=unlist(sample['sid'])
VarList <- list(N=N,nsubj=nsubj,subjs=subjs,choices=choices,x1a=x1a,x2a=x2a,x1b=x1b,p1a=p1a,p2a=p2a, p1b=p1b, sid=sid)
ESVTfit_v7 <- stan(file='ESVT_AS_2023_v7.stan',
data=list(N=N,nsubj=nsubj,subjs=subjs,choices=choices,x1a=x1a,x2a=x2a,x1b=x1b,p1a=p1a,p2a=p2a, p1b=p1b,sid=sid),
warmup=5000,
chains=4,
iter=50000,
init_r=1,
verbose=FALSE)
print(ESVTfit_v7)
ESVTfit_v7_ss <- extract(ESVTfit_v7,permuted=TRUE)
write.table(as.data.frame(ESVTfit_v7_ss),file="ESVT_2023_v7_bhm.csv",quote=F,sep=",",row.names=F)
traceplot(ESVTfit_v7, pars = c("meannoise", "meanPredi", "meanRewExp")) ## too look at convergence of mean parameters
End of R code
Your thoughts and assistance are much appreciated.
Cheers,
Alex
P.S. As a further aid in understanding the question - I would like to attached a fake, simulated, data set of individuals and choice - this is the type of dataset read in by ‘read_xlsx’ above - but I don’t seem to be able to.
EDIT by Aki: added backticks to show code in code blocks.