I am trying to fit a linear excess risk model as mentioned here:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3139969/
Briefly, this is the model:
odds = exp(alpha)*(1+b1*x1+b2*x2+b3*x1*x2)
pr(y=1) = odds/(1+odds)
Stan model:
data{
int N; //sample size
int ncol; //number of columns of data
matrix[N,ncol] X; //matrix of data
int<lower=0, upper=1> y[N]; //outcome
}
parameters{
real a;
vector[ncol] b;
}
transformed parameters{
vector[N] xb1;
vector<lower=0>[N] odds; //odds cannot be negative
vector<lower=0,upper=1>[N] pr;
xb1 = X*b;
odds = exp(a)*(1+xb1);
for(n in 1:N){
pr[n] = odds[n]/(1+odds[n]);
}
}
model{
a ~ cauchy(0,2.5);
for(i in 1:ncol){
b[i] ~ cauchy(0,2.5);
}
y ~ bernoulli(pr);
}
generated quantities{
vector[N] yrep;
for(n in 1:N){
yrep[n] = bernoulli_rng(pr[n]);
}
}
set.seed(4507907)
N <- 800
x1 <- rpois(N,10)
x2 <- rbinom(N,1,0.5)
x1x2 <- x1*x2
b <- c(-3,2,2,1)
odds <- exp(b[1])*(1+ b[2]*x1 + b[3]*x2 + b[4]*x1x2)
p <- odds/(1+odds)
y <- rbinom(N,1,p)
table(y)
X = cbind(x1,x2,x1x2)
dat = list(N=N, X=X, y=y, ncol=ncol(X))
fit <- stan("linOdds.stan",
data=dat, control = list(adapt_delta=0.99, max_treedepth=14))
Tested it with a simulated data. However, many divergent transitions.
When I truncated the priors to be non-negative (see below).
parameters{
real a;
vector<lower=0>[ncol] b;
}
No divergent transitions, posterior looks okay.
Inference for Stan model: linOdds.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
a -3.05 0.03 0.85 -4.67 -3.61 -3.07 -2.48 -1.42 770 1.00
b[1] 3.38 0.26 5.56 0.34 1.19 2.20 3.94 11.95 474 1.01
b[2] 5.05 1.02 19.95 0.11 0.96 2.23 4.54 21.46 382 1.01
b[3] 1.57 0.06 1.85 0.06 0.49 1.05 2.00 6.02 1115 1.00
However, the pairs plots show some interesting shapes
Also there could be negative effects in real data examples.
The issue is the coefficients can be negative in real data.
So I guess my question is could any one suggest a reparameterisation?
To avoid divergent transitions?
Do I need to do something with the target distribution?
Thank you in advance.