Halo Stan Users,
I have built a generalised linear model using a custom equation with one explanatory variable (ratio) and five parameters (level, scale, preference, choosiness and asymmetry). The dependent variable is binary and it consists of a series of zeros and ones. The model converged without warnings and it was cross-validated using the loo package. Therefore, the fitting process should be fairly accurate. What keeps me awake at night and frustrated during the day is the estimate of one single parameter, the preference parameter.
The equation I am using in the Stan model stems from the Gaussian function. It has the sum of level and scale as the parameter for the height of the curve’s peak, preference is the position of the centre of the peak, choosiness controls the width of the curve and asymmetry allows for a difference between the right and left tail. This is the implementation of the model in Stan.
data {
int<lower=0> N; //number of observations
vector[N] ratio; //explanatory variable
int<lower=0, upper=1> y[N]; //binary dependent variable
}
parameters {
real level;
real<lower=0, upper=20> scale;
real preference;
real<lower=0, upper=20> choosiness;
real asymmetry;
}
transformed parameters{
vector[N] y_hat;
for (i in 1:N)
y_hat[i] = level + scale * exp(-((ratio[i] - preference)^2)/(2 * (choosiness^2))) + asymmetry * ratio[i];
}
model {
level ~ normal(0, 10);
preference ~ normal(0, 10);
asymmetry ~ normal(0, 10);
y ~ bernoulli_logit(y_hat);
}
My expectation was to observe similar values between the mean of the parameter preference and the position on the x axis of the centre of the predicted probability peak. However, preference has mean equal to 0.1 and the peak of the curve has position 0.3 (see plot). Why is there a gap between these two estimates?
Link to the data snippet
R code used for the computations
dat = list(N = nrow(CZ_data), y = CZ_data$mountYN, ratio = CZ_data$size_ratio)
stanfit = stan(file = "stanscript.stan", data = dat, iter = 6000, warmup = 2000, chains=4,
refresh=6000, control = list(max_treedepth = 15))