Incoherence between mean parameter value and position of the centre predicted probability peak

Halo Stan Users,

I have built a generalised linear model using a custom equation with one explanatory variable (ratio) and five parameters (level, scale, preference, choosiness and asymmetry). The dependent variable is binary and it consists of a series of zeros and ones. The model converged without warnings and it was cross-validated using the loo package. Therefore, the fitting process should be fairly accurate. What keeps me awake at night and frustrated during the day is the estimate of one single parameter, the preference parameter.
The equation I am using in the Stan model stems from the Gaussian function. It has the sum of level and scale as the parameter for the height of the curve’s peak, preference is the position of the centre of the peak, choosiness controls the width of the curve and asymmetry allows for a difference between the right and left tail. This is the implementation of the model in Stan.

data {
  int<lower=0> N;    //number of observations
  vector[N] ratio;    //explanatory variable
  int<lower=0, upper=1> y[N];    //binary dependent variable

parameters {
  real level;
  real<lower=0, upper=20> scale;
  real preference;
  real<lower=0, upper=20> choosiness;
  real asymmetry;

transformed parameters{
  vector[N] y_hat;
  for (i in 1:N)
    y_hat[i] = level + scale * exp(-((ratio[i] - preference)^2)/(2 * (choosiness^2))) + asymmetry * ratio[i];

model {
  level ~ normal(0, 10);
  preference ~ normal(0, 10);
  asymmetry ~ normal(0, 10);
  y ~ bernoulli_logit(y_hat);

My expectation was to observe similar values between the mean of the parameter preference and the position on the x axis of the centre of the predicted probability peak. However, preference has mean equal to 0.1 and the peak of the curve has position 0.3 (see plot). Why is there a gap between these two estimates?

Link to the data snippet

R code used for the computations

dat = list(N = nrow(CZ_data), y = CZ_data$mountYN, ratio = CZ_data$size_ratio)
stanfit = stan(file = "stanscript.stan", data = dat, iter = 6000, warmup = 2000, chains=4,
                     refresh=6000, control = list(max_treedepth = 15))

If you have an explicit asymmetry parameter, why do you think the mean parameter will be the mode? For skewed distributions, the mode!=mean.

1 Like

Thanks for the reply and it was indeed the asymmetry parameter that was making me feel confused. Problem solved.