Incoherence between mean parameter value and position of the centre predicted probability peak

sam0per · September 17, 2018, 3:44pm

Halo Stan Users,

I have built a generalised linear model using a custom equation with one explanatory variable (ratio) and five parameters (level, scale, preference, choosiness and asymmetry). The dependent variable is binary and it consists of a series of zeros and ones. The model converged without warnings and it was cross-validated using the loo package. Therefore, the fitting process should be fairly accurate. What keeps me awake at night and frustrated during the day is the estimate of one single parameter, the preference parameter.
The equation I am using in the Stan model stems from the Gaussian function. It has the sum of level and scale as the parameter for the height of the curve’s peak, preference is the position of the centre of the peak, choosiness controls the width of the curve and asymmetry allows for a difference between the right and left tail. This is the implementation of the model in Stan.

data {
  int<lower=0> N;    //number of observations
  vector[N] ratio;    //explanatory variable
  int<lower=0, upper=1> y[N];    //binary dependent variable
}

parameters {
  real level;
  real<lower=0, upper=20> scale;
  real preference;
  real<lower=0, upper=20> choosiness;
  real asymmetry;
}

transformed parameters{
  vector[N] y_hat;
  for (i in 1:N)
    y_hat[i] = level + scale * exp(-((ratio[i] - preference)^2)/(2 * (choosiness^2))) + asymmetry * ratio[i];
}

model {
  level ~ normal(0, 10);
  preference ~ normal(0, 10);
  asymmetry ~ normal(0, 10);
  
  y ~ bernoulli_logit(y_hat);
}

My expectation was to observe similar values between the mean of the parameter preference and the position on the x axis of the centre of the predicted probability peak. However, preference has mean equal to 0.1 and the peak of the curve has position 0.3 (see plot). Why is there a gap between these two estimates?

Link to the data snippet

R code used for the computations

dat = list(N = nrow(CZ_data), y = CZ_data$mountYN, ratio = CZ_data$size_ratio)
stanfit = stan(file = "stanscript.stan", data = dat, iter = 6000, warmup = 2000, chains=4,
                     refresh=6000, control = list(max_treedepth = 15))

mike-lawrence · September 19, 2018, 1:07pm

If you have an explicit asymmetry parameter, why do you think the mean parameter will be the mode? For skewed distributions, the mode!=mean.

sam0per · September 20, 2018, 7:19am

Thanks for the reply and it was indeed the asymmetry parameter that was making me feel confused. Problem solved.

Topic		Replies	Views
Centred vs. non-centred parametrisation with lognormal likelihood Modeling fitting-issues	15	2186	April 4, 2020
Lognormal Hierarchical Distribution Fitting Modeling specification , hierarchical-model	10	2675	February 11, 2025
Correlation between bias and true parameter values Modeling	6	1598	July 9, 2020
Dose-response model with partial pooling Modeling fitting-issues , pharmacology , hierarchical-model	23	3067	April 14, 2020
Prior recommendation for scale parameters in hierarchical models too strong? Modeling	25	8936	January 31, 2018

Incoherence between mean parameter value and position of the centre predicted probability peak

Related topics