Thanks for pointing my error out, Martin! Guess I mixed it up with some calculations related to k-fold cv and calculating the elpd from posterior samples.
Some brief thoughtsâŚ
You should look into modeling drops in performance. Look at individuals 7 or 11 in my post above. It seems they show a drop in performance, but the function can not capture this.
Also, you can make the model more flexible. An easy option would be a Beta-Binomial model for example (donât know if it reasonable, but worth a try), i.e. modeling performance
as a Beta distribution. (The beta_proportion_lpdf
could be conveniently used here, I think.) This will not solve the problem of performance drops directly, but will probably mitigate it through increased variance.
Another (as in âinclusive orâ) way to make the model more flexible would be to put hierarchical priors on the functionâs parameters. Again, this is no solution to the functional âmisspecificationâ.
Hello Max,
Yep, I tried to play around with the prior distributions to implement the negative slopes. Using a beta distribution for the performance gives me the following error:
SAMPLING FOR MODEL âe54e114d0e716b66c0a80ed257b1eeb9â NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: beta_lpdf: Random variable is 4, but must be less than or equal to 1 (in âmodel9f8794e58bd_e54e114d0e716b66c0a80ed257b1eeb9â at line 35)
My performance value is between 0 and 1, so not sure why I get that error?
Make sure to use performance ~ beta_proportion(sigmoid_function, dispersion_parameter)
. The beta_proportion
is a re-parameterization of the conventional Beta distribution, that takes a probability/proportion as first argument, and a non-negative dispersion as second argument. See the differences of Beta and Beta-Proportion here.
You could also use the Beta-Binomial parameterization, but youâd have to do the conversion from proportion and dispersion to the Betaâs alpha and beta parameters yourself.
Thank you Max. It gives me a slightly better fit (especially for the subjects that have a negative slope).
I will try to implement Martin Modraks slope model and see if that fits my data. Thank you again for your help.
Cool! Did you try a hierarchical version?
Nope, currently reading about it ;) I would like to try it, especially as my subs behave very differently.
Hello STAN community, I tried to implement a hierachichal version but I am not sure if this is the right way to do it:
binomial_model.stan
:
data{
int<lower=1 N;
int<lower=1 T;
int<lower=0, upper=8 y[N, T];
}
transformed data{
int<lower=1, upper=8 trials[N ,T];
for(t in 1:T)
for(n in 1:N)
trials[n, t] = 8;
}
parameters{
real<lower=0, upper=1 minval[N];
real<lower=0, upper=1 maxval[N];
real<lower=0 steepness[N];
real<lower=0 inflection[N];
real a;
real b;
real c;
}
transformed parameters{
real<lower=0, upper=1 performance[N, T];
for(n in 1:N)
for(t in 1:T)
performance[n, t] = (maxval[n] - minval[n])*(1/(1+exp(-steepness[n]*(t-inflection[n])))) + minval[n];
}
model{
for(n in 1:N)
for(t in 1:T)
y[n, t] ~ binomial(trials[n, t], performance[n, t]);
minval ~ beta(1, 9);
maxval ~ beta(9, 1);
steepness ~ exponential(a);
inflection ~ gamma(b, c);
a ~ normal(7, 100);
b ~ normal(7, 100);
c ~ normal(1,100);
}
I wanted to estimate the parameters (here I tried steepness and infliction) for each individual but taking into account possible correlations between subjects.
The chaings wonât converge with this code, (even with 8000 iterations). Is this because the data is so diverse and there are no correlations between subjects or my code is wrong? Thank you so much.
Hey Carina!
I tried something that did not work super well, unfortunately. The idea was to do something similar to the model discussed in this thread â it is also a non-linear model, with added hierarchical structure for the parameters. This approach models steepness
and inflection
on the log scale (to ensure that they are positive) and then apply a multivariate normal prior for partial pooling.
This is my attempt:
data{
int<lower=1> N;
int<lower=1> T;
int<lower=0, upper=8> y[N, T];
}
transformed data{
int<lower=1, upper=8> trials[N ,T];
for(t in 1:T)
for(n in 1:N)
trials[n, t] = 8;
}
parameters{
real<lower=0, upper=1> minval[N];
real<lower=0, upper=1> maxval[N];
vector[2] mus;
vector<lower=0>[2] sigmas;
cholesky_factor_corr[2] L_Omega;
vector[2] z[N];
}
transformed parameters{
real<lower=0> steepness[N];
real<lower=0> inflection[N];
real<lower=0, upper=1> performance[N, T];
for(n in 1:N){
vector[2] params = mus + diag_pre_multiply(sigmas, L_Omega) * z[n];
steepness[n] = exp(params[1]);
inflection[n] = exp(params[2]);
for(t in 1:T)
performance[n, t] = (maxval[n] - minval[n])*(1/(1+exp(-steepness[n]*(t-inflection[n])))) + minval[n];
}
}
model{
for(n in 1:N)
for(t in 1:T)
y[n, t] ~ binomial(trials[n, t], performance[n, t]);
maxval ~ beta(9, 1);
maxval ~ beta(1, 9);
mus[1] ~ normal( 1, 1);
mus[2] ~ normal(2.5, 1);
sigmas ~ exponential(10);
for (n in 1:N)
z[n] ~ std_normal();
L_Omega ~ lkj_corr_cholesky(4);
}
With a high number of iterations and adapt_delta = 0.99
this results in a few divergent transitions on the data set that you provided earlier. I think better priors could solve the problem⌠Maybe someone else has some thoughts as wellâŚ
I have no experience with these kind of models but what strikes me are maxval and minval. I assume that
maxval >= minval. Also there is no information sharing between subjects for maxval and minval. The model currently assumes independent values for all subjects. One option would be to start with 1 maxval and 1 minval for all subjects and then build up to see where the problems/divergences emerge.
parameters{
...
real <lower=0, upper=1> minval;
real <lower=minval, upper=1> maxval;
...
}
model{
...
performance[n, t] = (maxval - minval)*(1/(1+exp(-steepness[n]*(t-inflection[n])))) + minval;
...
}
Could also be worth using the built-in inv_logit
function, since you can run into problems with over-/under-flow with the hand-coded math:
(maxval - minval)*inv_logit(steepness[n]*(t-inflection[n])) + minval;