Trying to fit my customised logistic function

Hi, @stemangiola and I solved a similar problem; you may find it useful to check out our reparametrization of the sigmoid function: Difficulties with logistic population growth model
I didn’t check the math very carefully, but it seems to me that the only (though possibly important) difference is that we have a parameter for the “value of the sigmoid at the midpoint of the data” instead of the “maximal value of the sigmoid”. (The paper we mention in the linked post is still in publication purgatory; I’ll ask @stemangiola whether we could make a preprint available.)
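Roughly, the idea (from memory, so double-check against the linked post): writing the sigmoid as maxval / (1 + exp(-steepness*(t - inflection))), its value at the midpoint of the data t_mid is

midval = maxval / (1 + exp(-steepness*(t_mid - inflection)))

so the two parameterizations are interchangeable via maxval = midval * (1 + exp(-steepness*(t_mid - inflection))), but the midpoint value is usually much better constrained by the data than the asymptote.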

Also note that you can use triple backticks to format chunks of code/program output/… I’ve edited the first post for you; you might want to edit the rest of the thread :-)

Hope that helps!


I’ve done it for one of the posts in this thread.


Okay. Let’s start simple and work our way up.

The one huge simplification we are making is assuming everyone had 8 trials. I don’t know if that is the correct assumption. You said that some people were slow and might not have had their 8th trial… I’d say assuming 8 possible trials is probably fair enough (I could be wrong), because being able to move through trials quickly might itself be part of “performance”. In any case, you could actually correct the code for different trial numbers; see the sketch right after the data block below.

Anyway, this is how I load the data.

library(tidyverse)

toydata <- read_csv("data/toydata.csv")

names(toydata) <- c("subject_id", paste0("block_", names(toydata)[2:15]))
toydata <- toydata %>% mutate(subject_id = 1:n())
data_mat <- toydata %>% select(starts_with("block")) %>% as.matrix()

library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

data_list <- list(
  N = nrow(data_mat),
  T = ncol(data_mat),
  y = data_mat
)
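(For later: if you want to drop the equal-trials assumption, the only change on the R side is to pass the trial counts as data. A minimal sketch; trials_mat is hypothetical here and just filled with 8 as a placeholder, you’d build it from your raw data.)

# hypothetical per-subject, per-block trial counts
trials_mat <- matrix(8L, nrow = nrow(data_mat), ncol = ncol(data_mat))

data_list <- list(
  N = nrow(data_mat),
  T = ncol(data_mat),
  y = data_mat,
  trials = trials_mat # then declare trials in the data block of the Stan model
)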

Starting simple, here is a model that estimates average performance over subjects and over time. It’s not the most efficient code, but the data set is pretty small…

mod0.stan:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T]; // this could be fed in as data
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0,upper=1> performance;
}

model{
  
  for(n in 1:N)
    y[n] ~ binomial(trials[n], performance); // vectorize over T
    
}

The result is kind of the grand mean… not particularly interesting…

> mod0 <- stan(file = "model/mod0.stan", data = data_list)
> mod0
Inference for Stan model: mod0.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
performance     0.55    0.00 0.01     0.53     0.54     0.55     0.55     0.56  1691    1
lp__        -2239.53    0.02 0.70 -2241.47 -2239.70 -2239.28 -2239.10 -2239.05  1682    1

Now, let’s estimate performance for each subject separately.

mod1.stan:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T]; // could be data
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0,upper=1> performance[N];
}

model{
  
  for(n in 1:N)
    y[n] ~ binomial(trials[n], performance[n]); // again, vectorize over T
    
}

Let’s plot the results:


Tiles are subjects (I numbered them differently to account for the one missing; see the R code above), points are (sort of) the performance data points that you have in your data set (but assuming 8 trials for every block), and the blue lines/areas are the posteriors of the time/block-constant, subject-varying performance means.

Now, let’s vary performance by all subject-block combinations.

mod2.stan:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0,upper=1> performance[N, T];
}

model{
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ binomial(trials[n, t], performance[n, t]);
  
}

This more or less replicates the data, but there is no real structure. Note that if the number of trials varied, you’d see slightly greater variance for observations with fewer trials, as they “contain less information”.

Okay… Now, to your model. Actually, I had difficulties fitting the function you proposed… I set minval to 0 and that worked out fine…

mod3.stan:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  // all parameters vary by subject
  real<lower=0, upper=1> maxval[N];
  real<lower=0> steepness[N];
  real<lower=0> inflection[N];
}

transformed parameters{
  real<lower=0, upper=1> performance[N, T];
  
  // specify performance as a logistic function
  for(n in 1:N)
    for(t in 1:T)
      performance[n, t] = maxval[n]*(1/(1+exp(-steepness[n]*(t-inflection[n]))));
}

model{
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ binomial(trials[n, t], performance[n, t]);
  
  // these are some rough priors... 
  // I got those from your suggestions and eyeballing...
  maxval ~ beta(9, 1);
  steepness ~ exponential(7); // this is on a much tighter scale than your guess in the first post
  inflection ~ gamma(7, 1);
  
}

This is the result as a plot:


From here you can develop the model further: incorporating the different trial numbers, reformulating the logistic function (minval?), setting more reasonable priors, or making the model hierarchical…

Hope this gives you a good starting point.

Cheers!
Max

PS: Tell me if you need the code for the plots.

EDIT:
I also quickly did the model with (estimated) min value:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0, upper=1> minval[N];
  real<lower=0, upper=1> maxval[N];
  real<lower=0> steepness[N];
  real<lower=0> inflection[N];
}

transformed parameters{
  real<lower=0, upper=1> performance[N, T];
  
  for(n in 1:N)
    for(t in 1:T)
      performance[n, t] = (maxval[n] - minval[n])*(1/(1+exp(-steepness[n]*(t-inflection[n])))) + minval[n];
}

model{
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ binomial(trials[n, t], performance[n, t]);

  minval ~ beta(1, 9);
  maxval ~ beta(9, 1);
  steepness ~ exponential(7);
  inflection ~ gamma(7, 1);
  
}


PPS: Looks like subject_id 21 to 29 are just replicates of subject_id 2 to 10…


Thank you so much Max, it is much clearer to me now how to define my model using Stan syntax. I will explore a few more ideas; the hierarchical model in particular might be interesting.
I tried to reproduce your (very pretty) plots, but I’m not sure how to access the data in the stanfit object. It would be great if you could share your code.

Thank you again.

Carina

Sure. Here’s the project folder. Unzip it and put it somewhere. I’d suggest double clicking on the psych-logistic.Rproj file and then opening script.R. Everything from loading data and running the models to plotting the results is in the script.R file.

Ah… I forgot that I can’t upload .zip files here. I uploaded everything to GitHub. Just follow this link. If you don’t use GitHub, you can click Clone or download, then Download ZIP, and proceed as explained above.
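In the meantime, the gist of the plotting code is just summarising the posterior draws and overlaying the observed proportions. A rough sketch for mod1 (the script does the per-subject faceting properly):

library(tidyverse)

perf <- rstan::extract(mod1, pars = "performance")$performance # draws x N

plot_df <- tibble(
  subject  = seq_len(ncol(perf)),
  estimate = apply(perf, 2, median),
  lower    = apply(perf, 2, quantile, probs = 0.025),
  upper    = apply(perf, 2, quantile, probs = 0.975)
)

obs_df <- as_tibble(data_mat / 8) %>% # observed proportions (assuming 8 trials)
  mutate(subject = seq_len(n())) %>%
  pivot_longer(-subject, names_to = "block", values_to = "prop")

ggplot(plot_df, aes(x = subject)) +
  geom_pointrange(aes(y = estimate, ymin = lower, ymax = upper), colour = "blue") +
  geom_point(data = obs_df, aes(y = prop), alpha = 0.3)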

Hello Stan community, I would like to compare different models using the loo package. I tried to generate data and get the log-likelihood for my model. I tried something like this:

generated quantities {
  // likelihood
  int log_lik[N, T];
  for(n in 1:N)
    for(t in 1:T)
      log_lik[n, t] = binomial_lpmf(y[n, t] | trials[n, t], performance[n, t]);
}

But I am not sure if I am doing the correct thing. I also wanted to generate some random data to see how well my model predicts data.

Any help is very much appreciated.
Thank you so much.
Carina

Hello, Carina.

You should declare log_lik as real, I think, as the output of the binomial lpmf is real-valued. But, the way you are doing it now, you are evaluating the log-likelihood of each observed trial conditional on the prediction of your model. If you feed that to loo, I think you’ll get leave-one-trial-out. I’m guessing you want to evaluate how well your model will predict the performance of a new participant across trials, not the result of a single new trial from one of the participants in the study?

What I think you should be doing (I trust someone will correct me if I’m giving you bad advice now) is to use log_sum_exp to sum the log-likelihoods of each participant’s trials, and then use that in loo. I would make a vector called log_lik, and then do the rest within the loop with a different name. Something like this:

generated quantities {
  real log_lik[N];
  int trials_pred[N, T];

  for(n in 1:N) {
    real trials_log_lik[T];
    for(t in 1:T) {
      trials_log_lik[t] = binomial_lpmf(y[n, t] | trials[n, t], performance[n, t]);
      trials_pred[n, t] = binomial_rng(trials[n, t], performance[n, t]);
    }
    log_lik[n] = sum(trials_log_lik);
  }
}

I think that would give you leave-one-participant-out (if that is what you want).
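For completeness, feeding this to loo from R would then look roughly like this (a sketch, assuming your stanfit object is called fit):

library(loo)

# pointwise log-likelihood, one column per participant
log_lik_arr <- extract_log_lik(fit, parameter_name = "log_lik", merge_chains = FALSE)
r_eff <- relative_eff(exp(log_lik_arr))
loo(log_lik_arr, r_eff = r_eff)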

Edit: Added a bit of code to generate draws from the posterior, as per your final question. I think it could be vectorised, but perhaps it’s not a big deal as there is already a loop.

Second edit: Removed error pointed out by @martinmodrak, so as not to leave bad code to confuse others. Note, his implementation is more efficient.


Thank you. My loo output gives me the following values:

Computed from 8000 by 114 log-likelihood matrix

         Estimate  SE
elpd_loo    133.3 3.5
p_loo         0.9 0.1
looic      -266.7 7.0

I guess the looic behaves like the AIC, and thus is not informative in itself? My PPC looks pretty weird:

I plotted 2000 draws and the fit for the first subject. My model is not really capturing what is going on.

Thanks for being helpful! The overall idea is sensible, but the execution is unfortunately incorrect: to get leave-one-participant-out cross-validation, you sum the individual log-likelihoods directly. This actually leads to more compact code, as you can then have (ignoring generating predictions):

generated quantities {
  real log_lik[N];

  for(n in 1:N) {
    log_lik[n] = binomial_lpmf(y[n] | trials[n], performance[n]);
  }
}

Some more discussion at:

I think (not 100% sure) that elpd_loo can be a somewhat meaningful quantity on its own (it is the “expected log predictive density”), but generally I find interpretation without another model to compare against challenging. PPCs are in my experience much more useful for assessing the fit of a single model.
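For example, with the trials_pred draws from the generated quantities above, a quick PPC for a single subject could look like this (a sketch; fit and data_mat as in the earlier posts, bayesplot assumed installed):

library(bayesplot)

pred <- rstan::extract(fit, pars = "trials_pred")$trials_pred # draws x N x T

# predictive intervals vs. observed counts for the first subject
ppc_intervals(
  y    = data_mat[1, ],
  yrep = pred[, 1, ],
  x    = seq_len(ncol(data_mat))
)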

Hope that helps!


Thanks for pointing my error out, Martin! Guess I mixed it up with some calculations related to k-fold cv and calculating the elpd from posterior samples.

Some brief thoughts…

You should look into modeling drops in performance. Look at individuals 7 and 11 in my post above: they seem to show a drop in performance, but the function cannot capture this.

Also, you can make the model more flexible. An easy option would be, for example, a Beta-Binomial model (I don’t know if it is reasonable, but it’s worth a try), i.e. modeling performance as Beta-distributed. (The beta_proportion_lpdf could be conveniently used here, I think.) This will not solve the problem of performance drops directly, but will probably mitigate it through increased variance.

Another (as in “inclusive or”) way to make the model more flexible would be to put hierarchical priors on the function’s parameters. Again, this is no solution to the functional “misspecification”.

Hello Max,

Yep, I tried playing around with the prior distributions to accommodate the negative slopes. Using a beta distribution for the performance gives me the following error:

SAMPLING FOR MODEL 'e54e114d0e716b66c0a80ed257b1eeb9' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: beta_lpdf: Random variable is 4, but must be less than or equal to 1 (in 'model9f8794e58bd_e54e114d0e716b66c0a80ed257b1eeb9' at line 35)

My performance value is between 0 and 1, so I’m not sure why I get that error?

Make sure to use performance ~ beta_proportion(sigmoid_function, dispersion_parameter). The beta_proportion is a re-parameterization of the conventional Beta distribution that takes a probability/proportion as its first argument and a non-negative dispersion as its second argument. See the differences between Beta and Beta-Proportion here.

You could also use the Beta-Binomial parameterization, but you’d have to do the conversion from proportion and dispersion to the Beta’s alpha and beta parameters yourself.
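The conversion itself is short. In R notation (made-up example values):

mu    <- 0.7 # proportion, e.g. the sigmoid value
kappa <- 20  # non-negative dispersion
alpha <- mu * kappa
beta  <- (1 - mu) * kappa
# then something like y ~ beta_binomial(trials, alpha, beta) in the model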


Thank you Max. It gives me a slightly better fit (especially for the subjects that have a negative slope).
I will try to implement Martin Modrak’s slope model and see if that fits my data. Thank you again for your help.


Cool! Did you try a hierarchical version?

Nope, currently reading about it ;) I would like to try it, especially as my subjects behave very differently.


Hello Stan community, I tried to implement a hierarchical version, but I am not sure if this is the right way to do it:

binomial_model.stan:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0, upper=1> minval[N];
  real<lower=0, upper=1> maxval[N];
  real<lower=0> steepness[N];
  real<lower=0> inflection[N];
  real a;
  real b;
  real c;
}

transformed parameters{
  real<lower=0, upper=1> performance[N, T];
  
  for(n in 1:N)
    for(t in 1:T)
      performance[n, t] = (maxval[n] - minval[n])*(1/(1+exp(-steepness[n]*(t-inflection[n])))) + minval[n];
}

model{
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ binomial(trials[n, t], performance[n, t]);
  
  minval ~ beta(1, 9);
  maxval ~ beta(9, 1);
  steepness ~ exponential(a);
  inflection ~ gamma(b, c);
  a ~ normal(7, 100);
  b ~ normal(7, 100);
  c ~ normal(1, 100);
}

I wanted to estimate the parameters (here I tried steepness and inflection) for each individual while taking into account possible correlations between subjects.
The chains won’t converge with this code (even with 8000 iterations). Is this because the data is so diverse and there are no correlations between subjects, or is my code wrong? Thank you so much.

Hey Carina!

I tried something that did not work super well, unfortunately. The idea was to do something similar to the model discussed in this thread; it is also a non-linear model, with added hierarchical structure for the parameters. This approach models steepness and inflection on the log scale (to ensure that they are positive) and then applies a multivariate normal prior for partial pooling.

This is my attempt:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0, upper=1> minval[N];
  real<lower=0, upper=1> maxval[N];
  vector[2] mus; 
  vector<lower=0>[2] sigmas;
  cholesky_factor_corr[2] L_Omega;
  vector[2] z[N];
}

transformed parameters{

  real<lower=0> steepness[N];
  real<lower=0> inflection[N];
  real<lower=0, upper=1> performance[N, T];
  
  for(n in 1:N){
    vector[2] params = mus + diag_pre_multiply(sigmas, L_Omega) * z[n];
    steepness[n] = exp(params[1]);
    inflection[n] = exp(params[2]);
    for(t in 1:T)
      performance[n, t] = (maxval[n] - minval[n])*(1/(1+exp(-steepness[n]*(t-inflection[n])))) + minval[n];
  }
}

model{
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ binomial(trials[n, t], performance[n, t]);
      
  maxval ~ beta(9, 1);
  minval ~ beta(1, 9);
      
  mus[1] ~ normal(  1, 1);
  mus[2] ~ normal(2.5, 1);
  
  sigmas ~ exponential(10);
  
  for (n in 1:N)
    z[n] ~ std_normal();
    
  L_Omega ~ lkj_corr_cholesky(4);
  
}

With a high number of iterations and adapt_delta = 0.99, this results in a few divergent transitions on the data set that you provided earlier. I think better priors could solve the problem… Maybe someone else has some thoughts as well…


I have no experience with these kinds of models, but what strikes me are maxval and minval. I assume that maxval >= minval. Also, there is no information sharing between subjects for maxval and minval; the model currently assumes independent values for all subjects. One option would be to start with one maxval and one minval for all subjects and then build up, to see where the problems/divergences emerge.

parameters{
  ...
  real<lower=0, upper=1> minval;
  real<lower=minval, upper=1> maxval;
  ...
}
model{
  ...
  performance[n, t] = (maxval - minval)*(1/(1+exp(-steepness[n]*(t-inflection[n])))) + minval;
  ...
}

It could also be worth using the built-in inv_logit function, since the hand-coded math can run into over-/under-flow problems:

(maxval - minval)*inv_logit(steepness[n]*(t-inflection[n])) + minval;
