Trying to fit my customised logistic function

What’s the number of trials per block? Is it constant across all blocks?

Yes, it's the same number of trials.

And how many trials are there per block?

I’m asking, because I want to reconstruct the number of correct trials per block per individual.

That would be performance × number of trials = number of correct trials, right? For example, a performance of 0.75 over 8 trials means 6 correct trials. And the number of correct trials should be an integer.

I ask, because I think it’d be better to model this as

\text{number correct trials}_t \sim \text{Binomial}(\text{number total trials}, \text{performance}_t)

and then specify performance as the logistic function (over time/blocks) that you stated above.
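For reference (using the parameterization that appears in the Stan code further down in this thread), that logistic function would be something like

\text{performance}_t = \text{maxval} \cdot \frac{1}{1 + e^{-\text{steepness}\,(t - \text{inflection})}}

with maxval, steepness, and inflection possibly varying by subject.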

Thank you so much for your help, Max. I see what you mean.
There are actually only 8 trials per block; I only analyze one coherence level in that specific dataset.

So it is 14 blocks consisting of 8 trials each.

I just remembered that the reason I used the performance per block is that I have some NA trials, meaning that sometimes there are fewer trials per block, and thus the effective trial number per block is not the same. For example, subject 1 had 4 correct in the first block but was too slow in one trial, so it's 4 out of 7 correct (thinking about it now makes me wonder whether that was a smart idea in the first place).
I uploaded the raw data of correct responses per block (again, it's 8 trials per block) for the first experiment, where we had 29 subjects.

Thank you again.

Carina

toydata.csv (1.02 KB)

Hi, @stemangiola and I solved a similar problem; you may find it useful to check out our reparametrization of the sigmoid function: Difficulties with logistic population growth model
I didn't check the math very carefully, but it seems to me that the only (but possibly important) difference is that we have a parameter for "value of the sigmoid at the midpoint of the data" instead of "maximal value of the sigmoid". (The paper we discuss in the linked post is still in publication purgatory; I'll ask @stemangiola whether we could make a preprint available.)

Also note that you can use triple backticks to format chunks of code/program output/… I've edited the first post for you; you might want to edit the rest of the thread :-)

Hope that helps!

I’ve done it for one of the posts in this thread.

Okay. Let’s start simple and work our way up.

The one huge simplification we are making is that we assume everyone had 8 trials. I don't know if that is the correct assumption. You said that some people were too slow and couldn't complete all of their 8 trials… I'd say assuming 8 possible trials is probably fair enough (I could be wrong here), because being able to move through trials quickly might itself be part of "performance". In any case, you could correct the code for different trial numbers, as sketched below.
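For example, a minimal (untested) sketch: instead of hard-coding trials = 8 in the transformed data block, as the models below do, you could pass the observed per-block trial counts in through the data block (and add a trials matrix to data_list):

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> trials[N, T]; // completed trials per subject and block
  int<lower=0, upper=8> y[N, T];      // correct responses; y[n, t] <= trials[n, t]
}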

Anyway, this is how I load the data.

library(tidyverse)

toydata <- read_csv("data/toydata.csv")

# first column is the subject id, the remaining 14 columns are the blocks
names(toydata) <- c("subject_id", paste0("block_", names(toydata)[2:15]))
# renumber subjects consecutively (one original id is missing)
toydata <- toydata %>% mutate(subject_id = 1:n())
# matrix of correct responses: subjects in rows, blocks in columns
data_mat <- toydata %>% select(starts_with("block")) %>% as.matrix()

library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

data_list <- list(
  N = nrow(data_mat), # number of subjects
  T = ncol(data_mat), # number of blocks
  y = data_mat        # correct responses per subject and block
)

Starting simple, here is a model that estimates average performance over subjects and over time. It’s not the most efficient code, but the data set is pretty small…

mod0.stan:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T]; // this could be fed in as data
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0,upper=1> performance;
}

model{
  
  for(n in 1:N)
    y[n] ~ binomial(trials[n], performance); // vectorize over T
    
}

The result is kind of the grand mean… not particularly interesting…

> mod0 <- stan(file = "model/mod0.stan", data = data_list)
> mod0
Inference for Stan model: mod0.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
performance     0.55    0.00 0.01     0.53     0.54     0.55     0.55     0.56  1691    1
lp__        -2239.53    0.02 0.70 -2241.47 -2239.70 -2239.28 -2239.10 -2239.05  1682    1

Now, let’s estimate performance for each subject separately.

mod1.stan:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T]; // could be data
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0,upper=1> performance[N];
}

model{
  
  for(n in 1:N)
    y[n] ~ binomial(trials[n], performance[n]); // again, vectorize over T
    
}

Let’s plot the results:


Tiles are subjects (I renumbered them to account for the one missing; see the R code above), points are (roughly) the performance data points that you have in your data set (but assuming 8 trials for every block), and the blue lines/areas are the posteriors of the block-constant, subject-varying performance means.

Now, let’s vary performance by all subject-block combinations.

mod2.stan:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0,upper=1> performance[N, T];
}

model{
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ binomial(trials[n, t], performance[n, t]);
  
}

This more or less replicates the data, but there is no real structure. Note that if the number of trials varied, you'd see slightly greater variance for those observations with fewer trials, as they "contain less information".
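Precisely: for a Binomial observation, the variance of the observed proportion is

\mathrm{Var}\!\left(\frac{y_{n,t}}{\text{trials}_{n,t}}\right) = \frac{\text{performance}_{n,t}\,(1 - \text{performance}_{n,t})}{\text{trials}_{n,t}}

so blocks with fewer trials pin down performance less precisely.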

Okay… Now, to your model. Actually, I had difficulties fitting the function that you proposed… I set the minval to 0 and that worked out fine…

mod3.stan:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  // all parameters vary by subject
  real<lower=0, upper=1> maxval[N];
  real<lower=0> steepness[N];
  real<lower=0> inflection[N];
}

transformed parameters{
  real<lower=0, upper=1> performance[N, T];
  
  // specify performance as a logistic function
  for(n in 1:N)
    for(t in 1:T)
      performance[n, t] = maxval[n]*(1/(1+exp(-steepness[n]*(t-inflection[n]))));
}

model{
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ binomial(trials[n, t], performance[n, t]);
  
  // these are some rough priors... 
  // I got those from your suggestions and eyeballing...
  maxval ~ beta(9, 1);
  steepness ~ exponential(7); // this is on a much tighter scale than your guess in the first post
  inflection ~ gamma(7, 1);
  
}

This is the result as a plot:


From here you can develop the model further: incorporating the different trial numbers, reformulating the logistic function (minval?), setting more reasonable priors, or making the model hierarchical… (A rough sketch of a hierarchical version is below.)
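For the hierarchical option, here is a rough, untested sketch of what that could look like: hierarchical priors on the logs of steepness and inflection with a non-centered parameterization (maxval is left subject-level as before). All names and prior scales are my guesses, chosen to roughly match the fixed priors above.

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];

  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
}

parameters{
  // population-level location and scale on the log scale
  real mu_log_steepness;
  real<lower=0> sigma_log_steepness;
  real mu_log_inflection;
  real<lower=0> sigma_log_inflection;

  // standardized subject-level deviations (non-centered parameterization)
  real z_steepness[N];
  real z_inflection[N];

  real<lower=0, upper=1> maxval[N]; // kept subject-level, as before
}

transformed parameters{
  real<lower=0> steepness[N];
  real<lower=0> inflection[N];
  real<lower=0, upper=1> performance[N, T];

  for(n in 1:N){
    steepness[n] = exp(mu_log_steepness + sigma_log_steepness*z_steepness[n]);
    inflection[n] = exp(mu_log_inflection + sigma_log_inflection*z_inflection[n]);
  }

  // inv_logit(x) == 1/(1 + exp(-x))
  for(n in 1:N)
    for(t in 1:T)
      performance[n, t] = maxval[n]*inv_logit(steepness[n]*(t - inflection[n]));
}

model{
  // rough guesses, centered near the fixed priors used above
  mu_log_steepness ~ normal(-2, 1);   // exp(-2) is roughly the exponential(7) mean
  sigma_log_steepness ~ normal(0, 1); // half-normal, due to the lower bound
  mu_log_inflection ~ normal(2, 0.5); // exp(2) is roughly the gamma(7, 1) mean
  sigma_log_inflection ~ normal(0, 1);
  z_steepness ~ normal(0, 1);
  z_inflection ~ normal(0, 1);
  maxval ~ beta(9, 1);

  for(n in 1:N)
    y[n] ~ binomial(trials[n], performance[n]); // vectorized over T
}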

Hope this gives you a good starting point.

Cheers!
Max

PS: Tell me if you need the code for the plots.

EDIT:
I also quickly did the model with (estimated) min value:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];
  
  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
  
}

parameters{
  real<lower=0, upper=1> minval[N];
  real<lower=0, upper=1> maxval[N];
  real<lower=0> steepness[N];
  real<lower=0> inflection[N];
}

transformed parameters{
  real<lower=0, upper=1> performance[N, T];
  
  for(n in 1:N)
    for(t in 1:T)
      performance[n, t] = (maxval[n] - minval[n])*(1/(1+exp(-steepness[n]*(t-inflection[n])))) + minval[n];
}

model{
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ binomial(trials[n, t], performance[n, t]);

  minval ~ beta(1, 9);
  maxval ~ beta(9, 1);
  steepness ~ exponential(7);
  inflection ~ gamma(7, 1);
  
}


PPS: Looks like subject_id 21 to 29 are just replicates of subject_id 2 to 10…

Thank you so much, Max; it is much clearer to me now how to define my model using Stan syntax. I will explore a few more ideas; especially the hierarchical model might be interesting.
I tried to reproduce your (very pretty) plots, but I'm not sure how to access the data in the Stan fit. It would be great if you could share your code.

Thank you again.

Carina

Sure. Here's the project folder. Unzip it and put it somewhere. I'd suggest double-clicking the psych-logistic.Rproj file and then opening script.R. Everything from loading the data and running the models to plotting the results is in the script.R file.

Ah… I forgot that I can't upload .zip files here. I uploaded everything to GitHub; just follow this link. If you don't use GitHub, you can click Clone or download, then Download ZIP, and proceed as explained above.

Hello Stan community, I would like to compare different models using the loo package. I tried to generate data and get the log-likelihood for my model. I tried something like this:

generated quantities {
  // likelihood
  int log_lik[N, T];
  for(n in 1:N)
    for(t in 1:T)
      log_lik[n, t] = binomial_lpmf(y[n, t] | trials[n, t], performance[n, t]);
}

But I am not sure if I am doing the correct thing. I also wanted to generate some random data to see how well my model predicts data.

Any help is very much appreciated.
Thank you so much.
Carina

Hello, Carina.

You should declare log_lik as real, I think, as the output of the binomial lpmf is real-valued. But the way you are doing it now, you are evaluating the log-likelihood of each observed block conditional on the prediction of your model. If you feed that to loo, I think you'll get leave-one-block-out. I'm guessing you want to evaluate how well your model will predict the performance of a new participant across blocks, not the result of a single new block from one of the participants in the study?

What I think you should be doing (I trust someone will correct me if I'm giving you bad advice now) is to use log_sum_exp to sum the log-likelihoods of each participant's trials, and then use that in loo. I would make a vector called log_lik, and then do the rest within the loop with a different name. Something like this:

generated quantities {
  real log_lik[N];
  int trials_pred[N, T];

  for(n in 1:N) {
    real trials_log_lik[T];
    for(t in 1:T) {
      trials_log_lik[t] = binomial_lpmf(y[n, t] | trials[n, t], performance[n, t]);
      trials_pred[n, t] = binomial_rng(trials[n, t], performance[n, t]);
    }
    log_lik[n] = sum(trials_log_lik);
  }
}

I think that would give you leave-one-participant-out (if that is what you want).

Edit: Added a bit of code to generate draws from the posterior, as per your final question. I think it could be vectorised, but perhaps it’s not a big deal as there is already a loop.

Second edit: Removed the error pointed out by @martinmodrak, so as not to leave bad code to confuse others. Note that his implementation is more efficient.

Thank you. My loo output gives me the following values:

Computed from 8000 by 114 log-likelihood matrix

         Estimate  SE
elpd_loo    133.3 3.5
p_loo         0.9 0.1
looic      -266.7 7.0

I guess the looic behaves like the AIC, and thus is not informative on its own? My PPC looks pretty weird:

I plotted 2000 draws and the fit for the first subject. My model is not really capturing what is going on.

Thanks for being helpful! The overall idea is sensible, but the execution is unfortunately incorrect: to get leave-one-participant-out cross-validation, you sum the individual log-likelihoods directly (a participant's blocks are independent given the parameters, so their joint log-likelihood is simply the sum). This actually leads to more compact code, as you can then have (ignoring generating predictions):

generated quantities {
  real log_lik[N];

  for(n in 1:N) {
    log_lik[n] = binomial_lpmf(y[n] | trials[n], performance[n]); // vectorized over T
  }
}

Some more discussion at:

I think (not 100% sure) that elpd_loo can be a somewhat meaningful quantity on its own (it is the "expected log predictive density"), but I generally find interpretation without another model to compare against challenging. PPCs are, in my experience, much more useful for assessing the fit of a single model.

Hope that helps!

Thanks for pointing out my error, Martin! I guess I mixed it up with some calculations related to k-fold CV and calculating the elpd from posterior samples.

Some brief thoughts…

You should look into modeling drops in performance. Look at individuals 7 or 11 in my post above: they seem to show a drop in performance, but the function cannot capture this.

Also, you can make the model more flexible. An easy option would be a Beta-Binomial model, for example (I don't know if it's reasonable, but it's worth a try), i.e. modeling performance as Beta-distributed. (The beta_proportion_lpdf could conveniently be used here, I think.) This will not solve the problem of performance drops directly, but it will probably mitigate it through increased variance.

Another (as in “inclusive or”) way to make the model more flexible would be to put hierarchical priors on the function’s parameters. Again, this is no solution to the functional “misspecification”.

Hello Max,

Yep, I tried playing around with the prior distributions to accommodate the negative slopes. Using a Beta distribution for the performance gives me the following error:

SAMPLING FOR MODEL 'e54e114d0e716b66c0a80ed257b1eeb9' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: beta_lpdf: Random variable is 4, but must be less than or equal to 1 (in 'model9f8794e58bd_e54e114d0e716b66c0a80ed257b1eeb9' at line 35)

My performance value is between 0 and 1, so I'm not sure why I get that error?

Make sure to use performance ~ beta_proportion(sigmoid_function, dispersion_parameter). The beta_proportion is a reparameterization of the conventional Beta distribution that takes a probability/proportion as its first argument and a positive dispersion as its second argument. See the differences between Beta and Beta-Proportion here.

You could also use the Beta-Binomial parameterization, but you'd have to do the conversion from proportion and dispersion to the Beta's alpha and beta parameters yourself.
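To make that concrete, here is a rough, untested sketch of mod3 with the Beta layer added. performance becomes a latent parameter whose mean is the sigmoid, and kappa (my name for the dispersion parameter) controls how tightly the block-level performance sits around the curve:

data{
  int<lower=1> N;
  int<lower=1> T;
  int<lower=0, upper=8> y[N, T];
}

transformed data{
  int<lower=1, upper=8> trials[N, T];

  for(t in 1:T)
    for(n in 1:N)
      trials[n, t] = 8;
}

parameters{
  real<lower=0, upper=1> maxval[N];
  real<lower=0> steepness[N];
  real<lower=0> inflection[N];
  real<lower=0, upper=1> performance[N, T]; // latent per-block performance
  real<lower=0> kappa;                      // Beta dispersion (hypothetical name)
}

transformed parameters{
  real<lower=0, upper=1> mu[N, T]; // the sigmoid is now the *mean* of performance

  for(n in 1:N)
    for(t in 1:T)
      mu[n, t] = maxval[n]*inv_logit(steepness[n]*(t - inflection[n]));
}

model{
  maxval ~ beta(9, 1);
  steepness ~ exponential(7);
  inflection ~ gamma(7, 1);
  kappa ~ gamma(2, 0.1); // a weakly informative guess

  for(n in 1:N)
    for(t in 1:T){
      performance[n, t] ~ beta_proportion(mu[n, t], kappa);
      y[n, t] ~ binomial(trials[n, t], performance[n, t]);
    }
}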

Thank you, Max. It gives me a slightly better fit (especially for the subjects that have a negative slope).
I will try to implement Martin Modrak's slope model and see if that fits my data. Thank you again for your help.

Cool! Did you try a hierarchical version?