Problems with fitting inverse logit function to data

Hello,

I have data coming from subjects (example attached) to which I would like to fit per individual as well as on a group level (hierarchical) this logit function:

y = PSE-\frac{1}{1+e^{-steepness*(x-inflection)}}.

I was following those posts https://discourse.mc-stan.org/t/trying-to-fit-my-costumised-logistic-function/12228/14, this solution, and https://discourse.mc-stan.org/t/difficulties-with-logistic-population-growth-model/8286/2 and bellow is my STAN code but I am having problems with initialisation.

  1. Could you please help me with that? Is it a prior problem? Or too much noise in the data?
data{
  int<lower=1> N;
  int<lower=1> T; // num of GS
  real<lower=0, upper=1> y[N, T];
  real<lower=0, upper=2>  GS[N,T];
}

parameters{
  // all parameters vary by subject
  real<lower=0, upper=1> PSE[N]; //mid point
  real steepness[N];
  real inflection[N];
  real<lower=0> sigma;          // residual error of the observations, need to be positive
}

model{
  PSE ~ normal(0.5, 1);
  steepness ~ exponential(0.2); // gamma(7, 0.3) 
  inflection ~ gamma(7, 0.15);
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ normal(PSE[n]-(1/(1+exp(-steepness[n]*(GS[n,t]-inflection[n])))), sigma);
  
}
  1. Another question would be, can I somehow optimise my code? I was trying to make some use of inv_logit and binomial functions/distributions, the problem is that my values are not integers. What I’m trying to fit is a probability of an answer being incorrect (y) given the distance (GS). There is a given and fixed number of GSs values per individual and I know that y=0,5 (or should) at GS=0.

Thank you for any insights.

Data:

GS = np.array([0.  , 0.33, 0.6 , 0.8 , 1.  , 1.33])
y = np.array([[0.4 , 0.  , 0.2 , 0.6 , 0.2 , 0.2 ],
       [0.5 , 0.5 , 0.5 , 0.6 , 0.4 , 0.25],
       [0.6 , 0.  , 0.  , 0.2 , 0.  , 0.  ],
       [0.2 , 0.2 , 1.  , 0.6 , 0.2 , 0.2 ],
       [0.6 , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [1.  , 0.6 , 0.2 , 0.4 , 0.4 , 0.  ],
       [0.2 , 0.4 , 0.  , 0.2 , 0.2 , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.4 ],
       [0.8 , 0.2 , 0.8 , 0.2 , 0.4 , 0.6 ],
       [0.2 , 0.2 , 0.  , 0.2 , 0.  , 0.  ],
       [0.4 , 0.2 , 0.2 , 0.2 , 0.  , 0.  ],
       [0.  , 0.  , 0.4 , 0.4 , 0.2 , 0.  ],
       [1.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.6 , 0.  , 0.4 , 0.  , 0.2 , 0.  ],
       [0.6 , 0.6 , 0.2 , 0.2 , 0.4 , 0.6 ],
       [0.8 , 0.2 , 0.2 , 0.2 , 0.  , 0.4 ],
       [0.8 , 0.4 , 0.2 , 0.  , 0.6 , 0.2 ],
       [0.4 , 0.2 , 0.2 , 0.  , 0.2 , 0.  ],
       [0.5 , 0.2 , 0.  , 0.  , 0.2 , 0.2 ],
       [0.5 , 0.  , 0.4 , 0.  , 0.  , 0.  ],
       [0.8 , 0.4 , 0.  , 0.  , 0.  , 0.  ],
       [0.2 , 0.2 , 0.2 , 0.2 , 0.2 , 0.  ],
       [0.4 , 0.6 , 0.6 , 0.2 , 0.6 , 0.  ],
       [0.2 , 0.6 , 0.4 , 0.4 , 0.  , 0.  ]])
1 Like

inv_logit is a function in Stan (https://mc-stan.org/docs/2_24/functions-reference/link-functions.html)

So you can simplify this:

y[n, t] ~ normal(PSE[n]-(1/(1+exp(-steepness[n]*(GS[n,t]-inflection[n])))), sigma);

to:

y[n, t] ~ normal(PSE[n]-inv_logit(steepness[n]*(GS[n,t]-inflection[n])), sigma);

Your model probably isn’t initializing because your parameters need to match the support of the priors you give them. So do steepness and inflection like real<lower=0.0> steepness[N] and real<lower=0.0> inflection[N].

When Stan initializes it puts all the random variables on the unconstrained scales (Stan internally works on an unconstrained space and transforms to a constrained space: https://mc-stan.org/docs/2_24/reference-manual/variable-transforms-chapter.html) uniformly in the range [-2, 2].

If any of steepness or inflection are below zero, the log density would blow up because of the exponential/gamma distributions. So constrain those and see if things move.

Where were using a binomial in the model? Is this normal a normal approximation to a binomial?

1 Like

Interesting, this seems to all be working, thank you very much.

  • The two following solutions solved the problem:

or with those priors:

  PSE ~ normal(0.5, 1);
  steepness ~ normal(2,1);
  inflection ~ normal(0.5,1);

Which of them is a better solution?

  • inv_logit I have tried it before but I was getting an error, now I see what was wrong (syntax).

Yes. I was thinking of

Binom code
data{
  int<lower=1> N;
  int<lower=1> T; // num of GS
  int<lower=0, upper=6> y[N, T];
  real<lower=0, upper=2>  GS[N,T];
}

parameters{
  // all parameters vary by subject
  real<lower=0, upper=1> PSE[N]; //mid point
  real<lower=0> steepness[N];
  real<lower=0> inflection[N];
}

transformed parameters{
  real<lower=0, upper=1> performance[N, T];
  
  // specefy performance as a logistic function
  for(n in 1:N)
    for(t in 1:T)
      performance[n, t] = PSE[n]+(1/(1+exp(-steepness[n]*(GS[n,t]-inflection[n]))));
}

model{
  PSE ~ normal(0.5, 1);
  steepness ~ exponential(0.2); // gamma(7, 0.3) 
  inflection ~ gamma(7, 0.15);
  
  for(n in 1:N)
    for(t in 1:T)
      y[n, t] ~ binomial(GS[n, t], performance[n, t]);
 
}

but getting

Error
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  int ~ binomial(real, real)

Available argument signatures for binomial:

  int ~ binomial(int, real)
  int ~ binomial(int, real[ ])
  int ~ binomial(int, vector)
  int ~ binomial(int, row_vector)
  int ~ binomial(int[ ], real)
  int ~ binomial(int[ ], real[ ])
  int ~ binomial(int[ ], vector)
  int ~ binomial(int[ ], row_vector)
  int[ ] ~ binomial(int, real)
  int[ ] ~ binomial(int, real[ ])
  int[ ] ~ binomial(int, vector)
  int[ ] ~ binomial(int, row_vector)
  int[ ] ~ binomial(int[ ], real)
  int[ ] ~ binomial(int[ ], real[ ])
  int[ ] ~ binomial(int[ ], vector)
  int[ ] ~ binomial(int[ ], row_vector)

Real return type required for probability function.
 error in 'unknown file name' at line 30, column 54
  -------------------------------------------------
    28:   for(n in 1:N)
    29:     for(t in 1:T)
    30:       y[n, t] ~ binomial(GS[n, t], performance[n, t]);
                                                             ^
    31:   
  -------------------------------------------------
  • So if I want to now take it one step further to the hierarchical model, how would I make it? I tried to add the hyper parameters and vectorise the code as is shown here:
Multilevel STAN

data{
int<lower=0> N; // number of all datapoints
int<lower=0> J; // number of subjects
real<lower=0, upper=1> y;
real<lower=0, upper=2> GS;
int subject[N];
}

parameters{
// all parameters vary by subject
vector[J] PSE;
vector[J] steepness;
vector[J] inflection;
real mu_pse;
real mu_steepness;
real mu_inflection;

real<lower=0> sigma; // residual error of the observations, need to be positive
real<lower=0> sigma_pse;
real<lower=0> sigma_steepness;
real<lower=0> sigma_inflection;
}

model{
mu_pse ~ normal(0.5, 1);
mu_steepness ~ exponential(0.2); // gamma(7, 0.3) this is on a much tighter scale than your guess in the first post
mu_inflection ~ gamma(7, 0.15);

PSE ~ normal(mu_pse, sigma_pse);
steepness ~ normal(mu_steepness, sigma_steepness);
inflection ~ normal(mu_inflection, sigma_inflection);

y ~ normal(PSE[subject]-inv_logit(steepness[subject]*(GS-inflection[subject])), sigma);
}

but I am missing something because it gives me an error for y that

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  vector * vector
.
.
Expression is ill formed.
 error in 'unknown file name' at line 35, column 79
  -------------------------------------------------
    33:   inflection ~ normal(mu_inflection, sigma_inflection);
    34:   
    35:   y ~ normal(PSE[subject]-inv_logit(steepness[subject]*(GS-inflection[subject])), sigma);
                                                                                      ^
    36: }

Thank you!

I dunno, if the parameters can actually be negative, then I’d say do the normals and let them be negative (stuff like this can help you figure out misspecified model stuff)

Re: the binomial, what does this number in the model stand for GS[n, t]? The first argument to the binomial distribution is the number of trials (the number of coinflips), so that’ll need to be an integer. I see that GS in the data are non-integer numbers, but maybe they were integers that have been divided by something?

Also y should be integers too, and I see that they are fractions. If these are rescaled things then maybe it makes sense to use a binomial on the original data, but the binomial is a distribution over integers and so y will need to be integers too.

No matches for: 

  vector * vector

Stan is picky about types. So * is matrix multiplication. So vector * vector doesn’t really make sense (row vector * vector might, or matrix * vector). To get elementwise multiplication do .*, like a .* b if a and b are vectors.

2 Likes

Thank you very much, for me a vector is also a matrix, totally missed that, works :).

It is not supposed to be an integer, it is a distance (actual physical distance invariant across subjects, defined to be the same for everyone). Also, the y is not an integer since it reflects probability. In general, it follows a binomial distribution for each single value of GS since each outcome can be only correct or incorrect. The y is then average across a few trials. You can think of it as flipping 6 different coins (6 GS values) and each has some binomial distribution underlying the results. The y values are then averages across at most 10 trials per one coin (GS). Is it clearer now?
So actually, when I am now thinking about it, I should have used the raw data and not the averages for y. However, the GS will still be real.

And I would have one more question. How exactly to incorporate log_lik part to be able to compare the models? I have the general idea of:

generated quantities {
  real log_lik[N];

    for(n in 1:N) {      
      log_lik[n] = normal_lpmf(y[n,] | ..., sigma);
    }
}

but I am unsure of what to put instead of the … and how this works on the multilevel (hierarchical) model.
Thanks!

If you have it, yeah definitely!

It sounds like, if these were hypothetical coin flips we’re talking about, y = number_of_heads / total_number_of_flips.

The binomial sorta statement will probably end up looking like:

number_of_heads ~ binomial_logit(total_number_of_flips, ...);

where ... is some model. I added the _logit in there cause basically in doing these models you always end up with some sort of link function, and binomial_logit does a logit link function for you (which is more stable than coding it manually).

Re: the log_lik question, I don’t know much about this stuff, but here’s some doc that looks relevant: https://avehtari.github.io/modelselection/rats_kcv.html

Thanks :).

About the LOO-IC/WAIC, I have tried this:

generated quantities {
  // For log likelihood calculation
  real log_lik[N];
  
  for (n in 1:J) {
    log_lik[n] += normal_lpdf(y[n] | PSE[n]-inv_logit(steepness[n]*(GS-inflection[n])), sigma);
  }

}

which gives me the following problem:

C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\arviz\stats\stats.py:839: RuntimeWarning: invalid value encountered in greater
  (tailinds,) = np.where(x > xcutoff)  # pylint: disable=unbalanced-tuple-unpacking
C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\arviz\stats\stats.py:683: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.7 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "

I will rather open a new issue and close this one because you have answered all of the initial problems and this seems to diverge.

Thanks again.

2 Likes