Fitting generalized logistic function

I’m having trouble trying to fit a logistic function. I’ve reduced the issue down to simple test case described below. I’ve also posted the question on the stats StackExchange site with some exploration of the model fit.

What I’m seeing is that in order to fit the function, it is not enough to have priors and initialization centered on parameters that generated the data but also the priors need to be extremely tight.

Model

library(rstan)
library(bayesplot)
library(plyr)
library(dplyr)


model_code <- "
  functions {
    real sigmoid(real dose, real ed50, real slope){
      return inv_logit((dose-ed50)*slope);
    }
  }

  data {
    int<lower=0> N;
    real dose[N];
    real<lower=0,upper=1> response[N];
    real<lower=0> response_v[N];
    real ed50_prior_mu;
    real ed50_prior_sd;
    real slope_prior_mu;
    real slope_prior_sd;
  }

  parameters {
    real slope;
    real ed50;
  }

  transformed parameters {
    real response_mu[N];
    for (j in 1:N){
      response_mu[j] = inv_logit((dose[j]-ed50)*slope);
    }
  }

  model {
    slope ~ normal(slope_prior_mu, slope_prior_sd);
    ed50 ~ normal(ed50_prior_mu, ed50_prior_sd);
    for (j in 1:N){
      response ~ beta(response_mu[j]*response_v[j], (1-response_mu[j])*response_v[j]);
    }
  }
"

dose_response <- rstan::stan_model(
  model_code=model_code,
  model_name="dose_response")

dose_response %>% rstan::expose_stan_functions()

data

dose_min=-10
dose_max=10

# prior
ed50_mu <- 0
slope_mu <- 1

ed50_sd <- .01
slope_sd <- .01

# true parameters
ed50 <- ed50_mu
slope <- slope_mu

data <- data.frame(dose=rep(x=seq(dose_min, dose_max, length.out=10), times=10)) %>%
  dplyr::mutate(
    response = vapply(dose, function(d)
      sigmoid(
        dose=d,
        ed50=ed50,
        slope=slope),
      1.0))

Sample from prior

# sample prior parameters
prior <- data.frame(
 ed50 = rnorm(n=20, mean=ed50_mu, sd=ed50_sd),
  slope = rnorm(n=20, mean=slope_mu, sd=slope_sd)) %>%
  dplyr::mutate(draw_id=row_number()) %>%
  plyr::adply(1, function(df){
    data.frame(dose=seq(dose_min, dose_max, length.out=200)) %>%
      dplyr::mutate(
        response = vapply(dose, function(d) sigmoid(
          dose=d,
          ed50=df$ed50[1],
          slope=df$slope[1]), 1.0)) 

sample from posterior

samples <- rstan::sampling(
  object=dose_response,
  data=list(
    N=nrow(data),
    dose=data$dose,
    response=data$response,
    response_v=rep(11, times=nrow(data)),
    ed50_prior_mu = ed50_mu,
    ed50_prior_sd = ed50_sd,
    slope_prior_mu = slope_mu,
    slope_prior_sd = slope_sd),
  init=list(
    chain1=list(ed50=0, slope=1),
    chain2=list(ed50=0, slope=1),
    chain3=list(ed50=0, slope=1),
    chain4=list(ed50=0, slope=1)),
  control=list(
    adapt_delta=0.999,
    max_treedepth=12))

posterior <- samples %>%
  extract(pars=c("ed50", "slope")) %>%
  data.frame() %>%
  dplyr::sample_n(20) %>%
  dplyr::mutate(draw_id=row_number()) %>%
  plyr::adply(1, function(df){
    data.frame(dose=seq(dose_min, dose_max, length.out=200)) %>%
      dplyr::mutate(
        response = vapply(dose, function(d) sigmoid(
          dose=d,
          ed50=df$ed50[1],
          slope=df$slope[1]), 1.0))
  })

plot data, prior, and posterior

p <- ggplot() + theme_bw() +
  geom_line(data=data, mapping=aes(x=dose, y=response), color="black", size=2) +
  geom_point(data=data, mapping=aes(x=dose, y=response), color="black", size=3) +
  geom_line(data=prior, mapping=aes(x=dose, y=response, group=draw_id), color="red", size=.8, alpha=.3) +
  geom_line(data=posterior, mapping=aes(x=dose, y=response, group=draw_id), color="blue", size=.8, alpha=.3) +
  scale_y_continuous("Response Percent", limits=c(0, 1), labels=scales::percent) +
  scale_x_continuous("Dose")
p

Data(black), prior(red), posterior (blue):

I ran a pairs plot and it looks like all the response_mu parameters are highly correlated which is going to cause problems for the sampler. That makes my first thought that the model is over-parameterized, and can be expressed in a simpler way.

I looked up the a dose-response model in BDA3 (page 75) and it looks different than this model. I guess your responses are not integers but the probabilities between 0 and 1, which explains the beta distribution instead of the binomial. Is this how the data came or can you get number of deaths out of number of trials (integers)?

plot_zoom_png

Hi Arya, thanks for the response.

  • The motivation for the constraints for response to be in [0,1] and hence the beta distribution is that this model is intended to be the hit rate for a bernoulli observable. I haven’t shown this here because I was trying to isolate where the failure to fit the model. However to better align with BDA3, I suppose a truncated normal could work too. I have seen others model the sigma of the normal as sigma = 1/sqrt(tau) where tau ~ gamma(.0001, .0001).

  • It is unclear to me which variables have “mass” in the sampling procedure? On the safe side, it’s possible move the call to sigmoid to the model block and avoid defining the response_mu parameters.

  • If it is indeed slow mixing then, sampling for longer should help, right?

Taking these factors into account, here is an updated model:

data {
  int<lower=0> N;
  real dose[N];
  real<lower=0,upper=1> response[N];
  real ed50_prior_mu;
  real ed50_prior_sd;
  real slope_prior_mu;
  real slope_prior_sd;
}

parameters {
  real<lower=0> slope;
  real ed50;
  real<lower=0> tau;
}

transformed parameters {
  real sigma;
  sigma = 1/sqrt(tau);
}

model {
  slope ~ normal(slope_prior_mu, slope_prior_sd);
  ed50 ~ normal(ed50_prior_mu, ed50_prior_sd);
  tau ~ gamma(.0001, .0001);
  for (j in 1:N){
    response ~ normal(inv_logit((dose[j]-ed50)*slope), sigma);
  }
}

data

100 samples at 10 equally spaced points along the sigmoid curve:

dose_min <- -10
dose_max <- 10

ed50 <- 0
slope <- 1

sigmoid <- function(dose,  ed50, slope) {1/(1+exp(-(dose-ed50)*slope))}
data <- data.frame(dose=rep(x=seq(dose_min, dose_max, length.out=10), times=10)) %>%
  dplyr::mutate(response = sigmoid(dose, ed50, slope))

Sampling

Then sampling for 50000 iterations with the following prior

ed50_mu <- 0
slope_mu <- 1
ed50_sd <- .01
slope_sd <- .01

samples <- rstan::sampling(
  object=dose_response,
  data=list(
    N=nrow(data),
    dose=data$dose,
    response=data$response,
    ed50_prior_mu = ed50_mu,
    ed50_prior_sd = ed50_sd,
    slope_prior_mu = slop_mu,
    slope_prior_sd = slope_sd),
  init=list(
    chain1=list(ed50=0, slope=1),
    chain2=list(ed50=0, slope=1),
    chain3=list(ed50=0, slope=1),
    chain4=list(ed50=0, slope=1)),
 iter=50000,
 control=list(
    adapt_delta=0.999,
    max_treedepth=12))

It looks these modifications allow it to better fit the data! Here is the print:

> samples %>% print()
Inference for Stan model: dose_response.
4 chains, each with iter=50000; warmup=25000; thin=1; 
post-warmup draws per chain=25000, total post-warmup draws=1e+05.

         mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
slope    0.94    0.00 0.01    0.91    0.93    0.94    0.94    0.96 71623    1
ed50     0.00    0.00 0.01   -0.02   -0.01    0.00    0.01    0.02 73103    1
tau      2.46    0.00 0.04    2.39    2.43    2.46    2.48    2.52 71655    1
sigma    0.64    0.00 0.00    0.63    0.64    0.64    0.64    0.65 71587    1
lp__  -530.85    0.01 1.23 -534.05 -531.41 -530.54 -529.96 -529.46 38831    1

the pairs plot:

and a plot of the data (black), prior draws (red), and posterior draws (blue):

weaker prior

However, using the slightly weaker priors the fit again fails catastrophically

ed50_sd=1
slope_sd=.1

here is the print:

         mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
slope    0.00    0.00 0.00    0.00    0.00    0.00    0.00    0.01 55007    1
ed50     0.00    0.00 0.99   -1.94   -0.67    0.00    0.66    1.94 56124    1
tau      4.87    0.00 0.07    4.74    4.83    4.87    4.92    5.01 54070    1
sigma    0.45    0.00 0.00    0.45    0.45    0.45    0.46    0.46 53991    1
lp__  2860.56    0.01 1.34 2857.10 2859.95 2860.91 2861.55 2862.11 27414    1

Here is the pairs plot:

data (black), prior draws(red), posterior draws(blue):

There are really just 2 parameters, and they’re totally uncorrelated in the posterior. The rest are transformed parameters and don’t affect the sampling difficulty.

I think your problem might be here:

shouldn’t this probably be

response ~ beta(response_mu[j]*response_v[j], (1-response_mu[j]*response_v[j]));

which is different just in where a parenthesis is located, look carefully.

Hi Daniel,

My intent was to parameterize the Beta distribution in terms of the mean mu and the sample size v. Wikipedia says that this can be done by setting alpha=mu*v and beta=(1-mu)*v. Is it wrong?

Anyway, I’m happy to try the parameterization you suggest, but it doesn’t seem to solve the central issue of not recapitulating the slope:

model

  model_code <- "
  data {
    int<lower=0> N;
    real dose[N];
    real<lower=0,upper=1> response[N];
    real<lower=0,upper=1> response_v[N];
    real ed50_prior_mu;
    real ed50_prior_sd;
    real slope_prior_mu;
    real slope_prior_sd;
  }

  parameters {
    real<lower=0> slope;
    real ed50;
  }

  model {
    slope ~ normal(slope_prior_mu, slope_prior_sd);
    ed50 ~ normal(ed50_prior_mu, ed50_prior_sd);
    for (j in 1:N){
      real response_mu;
      response_mu = inv_logit((dose[j]-ed50)*slope);
      response ~ beta(response_mu*response_v[j], (1-response_mu*response_v[j]));
    }
  }
  "

  dose_response <- rstan::stan_model(
    model_code=model_code,
    model_name="dose_response")

data

dose_min=-10
dose_max=10

ed50_mu <- 0
slope_mu <- 1

ed50_sd <- .1
slope_sd <- .1

data <- data.frame(dose=seq(dose_min, dose_max, length.out=10)) %>%
  dplyr::mutate(
    response = vapply(dose, function(d)
      sigmoid(
        dose=d,
        ed50=ed50,
        slope=slope),
      1.0))

sample prior

prior <- data.frame(
  ed50 = rnorm(n=20, mean=ed50_mu, sd=ed50_sd),
  slope = rnorm(n=20, mean=slope_mu, sd=slope_sd)) %>%
  dplyr::mutate(draw_id=row_number()) %>%
  plyr::adply(1, function(df){
    data.frame(dose=seq(dose_min, dose_max, length.out=200)) %>%
      dplyr::mutate(
        response = vapply(dose, function(d) sigmoid(
          dose=d,
          ed50=df$ed50[1],
          slope=df$slope[1]), 1.0))

sample posterior

samples <- rstan::sampling(
  object=dose_response,
  data=list(
    N=nrow(data),
    dose=data$dose,
    response=data$response,
    response_v=rep(1, times=nrow(data)),    # note setting: response_v <- 1
    ed50_prior_mu = ed50_mu,
    ed50_prior_sd = ed50_sd,
    slope_prior_mu = slope_mu,
    slope_prior_sd = slope_sd),
  init=list(
    chain1=list(ed50=0, slope=1),
    chain2=list(ed50=0, slope=1),
    chain3=list(ed50=0, slope=1),
    chain4=list(ed50=0, slope=1)),
  iter=50000,
  control=list(
    adapt_delta=0.999,
    max_treedepth=12))

posterior <- samples %>%
  extract(pars=c("ed50", "slope")) %>%
  data.frame() %>%
  dplyr::sample_n(20) %>%
  dplyr::mutate(draw_id=row_number()) %>%
  plyr::adply(1, function(df){
    data.frame(dose=seq(dose_min, dose_max, length.out=200)) %>%
      dplyr::mutate(
        response = vapply(dose, function(d) sigmoid(
          dose=d,
          ed50=df$ed50[1],
          slope=df$slope[1]), 1.0))
  })

parameter fit:

      mean se_mean   sd  2.5%   25%  50%  75% 97.5% n_eff Rhat
ed50  0.00       0 0.10 -0.19 -0.07 0.00 0.07  0.20 36266    1
slope 0.04       0 0.02  0.01  0.03 0.04 0.05  0.08 28922    1

ground truth (black); prior draws (red); posterior draws (blue)

Ah, thanks for clarifying. I think you’re right about your parameterization in terms of the mean/sample size so ignore my change in parenthesis, but I think also it explains what’s going on.

In your example fit with your parameterization, you used response_v = 11

plot the density plot in R for that when y is small say 0.05, and compare to when y ~ 0.5

curve(dbeta(x,.05*11,.95*11),from=0,to=1)
curve(dbeta(x,.5*11,.5*11),from=0,to=1)

You’ll see that in both cases there’s plenty of probability mass in the range of say 0.05 to 0.25

How can your model distinguish between the idea that maybe the curve is flat and the errors are all over the place compared to say the curve is sigmoid and the errors are small? It can’t unless you tell it something like “the errors are somewhat small” This is particularly true when you redid the fit and set response_v = 1 !!!

Make your response_v ~ 50 or 100, plot the density plots for beta to see what this means about your errors. then see if you get the fit you want.

Setting a tighter distribution for response is worth trying, however with

ed50_sd <- .1
ed50_sd <- .1

and my original response parameterization

response ~ beta(response_mu*response_v[j], (1-response_mu)*response_v[j]);

I am still not getting it to fit with response_v=100 or even 1000:

response_v = 1:

response_v = 10:

response_v = 100:

response_v=1000:

Ha, the bug is actually on that line, just not what you’d suggested–

instead of

response ~ beta(response_mu*response_v[j], (1-response_mu)*response_v[j]);

it should be

response[j] ~ beta(response_mu*response_v[j], (1-response_mu)*response_v[j]);

I feel like this should be an error, or at least give a warning. I’ll open an issue on github for it.

Thanks Daniel and Arya for the suggestions.

1 Like

Yes, that’s right. There’s an example in the manual. v has to be constrained to be positive. And it’s really sample size plus two, becuase mu = 0.5, v = 2 is the uniform beta(1, 1) case we usually think of corresponding to no samples.

This is the kind of case we’d like to catch and warn about. We have an ongoing plan to produce a pedantic mode that hasn’t materialized yet. The major obstacle is that we can only use heuristics to do any of this and working within the parser is a huge pain and working from the outside with things like regexes is super messy.