Dose Response Model with partial pooling on maximum value

I am trying to get a dose response model ( four parameter log-logistic model) working with the aim of modeling the temperature that winegrapes get too cold and die in the winter, but my model is not behaving well.

The x data I have are simulated daily winter air temperatures (range of about 10 to -10 degrees C). I have added 30 to each temperature so values don’t drop below 30 but remain the same relative to each other. The y data are simulated winegrape hardiness (range about -25 to 0 degrees C) multiplied by -1 so that all values are positive and maximum hardiness (i.e. -25 degrees C) is the biggest value (25). I tried to simulate data to have the exact structure as the Stan model.

I have a model that works fine until I try adding a hierarchical element to the d parameter (maximum hardiness). Now I get ok parameter estimates in general, but lots of warning messages and an odd relationship between the hierarchical element (d_var_sigma) and the log probability density lp__.

This is the model:
\mu_{i}=f(x_{i},(b,c,d,e))=c+\frac{d_{var,i}-c}{1+e^{b(log(x_{i})-\tilde{e})}}
\tilde{y}_{i}\sim normal(\mu_{i},\sigma)

Where:
x is the concentration of the dose (amount of winter cold)
b is the response rate (slope)
d is the upper asymptote of the response (maximum hardiness)
b is the lower asymptote of the response (minimum hardiness)
e is the effective dose ED50 (winter temperature where cold hardiness is half way between min and max)
\tilde{e} is the log of the effective dose ED50

Priors (hardiness has been multipled with -1 to be positive, and 30 has been added to air temp)
b \sim gamma(7,0.75)
d \sim Normal(25, 10)
\sigma_{d,var} \sim gamma(2.5, 1.75)
d_{var} \sim Normal(d, \sigma_{d,var})
c \sim gamma(3,1)
\tilde{e} \sim normal(log(30), 0.1)
\sigma \sim normal(0,5)


data {
  int<lower=1> N;                           // Number of observations
  vector<lower=1>[N] x;                      // Dose values (air temperature)
  vector [N] y;  
  
  //Level 2 	
	int < lower = 1 > n_vars; 				// number of random effect levels (varieties) 
	int < lower = 1, upper = n_vars > variety[N]; // id of random effect (variety)// Response values (Winter cold hardiness). was an int in original code 
}

// Sampling space
parameters {
  real b;                             // slope
  real<lower=0> c;                     // lower asymptote
  real<lower=0> d;                     // upper asymptote
  real<lower=0> ehat;                           //  x where y is half way between c and d, but logged in equation
  real<lower=0> sigma_g;                        //error around the estimated hardiness 

  //level 2
  real <lower = 0> d_var_sigma; 		// a standard deviation of how much d (maximum hardiness) varies 
  real var_d[n_vars]; // a list of the effect of each variety on d (maximum hardiness)
  
}

// Calculate posterior
model {
  vector[N] mu_y;

  // Priors on parameters
  c ~ gamma(3, 1);                    // 
  d ~ normal(25, 10);               // centred around maximum hardiness, 10 sd
  ehat ~ normal(log(30), 0.15);   // centred around mean temperature, sd 20
  b ~ gamma(7, 0.75);                 // 
  sigma_g ~ normal(0, 5);           // 
  
  //level 2
  d_var_sigma ~gamma(2.5, 1.75);
  var_d ~ normal(0, d_var_sigma); //prior for the effect of variety on slope 
  
  //liklyhood function 
  for (i in 1:N) {
    mu_y[i] = c + ((d + var_d[variety[i]] - c) / (1 + exp(b * (log(x[i]) - ehat))));
  }
  y ~ normal(mu_y, sigma_g); // 
}


generated quantities {
  
  // Simulate model configuration from prior model (get mu_y)
  vector[N] mu_y;                      // Simulated mean data from likelyhood 
  vector[N] y_sim;                           //Simulated Data
  real<lower=0> e;
  e = exp(ehat);
  
  //liklyhood function 
  for (i in 1:N) {
    mu_y[i] = c + ((d+ var_d[variety[i]] - c) / (1 + exp(b * (log(x[i]) - ehat))));
  }
  // Simulate data from observational model
  for (n in 1:N) y_sim[n] = normal_rng(mu_y[n], sigma_g); 
}

This model gives the following warnings, but does a good job of estimating parameters:

Warning messages:
1: There were 35 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
2: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
3: Examine the pairs() plot to diagnose sampling problems
 
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

I think something is going on with the relationship between d_var_sigma and lp__, but I don’t really understand enough to figure it out.

drc_dh_pairs

So far I have tried:
d_var_sigma prior: a half normal prior (0,5) and a gamma prior that is set up in the current version of the model.
Running the model for 6000 warmup runs and a further 4000 iterations
Max treedepth 15

I feel a bit lost with this problem, can anyone suggest a sensible way forward?

2 Likes

IIRC @martinmodrak has grappled with models like this one for a while.

2 Likes

Yes, I’ve worked with sigmoids in models a bit. They can be challenging to fit.

But first think you might want to check out is “non-centered parametrization” of the hierarchical component, i.e. have parameters var_d_raw ~ normal(0,1); and then derive transformed parameter var_d = var_d_raw * d_var_sigma. (see e.g. https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html for more details).

I’ve found the direct parametrization of the sigmoid problematic, primarily because if your data don’t cover the whole “dynamic range” of the sigmoid (i.e. data close to both lower and upper plateau and to the middle), some of the parameters can become basically unconstrained by data. My other guess here is that when you have a single d the data cover the whole dynamic range, while when you let d (and hence the whole sigmoid) vary by group (variety), this ceases to be the case for some groups. I would expect that you would see in the posterior is that as d_var_sigma gets larger, all of the sigmoid parameters are allowed to vary wildly.

I worked with @stemangiola on fitting sigmoids in a different context, you might want to check out the parametrization we used in a preprint: https://doi.org/10.1101/2020.03.16.993162, also discussed at Difficulties with logistic population growth model

I also wrote about some of the considerations at Hierarchical Gompertz model

Best of luck with your model!

3 Likes

Hi @martinmodrak, thanks for your suggestions, they are very helpful. I was wondering, though, if you had any issues with collinearity between eta and y0 when you fit the alternative parameterisation?

I had a go at fitting the model, although I am still quite new to model building so I might have made a mistake. Here is the equation:

\mu=f(x,(y_{0},\beta,\eta))=\frac{y_{0}(1 +e^{\eta\beta})}{1+e^{-\beta(\eta - x)}}
\tilde{y}_{i}\sim normal(\mu_{i},\sigma)

Priors:
Hardiness has been multipled with -1 to be positive, and air temp has been standardized (mean is 0)
beta \sim Normal(10,5)
eta \sim Normal(0, 0.35)
y_{0} \sim halfNormal(15,5)
\sigma \sim normal(0,5)

Stan Code:


data {
  int<lower=1> N;                           // Number of observations
  vector[N] x;                      // Dose values (air temperature)
  vector[N] y;                     //winter hardiness, *-1 to eb positive
}

// Sampling space
parameters {
  real beta;                             // slope
  real eta;                     // infelction point of x. 
  real<lower=0> y0;                     // y intercept (where x is 0)
  real<lower=0> sigma_g;                        //error around the estimated hardiness 

}

// Calculate posterior
model {
  vector[N] mu_y;

  // Priors on parameterssimLTEPos <- simLTE * -1
  beta ~ normal(10, 5);                    // centred around no hardiness, could be as high as -20
  eta ~ normal(0, 0.35);               // centred around 0 (x mid point), but can range around all x values, but strongly constrained to sit around the mean 
  sigma_g ~ normal(0, 5);           // 
  y0 ~ normal(15, 5);             // constraining y0 to fall inside normal hardiness values 
  
  //liklyhood function 
  for (i in 1:N) {
    mu_y[i] = (y0*(1 + exp(beta*eta))/ (1 + exp(beta*(eta-x[i]))));
  }
  y ~ normal(mu_y, sigma_g); // I dont know why, but this was a poisson distribution in the original model 

}
generated quantities {
  
  // Simulate model configuration from prior model (get mu_y)
  vector[N] mu_y;                      // Simulated mean data from likelyhood 
  vector[N] y_sim;                           //Simulated Data

  //liklyhood function 
  for (i in 1:N) {
    mu_y[i] = (y0*(1 + exp(beta*eta))/ (1 + exp(beta*(eta-x[i]))));
  }
  // Simulate data from observational model
  for (n in 1:N) y_sim[n] = normal_rng(mu_y[n], sigma_g); 
}
 

The model doesn’t give any warnings and predicts the values well for my simulated data so I may be worrying about nothing, but I think that the pairs plot suggests strong collinearity between eta and y0. Did you see this problem also, or is it likely to be a miss-specification on my part?

2 Likes

This definitely looks worrying and could cause trouble if you add the hierarchical part or other complexities to the model. I didn’t see this before - in my hubris I though I understood the sigmoid, but you proved me wrong :-). There also doesn’t appear to be a bug in your model… I can confirm this correlation is present quite frequently for simulated data and I should have noticed this earlier in my own models. Time for me to learn some more about sigmoids, I guess, thanks for being curious about this :-)

I tried looking into this for a while and have no final word, but it seems to me that the parametrization I proposed shows this correlation mostly in the cases where I would expect the natural parametrization to work (i.e. when the data span the whole range of the simgoid). Will try to look into this a bit more in the following days…

If you use non-centered parametrization with your original hierarchical model, do you still have divergences?

One thing I do find weird is your prior on beta - it assumes very steep sigmoids, here is my prior predictive check using this prior:

library(tidyverse)

N <- 50
x_obs <- seq(-5, 5, length.out =  100)
sim_data <- tibble(draw = 1:N,
       beta = rnorm(N, 10, 5), 
       #beta = rnorm(N, 0, 1),
       eta = rnorm(N, 0, 0.35),
       y0 = rnorm(N, 15, 5),
       sigma_g = abs(rnorm(N, 0, 5))) %>%
  crossing(tibble(x = x_obs)) %>%
  mutate(mu = (y0*(1 + exp(beta*eta))/ (1 + exp(beta*(eta-x)))), y = rnorm(n(), mu, sigma_g)) 

sim_data %>%
  filter(mu < 100) %>% #Filter out extreme values to see results clearly
  ggplot(aes(x = x, y = mu, group = draw)) + geom_line(alpha = 0.2) 

The correlation you mention appears frequently even if I use beta ~ normal(0, 1), so this is not the cause of the problem, but I would check whether you really believe that such steep changes are to be expected a-priori.

Also it appears like you are willing to a-prior assume that beta > 0 - is that correct? That would likely simplify things a bit…

2 Likes

So I think I know whats going on here @Faith @martinmodrak. I dont’ think the relationship between eta and y0 is colinear, but rather non-identifiable (if those concepts are actually fully distinct 🤔). When you specified your reparamterisation above, y_0 is presumably some specific value of y - for example y at time 0 or y at time max - yes? So in such notation, it is assumed that y_0 is available from the data. But in your model, y_0 is not determined by the data and is only constrained by its prior - thus it is free to vary across quite a range regardless of the values of the other parameters or the data. However you can very slightly ammend your code to fairly tightly constrain y_0 to a data determined value.

This is how I did it for the Gompertz models in the end:

transformed parameters{
  vector[N] mu;
  real y_at_tmax = Y[N];

  mu = y_at_tmax * exp( exp(-k*(time[N] - T_i)  ) - exp(-k*(time - T_i) ) );
}

So here y_at_tmax is still a parameter - but its more or less pinned to the value of Y[N] in the data (for some reason it does not work if you put Y[N] directly in the equation for mu 🤷‍♂️). By the way in this code I know ahead of time the Y[N] is the value of y at t_max - but this depends on data structure, so if the last Y in your data isn’t at t_max you would need to do something differently. This worked very well when I used y_at_tmax, but very badly for y_at_tmin for reasons I dont’ fully understand except possibly that y_at_tmin was either 0 or a small value.

Anyhow - this is what I think - I’m open to correction!

2 Likes

Thanks @martinmodrak for the comment on the beta prior - That was a bit looser than I thought. I have now tightened it up a bit to a gamma(7,1) in the models now.

Regarding the y0 parameter, thanks for your suggestion @jroon. I had also been wondering how I could have a parameter being a y value. An aside: I am also not clear on the difference between collinearity and non-identifiability - I thought they were the same thing? Anyway, your suggestion makes sense but I can’t see how to use it in this particular model because we do not know exactly where the middle of the sigmoid is. I guess maybe I could set it to y when x is 0 as the x value has been standardized, but my real data does not have data for when winegrapes have lost most of their winter hardiness (I don’t have data around minimum y values).
Perhaps I will take a look at fitting a Gompertz model instead? There seems to be so many variants on a sigmoid curve!

@martinmodrak, I tried non-centered parameterization for the variation around d. Here is the new equation and stan model below. The model works better now, and has less of an obvious problem in the pairs plot. I still think there are some fitting problems between b, c and ehat though? When I tried it with my real data though I get problems again with divergent transitions and collinearity. I am going to play around with the priors some more, but any suggestions or comments would be welcome!

\mu=f(x_{i},(b,c,d,e))=c+\frac{d_{var,i}-c}{1+e^{b(log(x_{i})-\tilde{e})}}
{d}_{var} = d + dr_{var} * \sigma_{dvar}
\tilde{y}_{i}\sim normal(\mu_{i},\sigma)

Where:\
x is the concentration of the dose (amount of winter cold)
b is the response rate (slope)\
d_{var} is the upper asymptote of the response (maximum hardiness) for each variety
\sigma_{dvar} is the standard deviation of the effect of varieties on maximum winter hardiness\
dr_{var} is the non centred parameterization values for varieties
b is the lower asymptote of the response (minimum hardiness)
e is the effective dose ED50 (winter temperature where cold hardiness is half way between min and max)
\tilde{e} is the log of the effective dose ED50

Priors (hardiness has been multipled with -1 to be positive, and 30 has been added to air temp)
b \sim gamma(7,1)
d \sim Normal(25, 10)
\sigma_{dvar} \sim normal(0,5)
dr_{var} \sim normal(0,1)
c \sim gamma(3,1)
\tilde{e} \sim normal(log(30), 0.1)
\sigma \sim normal(0,5){}


data {
  int<lower=1> N;                           // Number of observations
  vector<lower=1>[N] x;                      // Dose values (air temperature)
  vector [N] y;  
  
  //Level 2 	
	int < lower = 1 > n_vars; 				// number of random effect levels (varieties) 
	int < lower = 1, upper = n_vars > variety[N]; // id of random effect (variety)// Response values (Winter cold hardiness). was an int in original code 
}

// Sampling space
parameters {
  real b;                             // slope
  real<lower=0> c;                     // lower asymptote
  real<lower=0> d;                     // upper asymptote
  real<lower=0> ehat;                           //  x where y is half way between c and d, but logged in equation
  real<lower=0> sigma_g;                        //error around the estimated hardiness 

  //level 2
  real <lower = 0> d_var_sigma; 		// a standard deviation of how much d (maximum hardiness) varies 
  real d_var_raw[n_vars]; // non centred parameterisation bit 
}

transformed parameters{
    real var_d[n_vars]; // a list of the effect of each variety on d (maximum hardiness)
    
    for (n in 1:n_vars){
      var_d[n] = d_var_sigma * d_var_raw[n];
    }
}

// Calculate posterior
model {
  vector[N] mu_y;

  // Priors on parameters
  c ~ gamma(3, 1);                    // Most likely around 3, could very towards 0.  
  d ~ normal(25, 10);               // centred around maximum hardiness, 10 sd
  ehat ~ normal(log(30), 0.15);   // centred around mean temperature, sd 20
  b ~ gamma(7, 1);                 // 
  sigma_g ~ normal(0, 5);           // the dark

  //level 2
  d_var_sigma ~gamma(2.5, 1.75);
  d_var_raw ~ normal(0, 1);
  
  //liklyhood function 
  for (i in 1:N) {
    mu_y[i] = c + ((d + var_d[variety[i]] - c) / (1 + exp(b * (log(x[i]) - ehat))));
  }
  y ~ normal(mu_y, sigma_g); // I dont know why, but this was a poisson distribution in the original model 

}


generated quantities {
  
  // Simulate model configuration from prior model (get mu_y)
  vector[N] mu_y;                      // Simulated mean data from likelyhood 
  vector[N] y_sim;                           //Simulated Data
  real<lower=0> e;
  e = exp(ehat);
  
  //liklyhood function 
  for (i in 1:N) {
    mu_y[i] = c + ((d+ var_d[variety[i]] - c) / (1 + exp(b * (log(x[i]) - ehat))));
  }
  // Simulate data from observational model
  for (n in 1:N) y_sim[n] = normal_rng(mu_y[n], sigma_g); 
}

Here is the pairs plot from the running the simulated data through the model.

And here is the pairs plot from running the real data through the model:

A minor clarification I forgot to mention - the non-centered parametrization only makes sense when you do partial pooling across multiple curves, it is likely to cause problems when you only have one curve… My intention was that you try the model from the very first post with the non-centered parametrization. Maybe your data actually work even with the default implementation and the main problem was the hierarchical part… I would definitely try that first, before playing with the experiment below.


Anyway, you inspired to give sigmoid another deeper look and I managed to push further one older idea, which seems to work better than the previous one. The main idea is simple. Let’s assume majority of our x values spans roughly between -1 and 1 (but we have a few outside this interval). Having defined the sigmoid function as:

f(x, \beta, \alpha, k) = \frac{k}{1 + e^{-\alpha - \beta x}}

We can introduce three new parameters a,b,c, such that:

a = f(-1,\beta, \alpha, k) \\ b = f(0,\beta, \alpha, k) \\ c = f(1,\beta, \alpha, k) \\

Now a,b,c should be well informed by data - they are the observed values within the range our data covers. Also they should somewhat independent, provided that there are enogh data points between each of our anchor points (the method fails with too few data).

There is a slight issue - not all combinations of a,b,c values define a sigmoid. Obviously, sigmoids are monotonic, so we either need to have 0 < a < b < c or 0 < c < b < a, but is that enough? And how do we solve for \alpha, \beta, k from a,b,c? Now I am not good enough at math to work this out easily by myself. But I’ve enlisted the help of the free version of the Wolfram Cloud (can heavily recommend for doing symbolic math) - my code is here: https://www.wolframcloud.com/obj/martin.modrak/Published/sigmoid_clean.nb

EDIT: I copied the constraint with a mistake, fixed.

Our friend Wolfram tells us that we have an additional constrain b^2 > ac and that the solution is (regardless whether we have a raising or falling sigmoid):

\alpha = \log{\left(\frac{a}{(a-b)} + \frac{b}{(c-b)}\right)}\\ \beta = \log\frac{a-b}{b-c} + \log{c} - \log{a}\\ k = \frac{ b (a (b - 2 c) + b c}{ b^2 - a c}

So let us assume a lognormal prior on a,b,c to ensure positivity and generate same fake data as a prior predictive check. Note that we can take arbitrary a,c > 0 and then only b is constrained to match the raising/falling shape of the sigmoid and b^2 > ac.

We generate i.i.d lognormal, order them, take a 50% chance of swapping the order. To satisfy the b^2 > ac constraint we do what is fancifully called “rejection sampling” and basically means we discard all draws that do not satisfy the constraint (those tend to be rare):

set.seed(4949393)
N <- 60
x_obs <- seq(-1.5, 1.5, length.out =  100)
sim_data <- tibble(
       # Generate iid log normal
       y1 = rlnorm(N, log(15), log(5)),
       y2 = rlnorm(N, log(15), log(5)),
       y3 = rlnorm(N, log(15), log(5)),
       sigma_g = abs(rnorm(N, 0, 0.5))) %>%
  mutate(a = pmin(y1,y2,y3), c = pmax(y1,y2,y3),  #Order
         b = if_else(a == y1,
                     if_else(c == y3, y2, y3),
                     if_else(a == y2,
                             if_else(c == y3, y1, y3),
                             if_else(c == y2, y1, y2))),
         swap = rbinom(N, 1, 0.5) == 1, #Potentially swap
         a_ = if_else(swap, c, a),
         c = if_else(swap, a, c),
         a = a_
         ) %>%
  select(-swap, a_) %>%
  filter(b^2 > a * c) %>% # Rejection sampling
  mutate(
    draw = 1:n(), 
     alpha = log(a/(a-b) + b / (c-b)), # Compute target parameters
     beta = log((a-b)/(b-c)) + log(c) - log(a),
     k = b * (a * (b - 2 *c) + b * c) / (b^2 - a * c),

  ) %>%    
       
  crossing(tibble(x = x_obs)) %>%
  mutate(
    mu = k/(1+exp(-beta*x - alpha)),
    y = rlnorm(n(), log(mu), sigma_g)) 


sim_data %>%
  filter(abs(mu) < 200) %>%
  ggplot(aes(x = x, y = mu, group = draw)) + geom_line(alpha = 0.5) 

Looks like a wide range of sigmoids is a-priori plausible, so we’re happy for now.

So, now our Stan model. Since we expect positive-only values, I took the liberty to moving from normal noise to lognormal noise as this seems more natural (but you understand your data better, so if this doesn’t make sense to you, keep normal noise). We directly enforce the constraints on b via lower and upper. (note: a,b, c > 0 and b^2 > ac imply b > min\{a,c\}, so we can leave this one out). We need to do a max call, which is potentially problematic - the posterior is still continous, but is not smooth. In this case it seems it is not a big issue as with enough data b is sampled far from the bounds and the unsmoothness is not visited.

EDIT: The previous version had an extra constraint that was unnecessary and caused divergences.

data {
  int<lower=1> N;                           // Number of observations
  vector[N] x;                      // Dose values (air temperature)
  vector[N] y;                     //winter hardiness, *-1 to eb positive
}

// Sampling space
parameters {
  real<lower=0> a;
  real<lower=0> c;
  // Enforce the constraints (yes, this is completely valid code)
  real<lower=sqrt(a * c), upper=max({a,c})> b;
  real<lower=0> sigma_g;  

}

transformed parameters {
  real alpha = log(a/(a-b) + b / (c-b));
  real beta = log((a-b)/(b-c)) + log(c) - log(a);
  real<lower=0> k = b * (a * (b - 2 *c) + b * c) / (b^2 - a * c);
}

// Calculate posterior
model {
  vector[N] logmu_y;

  // priors
  sigma_g ~ normal(0, 0.5);           
  a ~ lognormal(log(15), log(5));            
  b ~ lognormal(log(15), log(5));           
  c ~ lognormal(log(15), log(5)); 

  //likelihood function 
  for (i in 1:N) {
    //Staying on log scale, the line below is equialent to 
    // mu_y[i] = k/(1+exp(-beta*x[i] - alpha));
    logmu_y[i] = log(k) - log1p_exp(-beta*x[i] - alpha);
  }
  y ~ lognormal(logmu_y, sigma_g); 
}

generated quantities {
  
  // Reversing Stan's constraints for diagnostics
  real b_raw;
  
  // Simulate model configuration from prior model (get mu_y)
  vector[N] logmu_y;                      // Simulated mean data from likelyhood 
  vector[N] y_sim;                           //Simulated Data

  //likelihood function 
  for (i in 1:N) {
    logmu_y[i] = log(k) - log1p_exp(-beta*x[i] - alpha);
  }
  // Simulate data from observational model
  for (n in 1:N) y_sim[n] = lognormal_rng(logmu_y[n], sigma_g); 
  
  {
    // Recompute the bounds and reverse the constraint transform
    real b_low = max({min({a,c}), sqrt(a * c)});
    real b_up = max({a,c});
    b_raw = logit((b - b_low) / b_up);
  }
}

And let’s fit it to some of the data we simualted:

set.seed(5949445)
draw_to_use <- 1

N_x <- 15
data_filtered <- sim_data %>% filter(draw == draw_to_use) %>% sample_n(N_x)
data_for_plot_true <- data_filtered %>%
  pivot_longer(c("mu","y"), names_to = "name", values_to = "value")

data_for_stan <- list(N = N_x, x = data_filtered$x, y = data_filtered$y)
fit <- sampling(model_mod, data = data_for_stan)

pars_of_interest <- c("a", "b_raw","c", "sigma_g", "lp__")

summary(fit, pars = pars_of_interest)$summary
summary(fit, pars = c("alpha","beta","k"))$summary

bayesplot::mcmc_pairs(fit, pars = pars_of_interest, transformations = list(a = "log", c = "log"), 
                      np = bayesplot::nuts_params(fit))

gather_draws(fit, logmu_y[i], n = 50) %>% inner_join(tibble(i = 1:N_x, x = data_for_stan$x), by = "i") %>%  ggplot(aes(x = x, y = exp(.value), group = .draw)) + geom_line(alpha = 0.2) + geom_line(aes(x = x, y = value, color = name), data = data_for_plot_true, inherit.aes = FALSE, size = 2)

Recovers neatly:

The pairs plot is not perfect, but the strong correlations are gone:

Also note that while we have a pretty good idea what a,b,c are, we can’t really constrain k:

              mean     se_mean        sd        2.5%         25%         50%         75%       97.5%
a         4.2811747 0.020926543  1.144839   2.5241839   3.4879422   4.1238347   4.8887739   6.9103971
b_raw    -1.3489155 0.021847439  1.000944  -3.5588657  -1.9071577  -1.2759029  -0.6997127   0.3779332
c       214.7881633 1.652218995 73.590397 105.9263605 162.3593716 203.4160025 254.7649242 387.6245267
sigma_g   0.6633307 0.002493904  0.125718   0.4671061   0.5708108   0.6487263   0.7355805   0.9519432
lp__     -6.0126353 0.040435991  1.571293  -9.8941125  -6.7714416  -5.6472307  -4.8603798  -4.0675430
           n_eff      Rhat
a       2992.914 1.0007751
b_raw   2099.028 0.9997686
c       1983.842 0.9992487
sigma_g 2541.179 1.0004543
lp__    1510.004 1.0011529

             mean      se_mean           sd       2.5%        25%         50%        75%      97.5%    n_eff
alpha  -0.6950607   0.02186324 9.810327e-01  -2.708883  -1.257854  -0.6879693  -0.109265   1.175497 2013.437
beta    3.3406841   0.01211739 6.491872e-01   2.190443   2.897072   3.2918770   3.722365   4.768387 2870.263
k     485.9463804 189.54584919 1.116032e+04 108.837253 172.298576 224.9516964 302.439180 829.057592 3466.768
           Rhat
alpha 0.9995636
beta  1.0000370
k     1.0001177

I still get some divergences when the data are consistent with almost no change (I guess this is the max biting me will think if I can somehow get rid of it), so it is not perfect, but I think it is better than the previous parametrization…

Well, enough exploration for today… We’ll see if that actually helps you, but I learned a trick or two on the way :-)

4 Likes

So all I know about identifiability is from Martin’s previous blog: https://www.martinmodrak.cz/2018/05/14/identifying-non-identifiability/ I think although it can occur when there is colinearity, it can occur in other circumstances too - i.e. whenever a parameter in a model is not really affected by the data (I think). But @martinmodrak can correct me if I’m wrong there I hope! I dont’ think I would go with the gompertz as its less flexible than your current models. As to which y value to use - I went for the end of the curve where y was largest - which for me was at the max x time. If your curve is decreasing this is probably same for you - if your curve is decreasing than y at min x time probably better (I think).

But it seems @martinmodrak has pulled some amazing stuff out of his that there for you so thats bound to be better than anything I’ve said 😅

1 Like

Interesting @martinmodrak, does this model include inflection and slope as parameters? I guess, they can always calculated after the fit.

So do you think this parameterisation is superior to the one we conceived?

1 Like

Yes.

[quote=“stemangiola, post:10, topic:17086”]
So do you think this parameterisation is superior to the one we conceived?
[/quote]l

I didn’t have time to do a thorough benchmark, but I currently think this one is better if you know the sign of the slope. If you don’t, and the data can’t constrain it, this one runs into severe issue. So for me, the search is still on…

I had a look into this new parameterization for my question, and it seems to fit the data well. I do know the sign of the slope though, as we know that winegrapes get more hardy as air temperatures drop.

Here is the Stan code I wrote based on @martinmodrak’s parametrization of the dose response curve:

data {
  int<lower=1> N;                           // Number of observations
  vector[N] x;                      // Dose values (air temperature)
  vector[N] y;                     //winter hardiness, *-1 to eb positive
}

// Sampling space
parameters {
  real<lower=0> a;
  real<lower=0> c;
  // Enforce the constraints (yes, this is completely valid code)
  real<lower=sqrt(a * c), upper=max({a,c})> b;
  real<lower=0> sigma_g;  

}

transformed parameters {
  real alpha = log(a/(a-b) + b / (c-b));
  real beta = log((a-b)/(b-c)) + log(c) - log(a);
  real<lower=0> k = b * (a * (b - 2 *c) + b * c) / (b^2 - a * c);
}

// Calculate posterior
model {
  vector[N] logmu_y;

  // priors
  sigma_g ~ normal(0, 0.5);           
  a ~ lognormal(log(15), log(5));            
  b ~ lognormal(log(15), log(5));           
  c ~ lognormal(log(15), log(5)); 

  //likelihood function 
  for (i in 1:N) {
    //Staying on log scale, the line below is equialent to 
    // mu_y[i] = k/(1+exp(-beta*x[i] - alpha));
    logmu_y[i] = log(k) - log1p_exp(-beta*x[i] - alpha);
  }
  y ~ lognormal(logmu_y, sigma_g); 
}

generated quantities {
  
  // Reversing Stan's constraints for diagnostics
  real b_raw;
  
  // Simulate model configuration from prior model (get mu_y)
  vector[N] logmu_y;                      // Simulated mean data from likelyhood 
  vector[N] y_sim;                           //Simulated Data

  //likelihood function 
  for (i in 1:N) {
    logmu_y[i] = log(k) - log1p_exp(-beta*x[i] - alpha);
  }
  // Simulate data from observational model
  for (n in 1:N) y_sim[n] = lognormal_rng(logmu_y[n], sigma_g); 
  
  {
    // Recompute the bounds and reverse the constraint transform
    real b_low = max({min({a,c}), sqrt(a * c)});
    real b_up = max({a,c});
    b_raw = logit((b - b_low) / b_up);
  }
}

I simulated this data to fit to the model:

and here is the pairs plot of how it fitted the simulated data.

The model also did a good job I think of fitting to the real data, I saw no warnings and the pairs plot looks good:

That the model fir my real data is nice because I am lacking data around minimum winegrape hardiness - the winegrape growers didn’t collect hardiness data outside of winter where the plants are minimally hardy. I think this is why the previous models were struggling so much.

Although I liked this parameterization, in the end my solution was to turn to the plant physiology literature to see what winegrapes green tissue’s minimum cold hardiness is. I used this information to set a narrow prior on the c (minimum hardiness) parameter. I liked this solution because the parameters in the original dose response model are biologically meaningful and allowed me to make sensible choices about which aspects of the curve I wanted hierarchical variation on. This is the model I went with:

\mu=f(x_{i},(b,c,d,e))=c+\frac{(d+d_{var,i} + d_{site,i}) -c}{1+exp^{b_{var}(log(x_{i})-\tilde{e})}}
{d}_{var} = dr_{var} * \sigma_{dvar}
{d}_{site} = dr_{site} * \sigma_{dsite}
{b}_{var} = br_{var} * \sigma_{bvar}
\tilde{y}_{i}\sim normal(\mu_{i},\sigma)

Where:
x is the concentration of the dose (amount of winter cold)
b is the response rate (slope)
d is the grand upper asymptote of the response (maximum hardiness hardiness)
d_{var} is the effect of each variety on the upper asymptote of the response (maximum hardiness)
d_{site} is the effect of each site on the upper asymptote of the response (maximum hardiness)
\sigma_{dvar} is the standard deviation of the effect of varieties on maximum winter hardiness
\sigma_{dsite} is the standard deviation of the effect of sites on maximum winter hardiness
dr_{var} is the non centred parameterization values for varieties effect on maximum hardiness d dr_{site} is the non centred parameterization values for sites effect on maximum hardiness d
b is the lower asymptote of the response (minimum hardiness)
\sigma_{bvar} is the standard deviation of the effect of varieties on rate of change of winter hardiness
br_{var} is the non centred parameterization values for varieties effect on rate of change
e is the effective dose ED50 (winter temperature where cold hardiness is half way between min and max)
\tilde{e} is the log of the effective dose ED50

Priors:
(hardiness has been multiplied with -1 to be positive, and 30 has been added to air temp)
b \sim gamma(7,1)
\sigma_{bvar} \sim normal(0,3)
br_{var} \sim normal(0,1)
d \sim Normal(25, 10)
\sigma_{dvar} \sim gamma(2.5,1.75)
dr_{var} \sim normal(0,1)
\sigma_{dsite} \sim gamma(2.5,1.75)
dr_{site} \sim normal(0,1)
c \sim normal(2,0.5)
\tilde{e} \sim normal(log(30), 0.15)
\sigma \sim normal(0,5)


data {
  int<lower=1> N;                           // Number of observations
  vector<lower=1>[N] x;                      // Dose values (air temperature)
  vector [N] y;  
  
  //Level 2 	
	int < lower = 1 > n_vars; 				// number of random effect levels (varieties) 
	int < lower = 1, upper = n_vars > variety[N]; // id of random effect (variety)// Response values (Winter cold hardiness). was an int in original code 

	int < lower = 1 > n_sites; 				// number of random effect levels (sites) 
	int < lower = 1, upper = n_sites > sites[N]; // id of random effect (site)// Response values (Winter cold hardiness). was an int in original code 

  // Data for predicting
    int<lower = 0> N_p; // number of new x values 
    vector<lower=1>[N_p] newX; // a new set of mean air temp data for use in generated quantities 

}

// Sampling space
parameters {
  real<lower=0>b;                             // slope
  real<lower=0> c;                     // lower asymptote
  real<lower=0> d;                     // upper asymptote
  real<lower=0> ehat;                           //  x where y is half way between c and d, but logged in equation
  real<lower=0> sigma_g;                        //error around the estimated hardiness 

  //level 2 - variety
  real <lower = 0> d_var_sigma; 		// a standard deviation of how much d (maximum hardiness) varies 
  real d_var_raw[n_vars]; // non centred parameterisation bit 
  
  real <lower = 0> b_var_sigma; // standard deviation of how much slope b changes by variety
  real b_var_raw[n_vars]; // non centred bit 
  
  //level 2 - site
  real <lower = 0> d_site_sigma; 		// a standard deviation of how much d (maximum hardiness) varies 
  real d_site_raw[n_sites]; // non centred parameterisation bit 
}

transformed parameters{
    real var_d[n_vars]; // a list of the effect of each variety on d (maximum hardiness)
    real var_b[n_vars]; // a list of the effect of each variety on d (maximum hardiness)
     
    real site_d[n_sites]; // a list of the effect of each site on d (maximum hardiness)
    
    for (n in 1:n_vars){
      var_d[n] = d_var_sigma * d_var_raw[n];
      var_b[n] = b_var_sigma * b_var_raw[n];
    }
    
    for (n in 1:n_sites){
      site_d[n] = d_site_sigma * d_site_raw[n];
    }
}

// Calculate posterior
model {
  vector[N] mu_y;

  // Priors on parameters
  c ~ normal(2, 0.5);                    // centred around 3, coudl very towards 0.  
  d ~ normal(25, 10);               // centred around maximum hardiness, 10 se 
  ehat ~ normal(log(30), 0.15);   // centred around mean temperature, se 20
  b ~ gamma(7, 1);                 // centred around 1, not really sure of appropriate prior tbh  
  sigma_g ~ normal(0, 5);           // I have little concept of how much variation there shoudl be, thsi is a stab in the dark

  //level 2 - variety 
  d_var_sigma ~gamma(2.5, 1.75);
  d_var_raw ~ normal(0, 1);
  
  b_var_sigma ~ normal(0, 3);
  b_var_raw ~ normal(0, 1);
  
  //level 2 - site
  d_site_sigma ~gamma(2.5, 1.75);
  d_site_raw ~ normal(0, 1);
  
  //liklyhood function 
  for (i in 1:N) {
    mu_y[i] = c + ((d + var_d[variety[i]] + site_d[sites[i]] - c) / (1 + exp((b +  var_b[variety[i]]) * (log(x[i]) - ehat))));
  }
  y ~ normal(mu_y, sigma_g); // I dont know why, but this was a poisson distribution in the original model 

}


generated quantities {
  
  // Simulate model configuration from prior model (get mu_y)
  vector[N] mu_y;                      // Simulated mean data from likelyhood 
  vector[N] y_sim;                           //Simulated Data

  //Prediting data
  vector[N_p] mu_y_p;                      // Simulated mean data from likelyhood 
  vector[N_p] y_sim_p;                           //Simulated Data
  
  real<lower=0> e;
  e = exp(ehat);
  
  
  //liklyhood function 
  for (i in 1:N) {
    mu_y[i] = c + ((d+ var_d[variety[i]] + site_d[sites[i]] - c) / (1 + exp((b +  var_b[variety[i]]) * (log(x[i]) - ehat))));
  }
  // Simulate data from observational model
  for (n in 1:N) y_sim[n] = normal_rng(mu_y[n], sigma_g); 
  
  //Predict new y data from new x data
   for (i in 1:N_p) {
    mu_y_p[i] = c + ((d+ var_d[variety[2]] - c) / (1 + exp((b +  var_b[variety[2]]) * (log(newX[i]) - ehat))));
  }
    for (n in 1:N_p) y_sim_p[n] = normal_rng(mu_y_p[n], sigma_g); 
}

Once the c parameter was constrained, my pairs plots look pretty good. Not perfect, but I am happy with them for now especially as my retroactive and predictive checks of the model look good.

Edit: I forgot to add the pairs plot for the final model:

Thank you all for your help , and I am happy to keep discussing dose response curves if you have more comments or suggestions.

3 Likes

Nice work! Glad you have been able to make your model work, and big thanks for sharing what worked for you. I also agree that good domain expertise is often way better than pure theorizing :-)

It would be useful to see your data in order to run the model and check the diagnostics first hand. However, here is an alternative formulation, albeit using a sigmoidal emax rather than log-logistic, but I believe they are similar in what they offer in terms of their parametric form and I have used the sigmoid emax previously so am more more familiar with this form. I haven’t used a random effect on the parameters for these models previously, but have introduced one here on \beta_2 using a non-centered parameterisation (you might want to check the manual on this, or Non centered parameterization on variance parameter).

data {
  int<lower=0> N;
  int<lower=1> Nc;
  real y[N];
  vector[N] x;
  int c[N];
  vector[4] mu0;
  vector[5] sig0;
}
parameters {
  // upper and lower asymptote
  real b1;
  real<lower=0> b3; // the ed50 
  real b4; // governs the shape of the curve
  real<lower=0> xi;

  vector[Nc] b2_z;
  real<lower=0> nu;
}
transformed parameters{
  real mu[N];
  vector[Nc] b2;

  b2 = b2_z * nu;

  for(i in 1:N){
    mu[i] = b1 + b2[c[i]] * pow(x[i], b4) * inv(pow(b3, b4) + pow(x[i], b4));  
  }
}
model {
  target += normal_lpdf(b1 | mu0[1], sig0[1]);

  target += normal_lpdf(b2_z | 0, 1);
  target += normal_lpdf(nu | 0, 5);

  target += normal_lpdf(b3 | mu0[3], sig0[3]) - normal_lccdf(0 | mu0[3], sig0[3]);
  target += normal_lpdf(b4 | mu0[4], sig0[4]);
  target += normal_lpdf(xi | 0, sig0[5]);
  target += normal_lpdf(y | mu, xi);
}
generated quantities {
  
}

Based on your description, I simulated some data as follows:

sigemax <- function(x, a, b, c, d){
  a + b * (x^d) / (c^d + x^d)
}

# data simulation
N <- 50
Nc <- 5
n <- N / Nc
set.seed(6)
x <- seq(20, 40, len = n)
x <- rep(x, len = N)
a <- 0
b <- rnorm(Nc, 0, 2)
c <- 30
d <- 20
k <- rep(1:Nc, each = n)
y <- sigemax(x, a, 20 + b[k], c, d)
y <- y + rnorm(N, 0, 1)
plot(x, y, xlim = c(20, 40), col = k)

and then compiled and ran the sampler as usual

mod1 <- rstan::stan_model("emax1.stan", verbose = F)
ld <- list(N = N,
           Nc = Nc,
           c = k,
           y = y,
           x = x,
           mu0 = c(0, 25, 30, 20),
           sig0 = c(2, 4, 3, 10, 4)
)
library(rstan)
library(scales)
# mc (many cores please)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
fit1 <- rstan::sampling(object  = mod1,
                        data    = ld,
                        chains  = 8,
                        thin    = 1,
                        iter    = 5000,
                        refresh = 1000,
                        control=list(adapt_delta=0.9, max_treedepth=12))

you can check on the results with

params <- rstan::extract(fit1)
bayesplot::mcmc_dens(fit1, pars = c("b1", "b3", "b4"), facet_args = list(nrow = 3))
bayesplot::mcmc_dens(fit1, pars = c("xi"), facet_args = list(nrow = 1))
bayesplot::mcmc_dens(fit1, pars = paste0("b2[", 1:5,"]"), facet_args = list(nrow = 5))
20 + b

and

xnew <- seq(20, 40, by = 1)
mu <- lapply(1:Nc, function(i){
  sapply(seq_along(xnew), function(j){
    sigemax(xnew[j], params$b1, params$b2[, i], params$b3, params$b4)
    })
  })
lwr <- sapply(mu, function(z){
  apply(z, 2, function(w) quantile(w, probs = c(0.1)))
})
upr <- sapply(mu, function(z){
  apply(z, 2, function(w) quantile(w, probs = c(0.9)))
})
mumu <- sapply(mu, function(w){colMeans(w)})

plot(1, xlab = "x", ylab = "mu", xlim = range(xnew), ylim = range(mumu))
for(i in 1:Nc){
  lines(xnew, mumu[, i], col = i) # scales::alpha("black", 0.7))
  lines(xnew, lwr[, i], col = i, lty = 2)
  lines(xnew, upr[, i], col = i, lty = 2)
}

which seem at least somewhat reasonable and I didn’t have any issues raised by stan. However, I will say that I am aware that these models are quite sensitive to the priors and so I have made them relatively informative here based on the description of your study.

I’d also add that I have found dynamic linear models (DLMs) to be particularly useful for dose-response modelling as they essentially offer smoothing without enforcing a parametric form. Scott Berry outlines some basic DLMs in his 2011 text on Bayesian adaptive clinical trials.

2 Likes

Thanks @maj684 for your suggestions, I will definitely take a look at these options!