Dose-response fitting with small sample size

Dear Stan community,

My question is about how to deal with small sample sizes when fitting dose-response curves.

Consider a dataset with two simulated samples (red and blue in the plot below), each applied to the same three blocks, but with some degree of block-to-block variation in the horizontal position of the curve (i.e. the curve shape is very similar, but it is shifted to the right or to the left depending on the block it is in):

This would be called a Randomized Complete Block Design, and we try to fit these data using a four-parameter logistic:

y_{i,s,b} = \frac{A}{1+2^{-B(x_i - C_{s,b})}}+D+\epsilon_i

where i indexes the observations, s the samples (1 and 2) and b the blocks (1, 2 and 3). In addition, A is the range (fixed at 1), B is the slope (fixed at 1), D is the no-dose asymptote (fixed at 0.2) and \epsilon_i \sim \mathcal{N}(0,0.01) (a very small noise standard deviation). The parameters of interest are C_{s,b}, the log EC50 (the dose at which the response is halfway between the two asymptotes); and \sigma_{LogCblock}, the block-to-block variation of that parameter, described below:

C_{s,b} = logC_s + \zeta_b

where logC_s is the sample-specific log EC50 and \zeta_b is the block-specific “noise”:

logC_s \sim \mathcal{N}(\mu_{logC},\sigma_{logC}), \mu_{logC}=0 and \sigma_{logC}=1
\zeta_b \sim \mathcal{N}(0,\sigma_{LogCblock}),

with the hyperprior:
\sigma_{LogCblock} \sim \mathcal{N}(a,b), truncated below at zero (this is what the hard constraint in the code below implements); for this example, a=3 and b=2.

I am trying to fit this but I get multiple divergent transitions unless the prior on \sigma_{LogCblock} is concentrated around its true value. For example, the simulated data for the reproducible example below uses \sigma_{LogCblock}=1.5, and the inference results in the following pairs plot:

Below is a reproducible example:


modelCode = 'functions {
  real fplBPosRange(real logX, real logBase, real A, real B, real logC, real D) {
    return A/(1+logBase^(-B*(logX-logC)))+D;
  }
}

data {
  int<lower=1> N;                    // size of dataset
  vector[N] logX;                    // dose value in logarithmic scale
  real y[N];                         // Response values
  int<lower=1> nSamples;            // number of samples
  int<lower=1,upper=nSamples> sample[N];       // sample
  int<lower=1> nBlocks;              // number of blocks
  int<lower=1,upper=nBlocks> cellBlock[N]; //block
  real meanLogCLocationPrior;
  real meanLogCScalePrior;
  real blockSigmaLogCLocationPrior;
  real blockSigmaLogCScalePrior;
}

parameters {
  real meanLogC[nSamples];
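  // lower bound chosen so that blockSigmaLogC = location + raw*scale stays positive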
  real<lower=-blockSigmaLogCLocationPrior/blockSigmaLogCScalePrior> blockSigmaLogC_raw;
  vector[nBlocks] blockNoiseLogC_raw;
}

transformed parameters {
  vector[N] logC;
  real blockSigmaLogC;
  vector[nBlocks] blockNoiseLogC;

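  // non-centered parameterization: together with the standard-normal prior on
  // blockSigmaLogC_raw, this implies a N(location, scale) prior on blockSigmaLogC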
  blockSigmaLogC = blockSigmaLogCLocationPrior + blockSigmaLogC_raw*blockSigmaLogCScalePrior;

  for (b in 1:nBlocks) {
    blockNoiseLogC[b] = blockNoiseLogC_raw[b]*blockSigmaLogC;
  }

  for (i in 1:N) {
      logC[i] = meanLogC[sample[i]] + blockNoiseLogC[cellBlock[i]];
  }
}

model {

  vector[N] mu_y;

  blockSigmaLogC_raw ~ normal(0,1);
  blockNoiseLogC_raw ~ normal(0,1);
  
  meanLogC ~ normal(meanLogCLocationPrior,meanLogCScalePrior);

  for (i in 1:N) {
      mu_y[i] = fplBPosRange(logX[i],
                           2,
                           1,
                           1,
                           logC[i],
                           0.2);
  }
  y ~ normal(mu_y,0.01);
}'

data <- data.frame(
  log2Dose = c(-13.83,-13.83,-13.83,-13.83,-13.83,-13.83,-11.31,-11.31,-11.31,-11.31,-11.31,-11.31,-8.8,-8.8,-8.8,-8.8,-8.8,-8.8,-6.29,-6.29,-6.29,-6.29,-6.29,-6.29,-3.77,-3.77,-3.77,-3.77,-3.77,-3.77,-1.26,-1.26,-1.26,-1.26,-1.26,-1.26,1.26,1.26,1.26,1.26,1.26,1.26,3.77,3.77,3.77,3.77,3.77,3.77,6.29,6.29,6.29,6.29,6.29,6.29,8.8,8.8,8.8,8.8,8.8,8.8,11.31,11.31,11.31,11.31,11.31,11.31,13.83,13.83,13.83,13.83,13.83,13.83),
  y = c(0.21,0.2,0.21,0.2,0.2,0.2,0.21,0.2,0.21,0.19,0.2,0.21,0.2,0.21,0.23,0.21,0.2,0.21,0.23,0.21,0.25,0.22,0.23,0.24,0.33,0.27,0.41,0.34,0.3,0.46,0.64,0.47,0.82,0.7,0.61,0.86,1.02,0.87,1.11,1.05,1,1.11,1.16,1.11,1.19,1.17,1.14,1.17,1.22,1.18,1.2,1.22,1.2,1.2,1.19,1.2,1.22,1.19,1.2,1.2,1.22,1.22,1.18,1.21,1.2,1.2,1.2,1.18,1.19,1.19,1.2,1.18),
  sample = c(1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2,1,1,2,1,2,2),
  cellBlock = c(1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3,1,2,1,3,2,3)
)

library(rstan)
seed <- 1
iterations <- 8000
stanFit <- rstan::stan(
    model_code = modelCode,
    data = list(N=nrow(data),
                logX=data$log2Dose,
                y=data$y,
                nSamples=2,
                sample=data$sample,
                nBlocks=3,
                cellBlock=data$cellBlock,
                meanLogCLocationPrior=0,
                meanLogCScalePrior=1,
                blockSigmaLogCLocationPrior=3,
                blockSigmaLogCScalePrior=2),    # named list of data
    chains = 4,             # number of Markov chains
    warmup = round(iterations/2),          # number of warmup iterations per chain
    iter = iterations,            # total number of iterations per chain
    cores = 4,              # number of cores (could use one per chain)
    refresh = 0,           # no progress shown
    seed = seed,
    pars=c("meanLogC","blockSigmaLogC"),
    save_warmup=FALSE,
    control=list(adapt_delta=0.9)
  )

I have two questions:

  1. The divergences are very likely due to the small number of blocks. Three blocks is perhaps too few to estimate the block-to-block variation \sigma_{LogCblock} (blockSigmaLogC in the pairs plot). The lp__ vs. blockSigmaLogC panel of the pairs plot shows several draws for which the block-to-block variation is near its highest values, which probably troubles the sampler. If I simulate data with more blocks, the problem goes away. I think I either need to use more blocks or a stronger prior. Would people agree with this, or is there anything else I can do? I ask because, in my experience in the lab, dataset sizes like these are (perhaps unfortunately) typical, so I am trying to figure out strategies to deal with them.
  2. (Hopefully an easy question.) As you can see, the two sample parameters, meanLogC[1] and meanLogC[2], are correlated (logC_1 and logC_2 in the equations), and there can sometimes be 8 samples in each block. Do you have any suggestions or examples about how to reduce this correlation?

Thank you in advance for any suggestions.


The expression A/(1+logBase^(-B*(logX-logC)))+D is the model itself, right? Can you unpack what the parameters mean?

There are a lot of models that will produce the characteristic sigmoidal dose-response shape; some are grounded in the processes that generate the response, others just in being flexible enough to produce the shape. How easy they are to fit to your data will vary, and it also depends on what you are trying to do.

We fit some simple exponential and heterogeneous (beta-binomial-like) models to dose-response data using a Metropolis-Hastings sampler, and later, in response to some criticism, reproduced and expanded the original work using Stan; we didn’t have trouble.

Can’t you fit the blocks separately and see if you have trouble only in certain subsets of the data? Maybe they don’t have to be fit together, or each can have its own parameters, or maybe there is indeed too little data in some of them and they need to be excluded (or maybe they can stay if you verify that the rest behaves the same as when you exclude the small-sample-size block).


Thank you very much for your response @caesoma. To answer your question, A is the range, B the slope (specific for each block b), logC the log EC50, and D the no-dose asymptote. The problem is not really in fitting the curves individually; they are actually pretty well behaved, and I imagine many models would fit them. The challenge is in estimating the block-to-block variation of the log EC50, which I think causes problems because I only have three blocks.

Without reading your paper in full detail, but going through the intro and some graphs, I see that, for example, the curves in Vietnam and Brazil differ in their EC50: the dose at which 50% of the population is infected (figure 1.a). What I want is analogous to estimating the variation of this EC50 (or of some of the other parameters of the curve function used) between populations by running a model that uses all populations at once. What I am thinking is that for this to work in Stan, one would need more than two populations or a stronger prior.

It’s not completely clear to me if block and population refer to the same thing here, but if you have enough data points in each block/population (whether some parameters are shared or not) it shouldn’t be a problem that there are only three – then you should be able to compare the posteriors of parameters that differ between blocks.

If for any one block you have too few data points, then maybe you will have trouble and get divergent transitions, but that’s a within-block issue, and I believe you are right that you will get rid of the problem by getting rid of the data (and also of the result). Conversely, if you try to fit that block alone you will probably run into the same problem.

In sum: make sure you can fit each curve separately; if one or more of them doesn’t have enough data, there’s not much to do about that except get more data.


Thank you again very much @caesoma. There is no problem fitting the data within each block (I actually simulated it to be pretty easy to fit!); the problem is something else, and I see that perhaps I haven’t explained myself very well. Technically speaking, this is a randomized complete block design where we have an overall mean for the logC of each sample, but also block-specific variation. Unfortunately, I won’t have time to write a better response until tomorrow afternoon, so I will do so then - I should have written the model in my question. Again, thank you very much for taking the time to help someone in need.

I described the problem better in the description; reading it again, I see that it’s hard to understand. Will give it another pass tomorrow. Thanks again @caesoma.

Hi,
usually in these cases it is useful to isolate the problem by simplifying the model - i.e. does the model work if you have just one block and a single value for C? (It seems your further reply indicates it does.) Does the model work if you omit the sigmoid and just assume you have direct noisy observations of C_{s,b}?
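For concreteness, a minimal sketch of that second check, assuming you summarise each sample-by-block curve with a noisy EC50 estimate (cHat and obsSigma are names I made up, not from your model):

data {
  int<lower=1> nSamples;
  int<lower=1> nBlocks;
  matrix[nSamples, nBlocks] cHat;   // noisy per-sample, per-block EC50 estimates
  real<lower=0> obsSigma;           // assumed known observation noise
}
parameters {
  vector[nSamples] meanLogC;
  real<lower=0> blockSigmaLogC;
  vector[nBlocks] blockNoiseLogC_raw;
}
model {
  vector[nBlocks] blockNoiseLogC = blockNoiseLogC_raw * blockSigmaLogC;
  meanLogC ~ normal(0, 1);
  blockSigmaLogC ~ normal(3, 2);      // truncated at zero by the <lower=0> bound
  blockNoiseLogC_raw ~ normal(0, 1);  // non-centered block effects
  for (s in 1:nSamples)
    for (b in 1:nBlocks)
      cHat[s, b] ~ normal(meanLogC[s] + blockNoiseLogC[b], obsSigma);
}

If this already produces divergences, the sigmoid is not the culprit.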

Another possibility is that the problem is in the hard constraint:

real<lower=-blockSigmaLogCLocationPrior/blockSigmaLogCScalePrior> blockSigmaLogC_raw;

Hard constraints on parameters like this can pose problems. If you want to express the fact that blockSigmaLogC is unlikely to be very low, it is better to express this as a “soft constraint” via your prior. Hard constraints are usually better reserved for “physical” constraints (e.g. a concentration can’t be negative, etc.)
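For example, a soft-constraint version could look like this (just a sketch, reusing the names from your model):

parameters {
  real<lower=0> blockSigmaLogC;  // only the physical constraint: an sd is non-negative
}
model {
  // soft constraint: the prior pulls blockSigmaLogC away from zero,
  // but the sampler can still reach small values if the data demand it
  blockSigmaLogC ~ normal(blockSigmaLogCLocationPrior, blockSigmaLogCScalePrior);
}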

I admit I don’t really see how that correlation arises from the model, but as a heuristic, you could probably reparametrize via something like an “overall mean of logC” and “N - 1 differences from the mean”, similarly to what you would do with effect coding; a sketch follows.
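Roughly something like this (untested; grandMeanLogC and deltaLogC are names I just made up):

parameters {
  real grandMeanLogC;              // overall mean of logC across samples
  vector[nSamples - 1] deltaLogC;  // free differences from that mean
}
transformed parameters {
  // effect coding: the last difference is minus the sum of the others, so the
  // differences sum to zero and are decoupled from the overall mean
  vector[nSamples] meanLogC = grandMeanLogC + append_row(deltaLogC, -sum(deltaLogC));
}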

One other minor weird thing (likely not a big problem) is the observation model:

y ~ normal(mu_y,0.01);

Since it appears that y cannot be negative, it might make more sense to have a lognormal or gamma observation model.
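E.g. something like the following, where sigma_y would be a new scale parameter (mu_y as computed in your model):

// lognormal observation model: mu_y becomes the median of the response
y ~ lognormal(log(mu_y), sigma_y);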

I assume you plan to estimate parameters other than C at some point. If yes, then I’ll just add that flexible sigmoid models can pose a bunch of challenges of their own. But those tend to manifest primarily when your data do not cover the whole dynamic range, which appears not to be the case for your data.

If yes, then here is a bunch of links to reparametrizations (all of them still have some problems and I am not completely happy with any of them):

I also wrote about some of the considerations at Hierarchical Gompertz model - #77 by martinmodrak

Best of luck with your model!


Thanks so much @martinmodrak, will go through your response carefully.

Dear @martinmodrak, I just went carefully through all the links. Lots to learn!! I can definitely fit the data on the individual blocks without any issues.

The purpose of this constraint was to accommodate a non-centered parameterization of the standard-deviation hyperparameter of a normal distribution. I did try the gamma and the same problem still appeared. Why would the hard constraint pose a problem?
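To spell out the arithmetic of that parameterization: with

\sigma_{LogCblock} = a + b\,\sigma_{raw}, \qquad \sigma_{raw} \sim \mathcal{N}(0,1),

the bound \sigma_{raw} > -a/b is equivalent to \sigma_{LogCblock} > 0, i.e. the \mathcal{N}(a,b) prior on \sigma_{LogCblock} is truncated below at zero.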

I think the correlation between the meanLogC parameters arises in the model due to the way I constructed the data, so no problem there, but I will keep the effect-coding idea in mind.

Other than that, I got a lot out of the other things you mentioned on parametrizations of the sigmoid.

In this case, I am not sure there is anything else to do other than more data or stronger priors. Will continue to work on this. Thank you again very much!


For the future, in case it is helpful for anyone else: after some reading and experimenting, this particular dataset was fit by adjusting the prior parameters of \sigma_{LogCblock} \sim \mathcal{N}(a,b) to a=1.5 and b=1.5 (as a reminder, they were previously set to a=3 and b=2). My “beginner’s understanding” is that the previous prior was making the sampler explore areas for which no data were available (I wish I could read that off the graphs though!), hence it was having issues. The space seems to be explored better now; below is the new pairs plot:

Thanks @martinmodrak and @caesoma for the responses. In summary, this was a case where the individual sigmoid curves were easy to fit separately, but a more informative/suitable prior was needed to fit a model of all the data at once that incorporated block-to-block variation and to obtain an inference for the block-to-block variation parameter (i.e. \sigma_{LogCblock}).