Help in identifying strategies for fitting simulated datasets

ramiro · April 28, 2021, 5:37pm

Summary: Below I include details about an example model and how to fit it, it is just an example to illustrate a couple of points. My main question is at the end. I would like to generate simulated datasets of the same function but with various levels of the parameters ( for example, generate dose response curves with various levels of variation in the slope parameter), but it seems like it is not feasible to have a single rstan model that could handle all of them. My question is what strategies do people use when trying to do simulation studies using stan? I put a couple of ideas at the end.

Example Model
I have a general question that came after having a lot of problems fitting one of the parameters in a non-linear model. I will simplify the model as much as possible and only consider the troublesome parameter. I am fitting a four parameter logistic function such that:

y_{i,k} = \frac{A}{1+2^{-(\bar{B}+B_k)*(x_i-C)}}+D+\epsilon_i

Where A is the range, B the slope (specific for each block b), C the logEC50, and D the no-dose asymptote.

For this example I will only try to fit the B parameter. The centered model would be:

\begin{align} \bar{B} &\sim \mathcal{N}(1,0.1) \quad \bar{B} \geq 0 \\ \sigma_B &\sim\mathcal{N}(0.2,0.2), \quad \sigma_B \geq 0 \\ B_k &\sim \mathcal{N}(0,\sigma_B) \\ \sigma &\sim \mathcal{N}(0,1), \quad \sigma \geq 0 \end{align}

and the corresponding non-centered model would be:

\begin{align} \bar{B}_{raw} &\sim \mathcal{N}(0,1) \quad \bar{B}_{raw} \geq -10 \\ \sigma_{B_{raw}} &\sim\mathcal{N}(0,1), \quad \sigma_B \geq -1 \\ B_{k_{raw}} &\sim \mathcal{N}(0,1) \\ \bar{B} &= 1+0.1*\bar{B}_{raw} \\ \sigma_B &= 0.2 + 0.2*\sigma_{B_{raw}} \\ B_k &= B_{k_{raw}}*\sigma_B \end{align}

This is some example data extracted from a simulated dataset:

data ← data.frame(log2Dose=c(-13.83,-13.83,-11.31,-11.31,-8.8,-8.8,-6.29,-6.29,-3.77,-3.77,-1.26,-1.26,1.26,1.26,3.77,3.77,6.29,6.29,8.8,8.8,11.31,11.31,13.83,13.83),
y=c(0.24,0.25,0.21,0.29,0.23,0.32,0.26,0.3,0.37,0.35,0.56,0.55,0.86,0.89,1.01,1.07,1.1,1.19,1.12,1.2,1.15,1.23,1.14,1.21),
block=c(2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1))

Which looks like this:

And this is the model:

model_code <- 'functions {
  real fplBPosRange(real logX, real logBase, real A, real B, real logC, real D) {
    return A/(1+logBase^(-B*(logX-logC)))+D;
  }
}

data {
  int<lower=1> N;                    // size of dataset
  vector[N] logX;                    // dose value in logarithmic scale
  real y[N];                         // Response values
  int<lower=1> nBlocks;              // number of blocks
  int<lower=1,upper=nBlocks> cultureBlock[N]; //block
}

parameters {
  real<lower=-1> sigmaB_raw;
  real<lower=-10> meanB_raw;
  vector[nBlocks] blockNoiseB_raw;

  real<lower=0> sigma;
}

transformed parameters {

  vector[N] B;
  vector[nBlocks] blockNoiseB;
  real meanB;
  real sigmaB;

  sigmaB = 0.2 + 0.2*sigmaB_raw;

  for (i in 1:nBlocks) {
    blockNoiseB[i] = blockNoiseB_raw[i]*sigmaB;
  }

  meanB = 1 + meanB_raw*0.1;
  for (i in 1:N) {
    B[i] = meanB + blockNoiseB[cultureBlock[i]];
  }
}

model {

  vector[N] mu_y;

  sigmaB_raw ~ normal(0,1);
  blockNoiseB_raw ~ normal(0,1);
  meanB_raw ~ normal(0,1);

  sigma ~ normal(0,1);
  for (i in 1:N) {
    mu_y[i] = fplBPosRange(logX[i],
                           2,
                           1,
                           B[i],
                           0.01,
                           0.07);
  };
  y ~ normal(mu_y,sigma);
}'

And it could run like this:

stanFit <- rstan::stan(
  model_code = model_code,
  data = list(N=nrow(data),
              logX=data$log2Dose,
              y=data$y,
              nBlocks=2,
              cultureBlock=data$block),
  chains = 4,             # number of Markov chains
  warmup = 1000,          # number of warmup iterations per chain
  iter = 5000,            # total number of iterations per chain
  cores = 4,              # number of cores (could use one per chain)
  refresh = 0,           # no progress shown
  seed = 1,
  pars=c("sigma","meanB", "sigmaB","blockNoiseB"),
  save_warmup=FALSE,
  control=list(adapt_delta=0.8)
)

Following guidelines for investigating funnels etc. from @betanalpha writing I can get this to work by either constraing the prior or increase adapt_delta. In this case I can constrain the prior on sigmaB because I know that the true value is 0.2. Illustrations are below showing log(sigmaB) as a function of meanB - with the red dots corresponding to the divergent transitions - for various parameterizations and levels of adapt_delta:

QUESTION
The question I have is, from my reading of @betanalpha 's document there are tradeoffs depending on how informative the likelihood is, how much data we have, and the presence of an infinity-supressing prior. We need to adjust depending on our dataset. My problem is that I want to do some simulation studies with many datasets in order to study some particular behaviors. What strategies do people use in this case? If you want to do simulation studies with various designs (e.g. create and fit simulated datasets at various levels of the sigmaB parameter above) it seems to me that a single “one-size fits all” stan model would not work. In addition sometimes the simulated dataset might be generated from a true parameter very close to zero, hence requiring a non-centered parameterization, or the other way around (high information parameter - more informed likelihood - centered parameterization more suitable). I can think of the following strategies:

Have several models available (centered, non-centered) and for each dataset one of those will hopefully work.
Use my knowledge of the true values of the simulated dataset to inform and constrain the prior, this sounds a bit like I could bias the results to the true answer but I am not quite sure what else to do.

Any other suggestions?

martinmodrak · May 4, 2021, 9:35am

Hi,
sorry for not getting to you earlier, your question is relevant and well written.

Sigmoids are generally very hard to fit, I’ve spent substantial time working with them and I still have no idea how to fit them reliably in all cases. I would expect a lot of the issues to have to do with the sigmoid part and less with the centering/non-centering.

Some discussion of problems with sigmoids can be found at:

Identifying non-identifiability
Dose Response Model with partial pooling on maximum value - #3 by martinmodrak
There was a cool presentation at the online StanCon 2020 about fitting growth curves to Icelandic Covid data, the Q&A is in a separate video. he main idea was to fit the derivative of the curve to differences between consecutive data points instead of fitting the sigmoid directly. I never tried this myself, but I can see how it could sometimes make your life easier.

I don’t have a good answer to your general question. I think the fact that your model is able to fit only some simulated datasets is often an interesting information not to be thrown away. However , you often don’t believe your real datasets have as much variability as the simulated ones so finding better prior or other ways to only simulate datasets that you would care about is often helpful. As a last resort I would try to develop heuristics to pick a model based on the data.

Best of luck with your model

martinmodrak · May 4, 2021, 3:21pm

Oh, I noticed one more thing - in the example, you have only two blocks. It is often problematic to fit hierarchies where you only have two elements in a level and you likely won’t get much information on the sigmaB anyway. So if 2 blocks is a scenario you want to handle, I would just make the noise independent and fix sigmaB to a constant (for this scenario). If your real data always have more blocks, then I would simulate 4 or more blocks.

betanalpha · May 4, 2021, 3:46pm

Unfortunately there’s a lot of misunderstanding about the role of a Stan program and its place within simulation studies that sets up the poor expectations with which you’re starting to struggle.

Let’s consider a Bayesian calibration study, where we investigate the posterior inferences derived from an ensemble of prior predictive data,

\begin{align*} \tilde{\theta}_{n} &\sim \pi(\theta) \\ \tilde{y}_{n} &\sim \pi(y \mid \tilde{\theta}_{n}) \\ &\pi(\theta \mid \tilde{y}_{n}). \end{align*}

Even if the true model configuration \tilde{\theta} is fixed the behavior of the realized likelihood function, and hence the subsequent posterior density function, can vary strongly from simulated observation to simulated observation. The range of possible behaviors varies even more strongly when \tilde{\theta} varies across the prior model. The accurate expectation is that when working with simulation studies the realized posterior density functions will vary strongly in their behaviors!

One immediate consequence of this heterogeneity in posterior density function behaviors is that the optimal way to implement those density functions in Stan many also vary. Each posterior density function may require different parameterizations in order to facilitate accuracy computation with Stan depending on how informative the realized likelihood function happens to be. Unfortunately there’s no way around this, although heuristics for automatically identifying poor initial parameterizations can help minimize the burden.

What one doesn’t want to do is change the prior model heterogeneously. If different prior models are used for each posterior fit then the posteriors themselves won’t really be comparable in any well-defined way. On the other hand if a prior predictive simulation study reveals extreme behavior that conflicts with ones domain expertise then that can motivate a more informative prior model that can then be used to simulate new data and fit each realized posterior distribution. This not only ensures less heterogeneity in the realized likelihood functions (narrower prior model means less variation in the simulated truths \tilde{\theta} which means less variation in the simulated data \tilde{y} which means less variation in the realized likelihood functions) but also reduces how many heterogeneity in the realized likelihood functions propagates to the realized posterior density functions.

Another possibility is to change your experimental design. Larger data sets, if feasible, can lead to more concentrated likelihood functions that vary less from simulation to simulation. Similarly introducing auxiliary data from external sources can help compensate for degenerate behavior and lead to much more homogenous likelihood behavior.

If sufficient domain expertise isn’t available or modified experimental design isn’t possible, however, then you’ll have to confront how to accurately fit the wide range of possible posterior behaviors. Sometimes this can be accommodated with enough careful reparameterizations, but sometimes it just means that you don’t have the computational tool to explore the full range of behaviors of interest.

For more discussion see Identity Crisis, Towards A Principled Bayesian Workflow, and Towards A Principled Bayesian Workflow.

ramiro · May 4, 2021, 6:01pm

Thank you @martinmodrak , indeed my simulations will have more than 2 blocks but I just included two here as to make smaller self-contained example. Thank you very much for your reply. Will read it carefully, as well as your references.

ramiro · May 4, 2021, 7:23pm

Thank you @betanalpha , this clarifies a lot of things for me. I will go through the case studies.

Topic		Replies	Views
Dose-response fitting with small sample size Modeling rstan , fitting-issues	9	1706	February 9, 2022
Dose Response Model with partial pooling on maximum value Modeling fitting-issues	14	1909	September 1, 2020
Request for help understanding model behavior Modeling	17	1536	March 24, 2018
Correlation between bias and true parameter values Modeling	6	1525	July 9, 2020
Dose-response model with partial pooling Modeling fitting-issues , pharmacology , hierarchical-model	23	2865	April 14, 2020

Help in identifying strategies for fitting simulated datasets

Related topics