Hello All,
I have been trying to do a simulation study of a two-stage model on drug interaction using Stan.
Here is the issue: while I can obtain reasonable results from the first stage, the estimate of the interaction term (which is estimated in the second stage) is biased most of the time. Sometimes (this happens to some parameter values chosen for the interaction term) when the interaction term is correctly estimated, some other parameter estimations turn out to be biased. (Kind of like a seesaw). Under both cases, though biasedness presents, the variance of the estimated parameters are sufficiently low.
The traceplots, posterior densities, and autocorrelation plots all look good. I did a log_prob() diagnosis, and found the log posterior at the true value to be -29692.97. Whereas, the log posterior value at the means of the posterior distributions to be 501.
.
I suspect the problem can be in one (or more) of the following:
- Data generation process;
- Newton’s method root-solver within Stan (most likely one I think!);
- Jacobian transformations somewhere.
In essence, the model description is as the following.
Stage One
Two mono-therapies are given to individuals with drug levels d_i (i = 1, 2). The drug effects are modeled by two independent Emax models,
y_i = \frac{y_{max, i} \cdot (\frac{d_i}{D_{m,i}})^{m_i}}{1 + (\frac{d_i}{D_{m,i}})^{m_i}}.
Further,
\log(\frac{y}{y_{max, i} -y}) = \beta_{0,i} + \beta_{1,i}*\log(d_i) + \epsilon,
where \beta_{0,i} = -m_i*\log(D_{m,i}), \beta_{1,i} = m_i and \epsilon \sim \mathcal{N} (0, \sigma_i).
Stage Two
When combining drug A and B, there is often an interaction effect. In Zhao et al. (2012) (link: https://www.tandfonline.com/doi/abs/10.1080/19466315.2012.707087?journalCode=usbr20), they model the interaction term using a response surface, which is my ultimate goal. For now, to make the problem simple, I’m only using a single constant as my interaction term, \tau.
The interaction term is modeled as the following,
\frac{d_1}{\exp(-\frac{\beta_{0,1}}{\beta_{1,1}})(\frac{y}{y_{max,1}-y} )^{1/\beta_{1,1}}} + \frac{d_2}{\exp(-\frac{\beta_{0,2}}{\beta_{1,2}})(\frac{y}{y_{max,2}-y} )^{1/\beta_{1,2}}} = exp(\tau).
Moreover, we assume y_{max,1} = y_{max,2} = y_{max} and z = \log( \frac{y}{y_{max}-y}), so we can rewrite the above equation into,
\frac{d_1}{\exp(-\frac{\beta_{0,1}}{\beta_{1,1}})(z )^{1/\beta_{1,1}}} + \frac{d_2}{\exp(-\frac{\beta_{0,2}}{\beta_{1,2}})(z )^{1/\beta_{1,2}}} = exp(\tau).
Since it is an implicit function about z, we solve z numerically using a Newton’s method in Stan once all the parameters are sampled in one iteration. The solved value, \hat{z}, becomes the mean of the distribution of \log( \frac{y}{y_{max}-y}), expressed in the following relationship,
\log \frac{y}{y_{max} - y} = \hat{z} + e,
where e \sim \mathcal{N}(0, \sigma^2).
I include the data generation functions, parameter values used, function calls and the stan file. I know it is a more involved problem, so I’d truly appreciate any bit of help.
LaModel_help.stan (3.0 KB)
LaModelCall_help.R (6.1 KB)
LaModelFunc_help.R (6.4 KB)