I would like to use stan to create and use a calibration curve. The word “calibration” as I use it here reflects its meaning in analytical chemistry, not statistics.
A calibration curve consists of two parts

A regression model (usually linear, often a line with parameters for the slope and intercept). Typically we have standard solutions with known x values (e.g. concentrations) for which we have measured a set of corresponding y values (intensities). We estimate alpha (the intercept) and beta (the slope) . No problems there.

Next we measure the intensities (y values) of several unknowns for which we wish to know the concentration (x value), and suppose that each unknown is measured in triplicate.
How can I use stan to estimate the posterior distributions of the concentrations (x values) corresponding to each unknown sample?
Here is my stan code so far, I’ve detailed my questions inline in the comments
data {
// variables from the calibration curve data are prefixed with cal_
int<lower=1> cal_N; // number of standards
vector[cal_N] cal_meas; // measured intensities of the standards
vector[cal_N] cal_known; // known values of concentrations of the standards
// Variables to do socalled inverse prediction: predict x values from y values
int<lower=1> meas_N; // number of unknown intensity measurements
vector[meas_N] y_meas; // measured intensity values of the unknowns
// I also need to be able to do ordinary forward prediction:
// given new x values predict the y's
int<lower=1> new_x_N;
vector[new_x_N] new_x;
}
parameters {
real alpha;
real<lower=0> beta;
real<lower=0> sigma;
}
model {
// Step one: get the calibration curve
cal_meas ~ normal(alpha + beta * cal_known, sigma);
sigma ~ exponential(1); // standard deviation is positive
// I know the slope is about 750 because that's the way I made the data
// There's a better prior to be had, but that's a different question of the day
beta ~ normal(750, 100);
}
generated quantities {
vector[new_x_N] new_y;
vector[meas_N] x_inv_pred;
real<lower=0> sigma_x;
//
// Q1: How can I get stan to estimate the value of the scale factor for the x values?
// A poor solution:
// Inverse predict the value of the standard deviation of x from the value we found for sigma
// sigma_x = (sigma  alpha)/beta
// This won't work in general, because the error on the measurement variable could be much less
// than the error on the intercept and then sigma_x would be negative
// Another poor solution:
// Pretend the intercept doesn't matter and divide by the slope
sigma_x = sigma/beta;
//
//
// Q2: How to predict x values from my new y values?
// An unsatisfactory method
// sigma_x is not well estimated
// assuming normal distribution for x valuesI'd like stan to figure out what the distribution is
// errors in new y measurements are not taken into account
for (n in 1:meas_N) {
// Assume that x values predicted by the inverted equation
// are normally distributed, because don't know how to specify
// their actual distribution as found in the posterior
x_inv_pred[n] = normal_rng((y_meas[n]  alpha) / beta, sigma_x);
}
//
// predict y's from new x's
for (n in 1:new_x_N) {
new_y[n] = normal_rng(alpha + beta * new_x[n], sigma);
}
}
The data represents notional measurements of molybdenum concentrations and is taken from Hunter, 1984 (https://doi.org/10.1007/9789401710268_3), but I made the problem more realistic by multiplying the measured y values by 747, as if they represented values obtained from an instrument. For the sake of this problem, assume the measured values are intensities, and the known values are concentrations in parts per billion (ppb)
Mb_data.csv (263 Bytes)
I’m working in R, using rstan.
Let’s say I have one new concentration to estimate, and I’ve recorded the following triplicate measurement, and I pass that data to the stan model when sampling, along with the calibration data:
new_Mb_data < data.frame(y_meas = c(850, 867, 848))
d < list(cal_meas = Mb_data$meas, cal_known = Mb_data$known, cal_N = nrow(Mb_data),
meas_N = nrow(new_Mb_data), y_meas = new_Mb_data$y_meas,
new_x_N = 2, new_x = c(0, 1))
# compile the model
comp_cal_model < stan_model("calibration_regression_predict_x_and_y_Qoftheday.stan")
# sample
fit_model_cal < sampling(comp_cal_model, data = d, chains = 4, seed = "123" )
# print the results
fit_model_cal
To repeat the questions in the stan code,
Q1: How can I get stan to estimate the value of the scale factor for the x values?
and
Q2: How to obtain a posterior distribution of x values from my new y values, taking into account the error in the new y measurements, a better estimate of sigma_x, and the fitted calibration curve?
Q3: I’ve read that the items in the generated quantiles block are calculated every time a sample is performed, and I think that means that those values are effectively using the posterior distribution of the parameters alpha and beta, is that correct?
Three final comments
 I think that this problem is a tiny example of “the inverse problem”, but the papers I’ve found that deal with that subject use models that are way more complicated and beyond my current comprehension.
 The real model I’m working towards will take into account nonnegligible errors in the known concentrations used to generate the initial regression, heteroscedascity, will have a better prior for the slope, and nonlinear or spline regression, and perhaps pool data from multiple curves performed on the same instrument for different analytes…
 Another way of solving this problem would be to use the posterior of the regression as the prior for a new, inverse regression model. I’ve seen a paper that advocates using quantiles of a posterior as priors for a future model. Sorry, can’t find the exact reference at the moment.
Thanks