My question is an extension of section 6.1 in the Stan User’s Guide.
I have a model of the form y ~ x, where x is defined as x = a / b. The variable a is some lab-based measurement and b is the mass of the sample on which it’s measured. Both a and b are measured in the lab with analytical repeats. Typically, these repeats are averaged together and a is divided by b to generate x before generating a ‘final’ data set that is used for modeling. This obviously ignores any measurement error associated with a and b.
What I would like to do is pass the full raw data to Stan for both a and b (a_meas and b_meas) which would have a length 3x the length of y because they’re measured with 3 repeats per sample. I’d like to estimate separate parameters for each sample of a, where the parameter’s mean is the mean of the three repeats repeats and the sigma is the sigma of the three measured repeats. I want to do the same for b. And then divide the two to generate x, a vector of parameters, that represent the “true value” of x, accounting for this measurement error.
In the example in the User’s Guide, the following syntax was used:
data {
int<lower=0> N; // number of cases
vector[N] y; // outcome (variate)
real x_meas[N]; // measurement of x (co-variate)
real<lower=0> tau; // measurement noise
}
parameters {
real x[N]; // unknown true value
real mu_x; // prior location
real sigma_x; // prior scale
real alpha; // intercept
real beta; // slope
real<lower=0> sigma; // outcome noise
}
model {
x ~ normal(mu_x, sigma_x); // prior
x_meas ~ normal(x, tau); // measurement model
y ~ normal(alpha + beta * x, sigma);
}
This example doesn’t apply to my case because there’s a fixed error across all of the samples and the mean is only drawn from one observation, the single measured value.
I’m having trouble figuring out how to modify this to solve my problem. I’m imagining something like:
data {
int<lower=0> N; // number of cases in y
int<lower=0> J; // number of cases in a and b, including replicates
int<lower=0> R; // number of replicates per sample
vector[N] y; // outcome (variate)
real a_meas[J]; // measurement of a (co-variate)
real b_meas[J]; // measurement of b (co-variate)
}
parameters {
real a[N]; // unknown true value for a
real mu_a[N]; // prior location
real sigma_a[N]; // prior scale
real b[N]; // unknown true value for b
real mu_b[N]; // prior location
real sigma_b[N]; // prior scale
real alpha; // intercept
real beta; // slope
real<lower=0> sigma; // outcome noise
}
generated quantities {
// Is this the right place to generate x,
// which would be the parameters a[N] and b[N] divided by each other?
real x[N] = a / b;
}
model {
// I'm not sure how to properly specify the model section to
// general parameters for a and b based on the three
// measurements of a and b for every sample
for(i in 1:N){
for(i in 1:R){
a ~ normal(mu_a, sigma_a);
}
}
for(i in 1:N){
for(i in 1:R){
b ~ normal(mu_b, sigma_b);
}
}
y ~ normal(alpha + beta * x, sigma);
}
Thanks for any help and direction you can offer.