I had some time to think about your model. I’m not quite sure I follow your thoughts in the follow-up posts, but your original model
y_\text{obs} = y_\text{mod} + (a + b y_\text{mod})\epsilon
is akin to a heteroskedastic regression model with
\begin{aligned}
y_{\text{obs},i} &= y_{\text{mod},i} + \eta_i\,,\quad\text{with}\\
\text{E}[\eta_i] &= 0\,,\\
\text{Var}[\eta_i] &= (a + b y_{\text{mod},i})^2\sigma^2\,,
\end{aligned}
where \sigma^2 = \text{Var}[\epsilon].
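If we additionally assume that \epsilon is normally distributed (as it will be in the simulation below), the model is equivalent to a normal likelihood whose scale grows linearly in y_\text{mod}:
y_{\text{obs},i} \sim \text{Normal}\!\left(y_{\text{mod},i},\ (a + b\,y_{\text{mod},i})\,\sigma\right)\,,
which is exactly the sampling statement we will use in the Stan program.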
Before translating the above model to Stan, let’s generate and plot some sample data.
# True parameter values used to simulate the data
ymod <- 1:100
a <- 2.5
b <- 0.3
set.seed(2020)
yobs <- ymod + (a + b * ymod) * rnorm(length(ymod))

library(tidyverse)
data.frame(yobs = yobs, ymod = ymod) %>%
  ggplot(aes(ymod, yobs)) +
  geom_point() +
  geom_line() +
  theme_minimal()
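As a quick sanity check (a rough sketch, not strictly necessary), we can bin the simulated data and compare the empirical residual standard deviation in each bin against the true scale a + b * ymod:

# Residual SD per bin should roughly track the true scale a + b * ymod
data.frame(ymod = ymod, resid = yobs - ymod) %>%
  mutate(bin = cut(ymod, breaks = 5)) %>%
  group_by(bin) %>%
  summarise(
    sd_resid = sd(resid),          # empirical residual SD in the bin
    sd_true  = a + b * mean(ymod)  # true scale at the bin's mean ymod
  )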
The nice thing with Stan is that the model can be translated to Stan code pretty much as-is:
library(rstan)

model_code <- "
data {
  int<lower=0> N;
  vector[N] ymod;
  vector[N] yobs;
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
  real<lower=0> sigma;
}
model {
  a ~ lognormal(0, 1);
  b ~ lognormal(0, 1);
  sigma ~ cauchy(0, 2.5);  // half-Cauchy, due to the <lower=0> constraint
  yobs ~ normal(ymod, (a + b * ymod) * sigma);
}
"
model <- stan_model(model_code = model_code)
I’ve chosen weakly informative priors for all parameters. The <lower=0>
constraints on a, b, and sigma ensure that the scale of the likelihood, (a + b y_\text{mod})\sigma, lies in \mathbb{R}^+ (and the constraint on sigma turns its Cauchy prior into a half-Cauchy).
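To get a feeling for what these priors imply, one can sample from them directly. Here is a minimal sketch that looks at the implied noise scale at ymod = 50 (the point 50 is arbitrary, chosen purely for illustration):

# Sample from the priors and inspect the implied noise scale at ymod = 50
set.seed(1)
a_prior     <- rlnorm(1e4, 0, 1)          # a ~ lognormal(0, 1)
b_prior     <- rlnorm(1e4, 0, 1)          # b ~ lognormal(0, 1)
sigma_prior <- abs(rcauchy(1e4, 0, 2.5))  # half-Cauchy via <lower=0>
quantile((a_prior + b_prior * 50) * sigma_prior, c(0.05, 0.5, 0.95))

The prior on the scale is very diffuse, which is consistent with "weakly informative" here.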
Let’s fit the model:
fit <- sampling(
  model,
  data = list(N = length(ymod), ymod = ymod, yobs = yobs),
  seed = 2020
)
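Before interpreting anything, it is worth making sure the sampler behaved well; rstan ships a helper that summarizes the HMC diagnostics:

# Quick sampler health check: divergences, tree depth, E-BFMI
check_hmc_diagnostics(fit)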
We inspect the parameter estimates and compare them with the true values:
library(broom.mixed)

fit %>%
  tidy(conf.int = TRUE, rhat = TRUE, ess = TRUE) %>%
  bind_cols(value = c(a, b, 1))  # true values; sigma = 1 by construction
# A tibble: 3 x 8
#   term  estimate std.error conf.low conf.high  rhat   ess value
#   <chr>    <dbl>     <dbl>    <dbl>     <dbl> <dbl> <int> <dbl>
# 1 a        2.31      2.47    0.631      9.99   1.01   578   2.5
# 2 b        0.260     0.276   0.0754     1.07   1.01   372   0.3
# 3 sigma    1.27      1.02    0.321      4.32   1.00   703   1
Not a bad fit: the point estimates are close to the true values, but note how wide the (quantile-based) credible intervals are.
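If you want to go one step further, a posterior predictive check shows whether the fitted noise model reproduces the fan-shaped spread of the data. Here is a minimal sketch (the replicated data could equally be produced in a generated quantities block in Stan):

# Posterior predictive check: overlay replicated data sets on the observations
post <- rstan::extract(fit)
set.seed(1)
idx  <- sample(length(post$sigma), 50)  # 50 random posterior draws
yrep <- sapply(idx, function(i)
  rnorm(length(ymod), ymod, (post$a[i] + post$b[i] * ymod) * post$sigma[i]))

data.frame(
  ymod = rep(ymod, 50),
  yrep = as.vector(yrep),
  draw = rep(seq_len(50), each = length(ymod))
) %>%
  ggplot(aes(ymod, yrep, group = draw)) +
  geom_line(alpha = 0.05) +
  geom_point(
    data = data.frame(ymod = ymod, yobs = yobs),
    aes(ymod, yobs), inherit.aes = FALSE
  ) +
  theme_minimal()

If the observed points sit comfortably inside the band of replicated lines, the linearly growing noise scale is doing its job.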