I have no experience with Stan, but it was suggested that I look into Stan for performing “error-in-variables” + “response-uncertainty” inferences. I’m well-versed in Mathematica and MATLAB, and am learning Python. I have no practical experience with R. Honestly, the main reason I included R as an option in my SE question is that I was OK with using one of the other languages to call an R function. I’ve briefly extensively perused other posts on discourse, but all-in-all it’s seeming somewhat distant/arduous to get something like what’s described in my linked post:
mdl = some_function(X, Xsd, y, ysd)
ypred, ysd2, cov = mdl(X2)
Right now, it’s not worth it for me to learn two unfamiliar languages to be able to do this, but if I had some Stan
code (generated via e.g. brms
) that I could copy-paste to use in MathematicaStan
, MatlabStan
, or PyStan
and see that it’s working, I’d be a lot more interested in figuring out how to customize it later.
This seems pretty relevant from 15 Missing Data and Other Opportunities | Statistical rethinking with brms, ggplot2, and the tidyverse: Second edition.
# put the data into a `list()`
dlist <- list(
D_obs = d$D_obs,
D_sd = d$D_sd,
M_obs = d$M_obs,
M_sd = d$M_sd,
A = d$A)
b15.2 <-
brm(data = dlist,
family = gaussian,
D_obs | mi(D_sd) ~ 1 + A + me(M_obs, M_sd),
prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(normal(0, 1), class = meanme),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, cores = 4, chains = 4,
seed = 15,
# note this line
save_mevars = TRUE,
file = "fits/b15.02")
I understand that mi
allows for error in the responses, and me
allows for errors in the predictors. It’s not exactly clear to me how this line was chosen: D_obs | mi(D_sd) ~ 1 + A + me(M_obs, M_sd)
Also, I’m not sure what c
is or what the specification for the prior is.
There is also this example from the docs:
data {
...
real x_meas[N]; // measurement of x
real<lower=0> tau; // measurement noise
}
parameters {
real x[N]; // unknown true value
real mu_x; // prior location
real sigma_x; // prior scale
...
}
model {
x ~ normal(mu_x, sigma_x); // prior
x_meas ~ normal(x, tau); // measurement model
y ~ normal(alpha + beta * x, sigma);
...
}
Finally, an example of Stan
code for Gaussian process regression is given in the docs:
data {
int<lower=1> N;
real x[N];
vector[N] y;
}
transformed data {
vector[N] mu = rep_vector(0, N);
}
parameters {
real<lower=0> rho;
real<lower=0> alpha;
real<lower=0> sigma;
}
model {
matrix[N, N] L_K;
matrix[N, N] K = cov_exp_quad(x, alpha, rho);
real sq_sigma = square(sigma);
// diagonal elements
for (n in 1:N)
K[n, n] = K[n, n] + sq_sigma;
L_K = cholesky_decompose(K);
rho ~ inv_gamma(5, 5);
alpha ~ std_normal();
sigma ~ std_normal();
y ~ multi_normal_cholesky(mu, L_K);
}
I asked for a GPR example using brms
in another question. Hopefully I can pick apart the non-parametric multidimensional GPR and the predictor/response uncertainty syntax to be able to combine both of them seamlessly.
Can anyone get me started off (or even better, post an example of Stan
and/or brms
code that does non-parametric multidimensional regression with predictor & response measurement error)?