Non-parametric, multidimensional regression with measurement error in predictors and responses

I have no experience with Stan, but it was suggested that I look into Stan for performing “error-in-variables” + “response-uncertainty” inferences. I’m well-versed in Mathematica and MATLAB, and am learning Python. I have no practical experience with R. Honestly, the main reason I included R as an option in my SE question is that I was OK with using one of the other languages to call an R function. I’ve briefly extensively perused other posts on discourse, but all-in-all it’s seeming somewhat distant/arduous to get something like what’s described in my linked post:

mdl = some_function(X, Xsd, y, ysd)
ypred, ysd2, cov = mdl(X2)

Right now, it’s not worth it for me to learn two unfamiliar languages to be able to do this, but if I had some Stan code (generated via e.g. brms) that I could copy-paste to use in MathematicaStan, MatlabStan, or PyStan and see that it’s working, I’d be a lot more interested in figuring out how to customize it later.

This seems pretty relevant from 15 Missing Data and Other Opportunities | Statistical rethinking with brms, ggplot2, and the tidyverse: Second edition.

# put the data into a `list()`
dlist <- list(
  D_obs = d$D_obs,
  D_sd  = d$D_sd,
  M_obs = d$M_obs,
  M_sd  = d$M_sd,
  A     = d$A)

b15.2 <- 
  brm(data = dlist, 
      family = gaussian,
      D_obs | mi(D_sd) ~ 1 + A + me(M_obs, M_sd),
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 0.5), class = b),
                prior(normal(0, 1), class = meanme),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 15,
      # note this line
      save_mevars = TRUE,
      file = "fits/b15.02")

I understand that mi allows for error in the responses, and me allows for errors in the predictors. It’s not exactly clear to me how this line was chosen: D_obs | mi(D_sd) ~ 1 + A + me(M_obs, M_sd)
Also, I’m not sure what c is or what the specification for the prior is.

There is also this example from the docs:

data {
  real x_meas[N];     // measurement of x
  real<lower=0> tau;  // measurement noise
parameters {
  real x[N];          // unknown true value
  real mu_x;          // prior location
  real sigma_x;       // prior scale
model {
  x ~ normal(mu_x, sigma_x);  // prior
  x_meas ~ normal(x, tau);    // measurement model
  y ~ normal(alpha + beta * x, sigma);

Finally, an example of Stan code for Gaussian process regression is given in the docs:

data {
  int<lower=1> N;
  real x[N];
  vector[N] y;
transformed data {
  vector[N] mu = rep_vector(0, N);
parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
model {
  matrix[N, N] L_K;
  matrix[N, N] K = cov_exp_quad(x, alpha, rho);
  real sq_sigma = square(sigma);

  // diagonal elements
  for (n in 1:N)
    K[n, n] = K[n, n] + sq_sigma;

  L_K = cholesky_decompose(K);

  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();

  y ~ multi_normal_cholesky(mu, L_K);

I asked for a GPR example using brms in another question. Hopefully I can pick apart the non-parametric multidimensional GPR and the predictor/response uncertainty syntax to be able to combine both of them seamlessly.

Can anyone get me started off (or even better, post an example of Stan and/or brms code that does non-parametric multidimensional regression with predictor & response measurement error)?

SE Post: What functions/packages in Python, MATLAB, Mathematica, or R implement "error-in-variables" and "response uncertainty" for non-linear regression? - Cross Validated


I’d get the simplest version of the model up and running first. So the measurement error from the Stan docs. Typically my workflow goes like this:

  1. Code of the simplest model in Stan or failing that use brms and dump the Stan code out from there.
  2. Simulated some data with known parameters.
  3. Run the simulated data through the model and check to make sure everything is working.
  4. Add one layer of complexity to the model (like multi-level or gaussian process) and repeat the run with the simulated data.