# Non-centered parameterization of measurement error model

Hey all,

I had a quick question about the measurement error model from section 6.1 of the Stan users guide. I’m trying to implement it in a logistic regression setting as follows using a non-centered parametrization for the measurement hierarchy. In my case, I have known standard deviations for each predictor variable, which differs from the constant measurement error assumed in the example.

data {
int <lower = 1> N; //number of data points
vector[N] x1_obs; //observed values
vector[N] x1_sd; //known sd
vector[N] x2_obs; //observed values
vector[N] x2_sd; //known sd
int<lower=0,upper=1> y[N];
}

parameters {
vector[N] x1_true_raw;
vector[N] x2_true_raw;
real x1_pmu; // population hyperparameters
real x2_psd;
real x1_pmu;
real x1_psd;
real b0;
real b1;
real b2;

}

transformed parameters{
vector[N] x1_true;
vector[N] x2_true;
x1_true = x1_pmu + x1_psd * x1_true_raw;
x2_true = x2_pmu + x2_psd * x2_true_raw;

}
model {
x1_true_raw ~ normal(0,1);
x2_true_raw ~ normal(0,1);
x1_obs ~ normal(x1_true, x1_sd); // measurement error
x2_obs ~ normal(x2_true, x2_sd);
b0 ~ normal(0,10);
b1 ~ normal(0,10);
b2 ~ normal(0,10);
y ~ bernoulli_logit(b0 + b1 * x1_true + b2*x2_true);

}

To me, this seems like a fairly straightforward model to use a non-centered re-parameterization:
x_{true} \sim (x_{pmu}, x_{psd}) \leftrightarrow x_{true} = x_{pmu} + x_{observed}^* * x_{psd}, x_{true}^* \sim N(0,1)

However, when running the model I get very odd posteriors for the x,_true_raw values. They look like this (the screengrab has my normal predictor variable name, I changed it to x1 in the posted data for simplicity):

Oddly enough, the rest of the model performs just fine. It recovers the proper distributions for each predictor variable observation and the population mean/sd while also estimating the coefficients the same as the non-centered version (with high effective sample sizes, good rhat and traceplot mixing). Anyone have any advice or know why this happening?

I think x1_true_raw has a bimodal but symmetric distribution because you are not constraining x1_psd to be positive. If you flip the sign of x1_true_raw and x1_psd, their contribution stays the same. You can define the standard deviations as

real <lower=0> x1_psd;
real <lower=0> x1_psd;

It’s also a good idea to put some priors on the _pmu and _psd parameters.

Oh yeah, thanks I totally missed that! I updated the code (following). I’m still getting poor posteriors so I’ll probably just go back to the centered parameterization. I’d be curious why its not improving performance in this case.

data {
int <lower = 1> N; //number of data points
vector[N] x1_obs; //observed values
vector[N] x1_sd; //known sd
vector[N] x2_obs; //observed values
vector[N] x2_sd; //known sd
int<lower=0,upper=1> y[N];
}

parameters {
vector[N] x1_true_raw;
vector[N] x2_true_raw;
real x1_pmu;
real <lower = 0> x2_psd;
real x2_pmu;
real <lower = 0> x2_psd;
real b0;
real <upper = 0> b1;
real <lower = 0> b2;

}

transformed parameters{
vector[N] x1_true;
vector[N] x2_true;
x1_true = x1_pmu + x1_psd * x1_true_raw;
x2_true = x2_pmu + x2_psd * x2_true_raw;

}
model {
x1_true_raw ~ normal(0,1);
x2_true_raw ~ normal(0,1);
x1_pmu ~ normal(0,10);
x1_psd ~ normal(0,1);
x2_pmu ~ normal(0,10);
x2_psd ~ normal(0,1);
x1_obs ~ normal(x1_true, x1_sd);
x2_obs ~ normal(x2_true, x2_sd);
b0 ~ normal(0,25);
b1 ~ normal(0,10);
b2 ~ normal(0,10);
y ~ bernoulli_logit(b0 + b1 * x1_true + b2*x2_true);

}

I am totally not qualified to answer the question but I might be able to get you in the right direction. For the centered versus non-centered parameterization, I always go back to Michael Betancourt’s paper.

The key idea is in Figure 4 I think. In your example, in the centered version there is a strong relation between x1_psd and x1_true. The funnel problem means that when x1_psd is small, all x1_true will be close to x1_mu but they can vary more with larger x1_psd. The non-centered parameterization, breaks that relation but it introduces a new problem. If x1_true is strongly constrained by the data, then there will be a strong relation between x1_true_raw and x1_psd. I suspect that in your case, the observed measurement error x1_sd1 is relatively small and the relation between x1_true and y is sufficiently strong, so that the data is contraining x1_true.

Section III in the paper explains how the correlations (with centered and non-centered) will show up in HMC. In short, it should be less of a problem than with “common MC implementations”. In practical HMC implementations, it’s still a problem and you should be able to see it in the stepsize (smaller is bad) and divergence diagnostics (more is bad) in Stan. So you can check those to see which implementation works best for your problem/data.

See also https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html for a discussion of how to investigate divergences in RStan using a hierarchical model as an example.