# Non-centered parameterization of measurement error model

Hey all,

I had a quick question about the measurement error model from Section 6.1 of the Stan User's Guide. I'm trying to implement it in a logistic regression setting, as follows, using a non-centered parameterization for the measurement hierarchy. In my case I have a known standard deviation for each predictor observation, which differs from the constant measurement error assumed in the example.

``````stan
data {
int <lower = 1> N; //number of data points
vector[N] x1_obs; //observed values
vector[N] x1_sd; //known sd
vector[N] x2_obs; //observed values
vector[N] x2_sd; //known sd
int<lower=0,upper=1> y[N];
}

parameters {
vector[N] x1_true_raw;
vector[N] x2_true_raw;
real x1_pmu; // population hyperparameters
real x1_psd;
real x2_pmu;
real x2_psd;
real b0;
real b1;
real b2;

}

transformed parameters{
vector[N] x1_true;
vector[N] x2_true;
x1_true = x1_pmu + x1_psd * x1_true_raw;
x2_true = x2_pmu + x2_psd * x2_true_raw;

}
model {
x1_true_raw ~ normal(0,1);
x2_true_raw ~ normal(0,1);
x1_obs ~ normal(x1_true, x1_sd); // measurement error
x2_obs ~ normal(x2_true, x2_sd);
b0 ~ normal(0,10);
b1 ~ normal(0,10);
b2 ~ normal(0,10);
y ~ bernoulli_logit(b0 + b1 * x1_true + b2*x2_true);

}
``````

To me, this seems like a fairly straightforward model for a non-centered reparameterization:
x_{true} \sim N(x_{pmu}, x_{psd}) \leftrightarrow x_{true} = x_{pmu} + x_{psd} \cdot x_{true}^*, \quad x_{true}^* \sim N(0,1)
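(For what it's worth, the equivalence between the two formulations is easy to sanity-check by simulation. The sketch below uses made-up values for `x_pmu` and `x_psd`; it just compares the moments of draws taken directly from the centered distribution against draws built from the non-centered transform.)

```python
# Sanity check: x_true ~ N(x_pmu, x_psd) versus
# x_true = x_pmu + x_psd * x_true_raw with x_true_raw ~ N(0, 1).
# Values here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
x_pmu, x_psd, n = 2.0, 1.5, 200_000

centered = rng.normal(x_pmu, x_psd, n)        # draw x_true directly
raw = rng.normal(0.0, 1.0, n)                 # x_true_raw ~ N(0, 1)
non_centered = x_pmu + x_psd * raw            # deterministic transform

print(centered.mean(), non_centered.mean())   # both close to 2.0
print(centered.std(), non_centered.std())     # both close to 1.5
```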

However, when running the model I get very odd posteriors for the `x1_true_raw` values. They look like this (the screengrab has my actual predictor variable name; I changed it to x1 in the posted code for simplicity):

Oddly enough, the rest of the model performs just fine. It recovers the proper distributions for each predictor observation and the population mean/sd, and it estimates the coefficients the same as the centered version (with high effective sample sizes, good Rhat, and well-mixed traceplots). Does anyone have advice, or know why this is happening?

I think `x1_true_raw` has a bimodal but symmetric distribution because you are not constraining `x1_psd` to be positive. If you flip the sign of `x1_true_raw` and `x1_psd`, their contribution stays the same. You can define the standard deviations as

``````stan
real<lower=0> x1_psd;
real<lower=0> x2_psd;
``````

It’s also a good idea to put some priors on the `_pmu` and `_psd` parameters.
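To make the reflection symmetry concrete, here is a small numeric sketch (with made-up values): flipping the signs of both `x1_psd` and `x1_true_raw` leaves `x1_true`, and hence the likelihood, exactly unchanged, which is why the unconstrained model has two symmetric modes.

```python
# Flipping the signs of both x1_psd and x1_true_raw leaves
# x1_true = x1_pmu + x1_psd * x1_true_raw unchanged.
# Values are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
x1_pmu = 0.7
x1_psd = 1.3
x1_true_raw = rng.normal(0.0, 1.0, 5)

x_true_a = x1_pmu + x1_psd * x1_true_raw          # original signs
x_true_b = x1_pmu + (-x1_psd) * (-x1_true_raw)    # both signs flipped

print(np.allclose(x_true_a, x_true_b))            # True
```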

Oh yeah, thanks, I totally missed that! I updated the code (below). I'm still getting poor posteriors, so I'll probably just go back to the centered parameterization. I'd be curious why it's not improving performance in this case.

``````stan
data {
int <lower = 1> N; //number of data points
vector[N] x1_obs; //observed values
vector[N] x1_sd; //known sd
vector[N] x2_obs; //observed values
vector[N] x2_sd; //known sd
int<lower=0,upper=1> y[N];
}

parameters {
vector[N] x1_true_raw;
vector[N] x2_true_raw;
real x1_pmu;
real <lower = 0> x1_psd;
real x2_pmu;
real <lower = 0> x2_psd;
real b0;
real <upper = 0> b1;
real <lower = 0> b2;

}

transformed parameters{
vector[N] x1_true;
vector[N] x2_true;
x1_true = x1_pmu + x1_psd * x1_true_raw;
x2_true = x2_pmu + x2_psd * x2_true_raw;

}
model {
x1_true_raw ~ normal(0,1);
x2_true_raw ~ normal(0,1);
x1_pmu ~ normal(0,10);
x1_psd ~ normal(0,1);
x2_pmu ~ normal(0,10);
x2_psd ~ normal(0,1);
x1_obs ~ normal(x1_true, x1_sd);
x2_obs ~ normal(x2_true, x2_sd);
b0 ~ normal(0,25);
b1 ~ normal(0,10);
b2 ~ normal(0,10);
y ~ bernoulli_logit(b0 + b1 * x1_true + b2*x2_true);

}

``````

I am totally not qualified to answer the question, but I might be able to point you in the right direction. For centered versus non-centered parameterizations, I always go back to Michael Betancourt’s paper.

The key idea is in Figure 4, I think. In your example, in the centered version there is a strong relationship between `x1_psd` and `x1_true`. The funnel problem means that when `x1_psd` is small, all the `x1_true` values will be close to `x1_pmu`, but they can vary more when `x1_psd` is larger. The non-centered parameterization breaks that relationship, but it introduces a new problem: if `x1_true` is strongly constrained by the data, then there will be a strong relationship between `x1_true_raw` and `x1_psd`. I suspect that in your case the known measurement error `x1_sd` is relatively small and the relationship between `x1_true` and `y` is sufficiently strong, so the data are constraining `x1_true`.
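A back-of-the-envelope way to see that coupling (with made-up numbers): inverting the non-centered transform gives `x_true_raw = (x_true - x_pmu) / x_psd`, so if the data pin `x_true` near some fixed value, then `x_true_raw` is forced to track `1 / x_psd` rather than look like a standard normal.

```python
# If the data pin x_true near a fixed value, the inverse transform
# x_true_raw = (x_true - x_pmu) / x_psd couples x_true_raw to x_psd.
# Values are made up for illustration.
import numpy as np

x_pmu = 0.0
x_true_pinned = 2.0                       # value the data strongly favor
x_psd_grid = np.array([0.5, 1.0, 2.0, 4.0])

x_true_raw = (x_true_pinned - x_pmu) / x_psd_grid
print(x_true_raw)                         # [4.  2.  1.  0.5]
```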

Section III of the paper explains how these correlations (with centered and non-centered parameterizations) show up in HMC. In short, it should be less of a problem than with “common MC implementations”. In practical HMC implementations it is still a problem, though, and you should be able to see it in the stepsize (smaller is bad) and divergence diagnostics (more divergences are bad) in Stan. So you can check those to see which parameterization works best for your problem/data.

See also https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html for a discussion of how to investigate divergences in RStan using a hierarchical model as an example.