# Aggregated predictor in a simple linear regression

Hi everybody!

In a simple linear regression, say I have a manifest outcome Y and a predictor X that is aggregated (i.e., the values of X are means of distributions of true parameters with known variances).

Would it be a sound modeling strategy to do the following:

```stan
data {
  vector[N] y;
  vector[N] X_means;
  vector<lower=0.0>[N] X_sds;
}

parameters {
  vector[N] X_i;
  real<lower=0.0> sd_residual;
  real beta_0;
  real beta_1;
}

model {
  X_i ~ normal(X_means, X_sds);
  y ~ normal(beta_0 + beta_1 * X_i, sd_residual);
  // Priors...
}
```


I couldnât really figure out which part of the modeling world this belongs to, it is kind of a reverse-latent variable modeling, since we have the means and errors and are interested in the manifest values (that we donât know). On the other hand, itâs not really mixed modeling either, at least I canât manage to make it fit into the format.

I have to say I am a bit unsure whether this is even permissible, because if you reparameterize X_i and plug it into the regression equation, it reads:

Y_i = \beta_0 + \beta_1(\bar{X}_i + \sigma_{X_i}Z_i) + \varepsilon_i = \beta_0 + \beta_1\bar{X}_i + \beta_1\sigma_{X_i}Z_i + \varepsilon_i

with Z_i \sim N(0,1). Now how can we disentangle the terms \beta_1\sigma_{X_i}Z_i and \varepsilon_i?

If it is actually a sensible model, I would be happy if you could tell me if this has a name and how I can find further information.

Thank you!


This is valid (assuming the uncertainties in the X_i are independent and approximately Gaussian), and is often referred to as a "measurement error model".
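For intuition on where the information comes from (my own sketch, not a standard derivation from any particular reference): because everything is linear and Gaussian, you can integrate the latent X_i out of the model analytically, giving the marginal sampling distribution

```latex
y_i \sim \mathrm{N}\!\left(
    \beta_0 + \beta_1\,\bar{X}_i,\;
    \sqrt{\beta_1^{2}\,\sigma_{X_i}^{2} + \sigma_{\text{residual}}^{2}}
\right)
```

(the second argument is an SD, matching Stan's `normal` convention). So the measurement-error term and \varepsilon_i are not hopelessly confounded: the \sigma_{X_i} are known and observation-specific, so \beta_1 shows up in both the mean and the variance structure.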

If \sigma_{X_i} is too large, then you won't get identification. But it should be pretty straightforward to convince yourself that there won't be identification problems if \sigma_{X_i} is sufficiently small. In the limit that \sigma_{X_i} approaches zero, this is just an ordinary linear regression.
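A quick way to get a feel for the role of the measurement error is the classical attenuation result: if you ignore the uncertainty and regress y on the noisy predictor directly, the slope shrinks by the reliability ratio \sigma_X^2 / (\sigma_X^2 + \sigma_u^2). A small simulation sketch (hypothetical numbers, not from this thread; plain OLS rather than the Stan model above):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 20000
beta_0, beta_1 = 1.0, 2.0
sigma_x = 1.0       # spread of the true X_i across units
sigma_resid = 0.5   # residual SD of y

# True latent predictor and outcome
X_true = rng.normal(0.0, sigma_x, N)
y = beta_0 + beta_1 * X_true + rng.normal(0.0, sigma_resid, N)

def ols_slope(x, y):
    """Slope of a simple least-squares fit of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Small measurement error: the naive slope stays close to beta_1
slope_small = ols_slope(X_true + rng.normal(0.0, 0.05, N), y)

# Large measurement error (sigma_u = sigma_x): the naive slope shrinks
# toward beta_1 * sigma_x^2 / (sigma_x^2 + sigma_u^2) = beta_1 / 2
slope_large = ols_slope(X_true + rng.normal(0.0, 1.0, N), y)

print(slope_small)  # roughly 2.0
print(slope_large)  # roughly 1.0
```

The latent-variable model in the original post avoids this attenuation by treating X_i as a parameter with a known-SD prior, at the cost of needing enough data (or small enough \sigma_{X_i}) for identification.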