I played around with something similar to this a few years back. Here is the Stan code I used (it is about four years old, so some syntax may need updating, and it could probably be improved):
data {
  int<lower=0> n;
  vector[n] y;                    // outcome
  vector[n] x;                    // observed predictor (x2)
  real<lower=-1, upper=1> rho;    // correlation between x and the unobserved predictor
  real g;                         // slope on the unobserved predictor
}
transformed data {
  matrix[2, 2] sigx;              // covariance of (x, u); u is standardized
  vector[2] mux;                  // mean of (x, u)
  sigx[1, 1] = variance(x);
  sigx[1, 2] = sd(x) * rho;
  sigx[2, 1] = sigx[1, 2];
  sigx[2, 2] = 1.0;
  mux[1] = mean(x);
  mux[2] = 0.0;
}
parameters {
  real b0;                        // intercept
  real b1;                        // slope on the observed predictor
  real<lower=0> sig;              // residual standard deviation
  vector[n] u;                    // unobserved predictor (x1), standardized
}
transformed parameters {
  array[n] vector[2] xx;          // pairs (x[i], u[i]) for the multivariate normal
  for (i in 1:n) {
    xx[i, 1] = x[i];
    xx[i, 2] = u[i];
  }
}
model {
  array[n] real eta;
  for (i in 1:n) {
    eta[i] = b0 + b1 * xx[i, 1] + g * xx[i, 2];
  }
  xx ~ multi_normal(mux, sigx);   // ties u to x through rho
  y ~ normal(eta, sig);
}
x is the observed predictor (x2 in your example, x1 in my thoughts; I don't have an x3, but you could add one).
rho is the correlation between x1 and x2.
g is the slope on the unobserved variable.
I believe that to get anything meaningful you need to specify rho and g exactly. If you put a prior on either (or both), or worse, use the default flat prior, then the posterior will have a flat or nearly flat region in a subspace around the mode (an identifiability problem), which will lead to not very useful results.
The variable u is just the unobserved x variable (x1 in your case), which I force to have mean 0 and standard deviation 1 (you could change this, but since u is unobserved, all transformations of u, including the standardization, are also unobserved).
The transformed data section just creates the variance matrix for the relationship between the x variables and the mean vector of the x variables.
The transformed parameters section creates xx, an array of vectors (which could probably be improved) holding the x variables: the first "column" is the observed predictor (x2) and the second is the unobserved predictor (x1, i.e. u, a parameter in Stan syntax).
The model block then places a multi_normal prior/likelihood on xx (to bring in the information about the correlation between x1 and x2) and a normal likelihood on y based on the regression model.
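One way to see what the multi_normal on xx buys you: conditional on the observed x, it implies u | x ~ Normal(rho*(x - mean(x))/sd(x), 1 - rho^2), so the observed predictor informs where the unobserved one is likely to sit. A quick numeric check of that conditional, with arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(7)

rho, sd_x, mu_x, n = 0.6, 2.0, 3.0, 200_000

# Same joint covariance the Stan transformed data block builds.
cov = np.array([[sd_x**2, sd_x * rho],
                [sd_x * rho, 1.0]])
x, u = rng.multivariate_normal([mu_x, 0.0], cov, size=n).T

# Theory: u | x ~ Normal(rho * (x - mu_x) / sd_x, 1 - rho^2).
# Check via least squares: the slope of u on (x - mu_x) should be rho/sd_x,
# and the residual variance should be 1 - rho^2.
slope = np.cov(x, u)[0, 1] / np.var(x)
resid_var = np.var(u - slope * (x - mu_x))

print(round(slope, 2), round(resid_var, 2))   # ~ rho/sd_x = 0.30 and 1 - rho^2 = 0.64
```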
The code should really include priors on b0, b1, and sig, and the likelihood for y could probably be improved with matrix multiplication/vectorization.
The simulations that I tried looked promising and I have been meaning to come back and explore this idea some more (my thought is to add something along these lines to the obsSens R package). Please let me know if this works for you, and/or what improvements you make.
Hope this helps get you started.