Prior specification for alternative explanations

I am fitting a model of two simultaneously determined choices x1 and x2 as per the following equations.

d1 x1 = b1 + b12 x2 + g1 z + e1
d2 x2 = b2 + b12 x1 + g2 z + e2
with b12^2 < d1 * d2

x1 and x2 are observed data and z is an observed exogenous variable. The d's (> 0), b's, and g's are parameters to be estimated. e1 and e2 are assumed to be jointly normal (multi_normal in Stan), and the parameter I am interested in is rho, the correlation between e1 and e2. The constraint b12^2 < d1 * d2 guarantees that the system has a unique reduced-form solution.

When x1 and x2 are positively correlated, there are three possible explanations, each of which I illustrate with a small simulation right after this list:

  • b12 > 0
  • rho > 0
  • g1*g2 > 0
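As a quick illustration (a sketch with d1 = d2 = 1, unit-variance z, and unit-variance errors), each mechanism in isolation induces a positive correlation between x1 and x2:

sim_cor <- function(b12 = 0, rho = 0, g1 = 0, g2 = 0, n = 1e5){
  z <- rnorm(n)
  e <- MASS::mvrnorm(n, c(0, 0), matrix(c(1, rho, rho, 1), nrow = 2))
  # reduced form of the two equations with d1 = d2 = 1
  denom <- 1 - b12^2
  x1 <- ((g1 * z + e[, 1]) + b12 * (g2 * z + e[, 2])) / denom
  x2 <- ((g2 * z + e[, 2]) + b12 * (g1 * z + e[, 1])) / denom
  cor(x1, x2)
}
sim_cor(b12 = 0.5)           # positive
sim_cor(rho = 0.5)           # positive
sim_cor(g1 = 0.7, g2 = 0.7)  # positive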

The main problem I have is that rho is not as well identified by the data as b12, g1, and g2 (see below): when I put an equally informative prior on b12, rho, g1, and g2, there is more shrinkage towards 0 for rho, and this in turn distorts the posteriors for b12, g1, and g2. To make the equations identifiable I set d2 = 1/d1, which forces -1 < b12 < 1 and makes b12, just like rho, a correlation coefficient. Studying up on the stan_lm() R2 prior gave me the idea that I might use a similar approach, since I am in effect trying to decompose a correlation into three separate correlation coefficients (rho, b12, and, with some extra work, g1*g2).

Q1: Does this approach make sense as a modelling/prior-specification strategy? At this point I am less sure it will deal with the problem of too much shrinkage for rho, but the elegance of the R2 prior seems to fit my problem well.

Or maybe I misunderstand the R2 prior. I tried to simulate from it in R with the following function.

R2prior <- function(M = .25, K = 3){
  # M: prior location of R2 (treated as the mean here), K: number of predictors
  # uniform direction on the K-dimensional unit sphere
  u_raw <- rnorm(K, 0, 1)
  u <- u_raw / sqrt(sum(u_raw ^ 2))
  # second shape parameter so that E[R2] = M when R2 ~ Beta(K/2, eta)
  eta <- (1 - M) / M * K / 2
  R2 <- rbeta(1, K / 2, eta)
  rhos <- sqrt(R2) * u
  list(R2 = R2, rhos = rhos)
}
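Simulating from this suggests that, with eta defined as above, the M argument behaves as the prior mean of R2:

set.seed(1)
draws <- replicate(1e4, R2prior(M = 0.25, K = 3)$R2)
mean(draws)  # close to 0.25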

Q2: Does this function make sense? I basically cobbled it together based on the stan_lm() vignette and Stack Overflow.


There is a bit more to the model. First, I rewrite the original equations as:

(d1 d2 - b12^2) x1 = d2 (b1 + g1 z + e1) + b12 (b2 + g2 z + e2)
(d1 d2 - b12^2) x2 = d1 (b2 + g2 z + e2) + b12 (b1 + g1 z + e1)
with d1 d2 > b12^2

And I set d1 = 1/d2, so that d1 d2 = 1 and the constraint reduces to b12^2 < 1, i.e. -1 < b12 < 1.

I simulate data from this generative model as follows.

create_opt_data <- function(obs = 200, reps = 1,
                            sd_eps = c(0.5, 0.5), d = c(1, 1),
                            b = c(0, 0), b12 = 0,
                            g1 = c(0, 0), g2 = c(0, 0),
                            rho = 0, by = 0, sd_nu = 1,
                            prior = 1){
  # error covariance implied by sd_eps and rho (sd_nu is unused here)
  Sigma <- matrix(c(sd_eps[1] ^ 2, sd_eps[1] * sd_eps[2] * rho,
                    sd_eps[1] * sd_eps[2] * rho, sd_eps[2] ^ 2),
                  nrow = 2)
  x <- matrix(NA_real_, nrow = obs * reps, ncol = 2)
  effect <- matrix(NA_real_, nrow = obs * reps, ncol = 2)

  id <- rep(1:obs, reps)
  # z is the observed exogenous variable (coefficients g1);
  # w is an unobserved individual-level variable (coefficients g2)
  w <- rep(rnorm(obs, 0, 1), reps)
  z <- rnorm(reps * obs, 0, 1)
  # individual-level errors, repeated across replications
  eps <- MASS::mvrnorm(obs, c(0, 0), Sigma)
  eps <- apply(eps, 2, rep, reps)

  denom <- d[1] * d[2] - b12 ^ 2
  for (i in 1:2){
    effect[, i] <- b[i] + g1[i] * z + g2[i] * w + eps[, i]
  }

  # reduced-form solutions of the two simultaneous equations
  x[, 1] <- (d[2] * effect[, 1] + b12 * effect[, 2]) / denom
  x[, 2] <- (d[1] * effect[, 2] + b12 * effect[, 1]) / denom

  # realised outcome (not used by the Stan model below)
  y <- by + effect[, 1] * x[, 1] + effect[, 2] * x[, 2] + b12 * x[, 1] * x[, 2] +
       .5 * d[1] * (x[, 1] ^ 2) + .5 * d[2] * (x[, 2] ^ 2)

  list(N = obs * reps, Nid = obs,
       x1 = x[, 1], x2 = x[, 2], z = z, id = id, prior = prior, y = y,
       b12 = b12, d = d, eps = eps)
}

dl = create_opt_data(obs = 100, reps = 1, rho = 0.5, b12 = 0,
                     g1 = c(-.5, .5), b = c(0.5, 0), sd_eps = c(.5, .5))
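As a quick check (my own sketch): with b12 = 0 and d1 = d2 = 1, the covariance of x1 and x2 after conditioning on z should equal the error covariance, here rho * 0.5^2 = 0.125 on the off-diagonal:

r1 <- resid(lm(dl$x1 ~ dl$z))
r2 <- resid(lm(dl$x2 ~ dl$z))
cov(cbind(r1, r2))  # off-diagonal close to 0.125, up to sampling noise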

The Stan model with informative priors is as follows. I know there are opportunities to optimise the code, but it works well enough for my purposes at the moment.

data{
  int<lower=1> N;
  vector[N] x1;
  vector[N] x2;
  vector[N] z;
  real prior;
}
transformed data{
  vector[N] x1_2;
  vector[N] x2_2;
  vector[N] x12;
  vector[2] x[N];
  real<lower=0> scale;
  x1_2 = x1 .* x1;
  x2_2 = x2 .* x2;
  x12 = x1 .* x2;
  scale = 1;
  for (i in 1:N){
    x[i][1] = x1[i];
    x[i][2] = x2[i];
  }
}
parameters{
  vector[2] beta;
  vector[2] gamma;
  real<lower=0> deltahat;
  vector<lower=0>[2] sd_eta;
  vector[2] br;
  real<lower=0> lambda;
}
transformed parameters{
  real<lower=0> denom;
  real<lower=0> delta[2];
  real<lower=-1, upper=1> beta12hat;
  real<lower=-1, upper=1> rho;
  real beta12;

  // map the unconstrained br to the (-1, 1) interval
  beta12hat = 1 - 2 * inv_logit(br[1]);
  rho = 1 - 2 * inv_logit(br[2]);

  // identification: delta[1] * delta[2] = scale^2 (d1 = 1/d2 when scale = 1)
  delta[1] = deltahat * scale;
  delta[2] = 1 / deltahat * scale;

  beta12 = scale * beta12hat;
  // positive by construction since beta12^2 < delta[1] * delta[2]
  denom = delta[1] * delta[2] - beta12^2;
}

model{
  matrix[2, 2] var_x;
  matrix[2, 2] L_x;
  // reduced-form covariance of (x1, x2) given z; this is where rho enters
  var_x[1,1] = (delta[2]^2 * sd_eta[1]^2
                + beta12^2 * sd_eta[2]^2
                + 2 * rho * delta[2] * sd_eta[1] * beta12 * sd_eta[2])
  / denom^2;
  var_x[2,2] = (delta[1]^2 * sd_eta[2]^2
                + beta12^2 * sd_eta[1]^2
                + 2 * rho * delta[1] * beta12 * sd_eta[1] * sd_eta[2])
  / denom^2;
  var_x[1,2] = (beta12 * (delta[2] * sd_eta[1]^2 + delta[1] * sd_eta[2]^2)
                + rho * (beta12^2 + delta[1] * delta[2]) * sd_eta[1] * sd_eta[2])
  / denom^2;
  var_x[2,1] = var_x[1,2];
  L_x = cholesky_decompose(var_x);

  //priors
  beta ~ normal(0, prior);
  gamma ~ normal(0, prior);
  br ~ normal(0, 1.2);
  deltahat ~ normal(0, lambda);
  // scale ~ lognormal(0, 1);
  sd_eta[1] ~ normal(0, lambda);
  sd_eta[2] ~ normal(0, 1/lambda);
  lambda ~ lognormal(0, 1);

  // demand
  {
    vector[2] mu;
    for (i in 1:N){
      mu[1] = (delta[2] * (beta[1] + gamma[1] * z[i])
               + beta12 * (beta[2] + gamma[2] * z[i])) / denom;
      mu[2] = (delta[1] * (beta[2] + gamma[2] * z[i])
               + beta12 * (beta[1] + gamma[1] * z[i])) / denom;
      x[i] ~ multi_normal_cholesky(mu, L_x);
    }
  }
}
generated quantities{
}
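For reference, this is roughly how I fit it from R (a sketch; "simultaneous.stan" is a placeholder for wherever the model above is saved):

library(rstan)
fit <- stan("simultaneous.stan",
            data = list(N = dl$N, x1 = dl$x1, x2 = dl$x2, z = dl$z,
                        prior = dl$prior),
            chains = 4, iter = 2000)
print(fit, pars = c("beta", "gamma", "beta12", "rho"))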

The general idea of expressing your prior beliefs about the error variance relative to the systematic variance is an important one, but the beta prior on the R-squared is an implication of a plain linear model with Gaussian errors, assuming the correlation matrix among all the variables has an LKJ distribution. Also, it works because the R-squared is invariant to whether you specify the design matrix as X or as Q = X R^{-1} from its QR decomposition.

In your case, I think the simultaneous equations, together with the d1 and d2 parameters and the constraint b12^2 < d1 * d2, break all that math. If you can derive the correlation matrix of z, x1, and x2, it may be possible to derive a prior on the parameters from a prior on that correlation matrix.

I see, that makes sense. I have been able to derive conditional correlations (e.g. cor(x1 | z, x2 | z)), so it should be possible to derive the correlation matrix, brush up on the LKJ paper, and go from there. Too bad it is not just plug and play :-). I am glad I asked before proceeding.

Yes. Since cor(z, x1) and cor(z, x2) are typically unrestricted, if you have cor(x1 | z, x2 | z) then you can derive cor(x1, x2) from the partial correlation identity

cor(x1 | z, x2 | z) * sqrt( (1 - cor(z, x1)^2) * (1 - cor(z, x2)^2) ) + cor(z, x1) * cor(z, x2) = cor(x1, x2)
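For instance, this can be checked numerically on the simulated data from above, using lm() residuals to obtain the sample partial correlation:

r_zx1 <- cor(dl$z, dl$x1)
r_zx2 <- cor(dl$z, dl$x2)
r12_z <- cor(resid(lm(dl$x1 ~ dl$z)), resid(lm(dl$x2 ~ dl$z)))
r12_z * sqrt((1 - r_zx1^2) * (1 - r_zx2^2)) + r_zx1 * r_zx2
cor(dl$x1, dl$x2)  # identical to the line above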

So, now you have a 3x3 correlation matrix, which could be given an LKJ prior, which basically means cor(z, x1), cor(z, x2), and cor(x1 | z, x2 | z) have particular symmetric beta priors over the (-1,1) interval. And then you need to derive what that implies for coefficients and error variances. If you just had

x2 = b2 / d2 + b12 / d2 * x1 + g2 / d2 * z + e2 / d2

then you would be in stan_lm() territory. But the other equation plus the constraint on b12 makes things more complicated.
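As an aside, those symmetric beta marginals are easy to simulate: under LKJ(eta) for a d x d correlation matrix, each correlation is distributed as 2 * Beta(a, a) - 1 with a = eta - 1 + d/2, so for the 3x3 case:

eta <- 2; d <- 3
a <- eta - 1 + d / 2
r <- 2 * rbeta(1e5, a, a) - 1  # implied marginal prior for each correlation
hist(r, breaks = 50)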

Thanks. I will report back if I come up with something sensible.