# Prior specification for alternative explanations

I am fitting a model of two simultaneously determined choices x1 and x2 as per the following equations.

``````d1 x1 = b1 + b12 x2 + g1 z + e1
d2 x2 = b2 + b12 x1 + g2 z + e2
with b12^2 < d1 * d2
``````

x1 and x2 are observed data. z is an observed exogenous variable. d’s > 0, b’s, and g’s are variables to be estimated. e1 and e2 are assumed to be multi_normal distributed where the parameter I am interested in is rho the correlation between e1 and e2.

When x1 and x2 are positively correlated there are three possible explanations.

• b12 > 0
• rho > 0
• g1*g2 > 0

The main problem I have is that rho is not as well identified by the data as b12 and g1, g2 (see below) and when I put an equally informative prior on b12, rho, g1, and g2 there is more shrinkage towards 0 for rho. A consequence is that this has an effect on the posterior for b12 and g1 and g2. To make the equations identifiable I set d2 = 1/d1 which forces -1 < b12 < 1 and makes b12 just like rho a correlation coefficient. Studying up on the stan_lm() R2 prior gave me the idea that I might use a similar approach. I am also trying to decompose a correlation in 3 separate correlation coefficients (rho, beta12, and I need to do some extra work on g1*g2).

Q1: Does this approach make sense as modelling/prior specification strategy? At this point, I am less sure that it will deal with the problem of too much shrinkage for rho but the elegance of the R2 prior seems to fit well with my problem.

Or I misunderstand the R2 prior. I tried to simulate from the prior in R with the following function.

``````R2prior <- function(M = .25, K = 3){
# test M and K
u_raw <- rnorm(K, 0, 1)
u <- u_raw / sqrt(sum(u_raw ^ 2))
eta <- (1 - M)/M * K/2
R2 <- rbeta(1, K/2, eta)
rhos = sqrt(R2) * u
return(list(R2 = R2, rhos = rhos))
}
``````

Q2: Does this function make sense? I basically cobbled it together based on the stan_lm() vignette and Stack Overflow.

There is a bit more to the model. First, I rewrite the original equations so that.

``````(d1 d2 * b12 ^ 2) x1 = d2 (b1 + g1 z + e1) + b12 (b2 + g2 z + e2)
(d1 d2 * b12 ^ 2) x1 = d1 (b2 + g2 z + e2) + b12 (b1 + g1 z + e1)
with d1 d2 > b12^2
``````

And set d1 = 1/d2 which gives that b12 < 1.

I simulate data from this generative model as follows.

``````
create_opt_data <- function(obs = 200, reps = 1,
sd_eps = c(0.5, 0.5), d = c(1, 1),
b = c(0, 0), b12 = 0,
g1 = c(0, 0), g2 = c(0, 0),
rho = 0, by = 0, sd_nu = 1,
prior = 1){
Sigma = matrix(c(sd_eps[1] ^ 2, sd_eps[1] * sd_eps[2] * rho,
sd_eps[1] * sd_eps[2] * rho, sd_eps[2] ^ 2),
nrow = 2)
x = matrix(rep(NA, 2 * obs * reps), ncol = 2)
effect = matrix(rep(NA, 2 * obs * reps), ncol = 2)

id <- rep(1:obs, reps)
w = rep(rnorm(obs, 0, 1), reps)
z = rnorm(reps * obs, 0, 1)
eps = MASS::mvrnorm(obs, c(0, 0), Sigma)
eps = apply(eps, 2, rep, reps)

denom = d[1] * d[2] - b12 ^ 2
for (i in 1:2){
effect[,i] = b[i] + g1[i] * z + g2[i] * w + eps[,i]
}

x[,1] = (d[2] * effect[,1] + b12 * effect[, 2]) / denom
x[,2] = (d[1] * effect[,2] + b12 * effect[, 1]) / denom

y = by + effect[, 1] * x[,1] + effect[, 2] * x[,2] + b12 * x[,1] * x[,2] +
.5 * d[1] * (x[,1] ^ 2) + .5 * d[2] * (x[, 2] ^ 2)

return(list(N = obs * reps, Nid = obs,
x1 = x[,1], x2 = x[,2], z = z, id = id, prior = prior, y = y,
b12 = b12, d = d, eps)
)
}

dl = create_opt_data(obs = 100, reps = 1, rho = 0.5, b12 = 0,
g1 = c(-.5, .5), b = c(0.5, 0), sd_eps = c(.5, .5))

``````

The stan model with informative priors is as follows. I know there are potential opportunities to optimise the code but it works pretty well for my purposes at the moment.

``````data{
int<lower=1> N;
vector[N] x1;
vector[N] x2;
vector[N] z;
real prior;
}
transformed data{
vector[N] x1_2;
vector[N] x2_2;
vector[N] x12;
vector[2] x[N];
real<lower=0> scale;
x1_2 = x1 .* x1;
x2_2 = x2 .* x2;
x12 = x1 .* x2;
scale = 1;
for (i in 1:N){
x[i][1] = x1[i];
x[i][2] = x2[i];
}
}
parameters{
vector[2] beta;
vector[2] gamma;
real<lower=0> deltahat;
vector<lower=0>[2] sd_eta;
vector[2] br;
real<lower=0> lambda;
}
transformed parameters{
real<lower=0> denom;
real<lower=0> delta[2];
real<lower=-1, upper=1> beta12hat;
real<lower=-1, upper=1> rho;
real beta12;

beta12hat = 1 - 2 * inv_logit(br[1]);
rho = 1 - 2 * inv_logit(br[2]);

delta[1] = deltahat * scale;
delta[2] = 1/deltahat * scale;

beta12 = scale * beta12hat;
denom = delta[1] * delta[2] - beta12^2;

}

model{
matrix[2, 2] var_x;
matrix[2, 2] L_x;
// This is how the correlation is taken into account.
var_x[1,1] = (delta[2]^2 * sd_eta[1]^2
+ beta12^2 * sd_eta[2]^2
+ 2 * rho * delta[2] * sd_eta[1] * beta12 * sd_eta[2])
/ denom^2;
var_x[2,2] = (delta[1]^2 * sd_eta[2]^2
+ beta12^2 * sd_eta[1]^2
+ 2 * rho * delta[1] * beta12 * sd_eta[1] * sd_eta[2])
/ denom^2;
var_x[1,2] = (beta12 * (delta[2] * sd_eta[1]^2 + delta[1] * sd_eta[2]^2)
+ rho * (beta12^2 + delta[1] * delta[2]) * sd_eta[1] * sd_eta[2])
/ denom^2;
var_x[2,1] = var_x[1,2];
L_x = cholesky_decompose(var_x);

//priors
beta ~ normal(0, prior);
gamma ~ normal(0, prior);
br ~ normal(0, 1.2);
deltahat ~ normal(0, lambda);
// scale ~ lognormal(0, 1);
sd_eta[1] ~ normal(0, lambda);
sd_eta[2] ~ normal(0, 1/lambda);
lambda ~ lognormal(0, 1);

// demand
{
vector[2] mu;
for (i in 1:N){
mu[1] = (delta[2] * (beta[1] + gamma[1] * z[i])
+ beta12 * (beta[2] + gamma[2] * z[i])) / denom;
mu[2] = (delta[1] * (beta[2] + gamma[2] * z[i])
+ beta12 * (beta[1] + gamma[1] * z[i])) / denom;
x[i] ~ multi_normal_cholesky(mu, L_x);
}
}
}
generated quantities{
}

``````

The general idea of expressing your prior beliefs about the error variance relative to the systematic variance is an important one, but the beta prior on the R-squared is an implication of a plain linear model with Gaussian errors assuming the correlation matrix among all the variables has an LKJ distribution. Also, it works because the R-squared is invariant to whether you specify the design matrix as X or as Q = X R^{-1}.

In your case, I think the simultaneous equations, plus having the `d1` and `d2` parameters and the constraint `b12 ^ 2 < d1 * d2` breaks all that math. If you can derive the correlation matrix of `z`, `x1`, and `x2`, it may be possible to derive a prior on the parameters from some prior about that correlation matrix.

I see. That makes sense. I have been able to derive conditional correlations (e.g. cor(x1|z, x2|z)), so it should be possible to derive the correlation matrix, brush up on the LKJ paper and go from there. Too bad it is not just plug and play :-). I am glad I asked before proceeding.

Yes, since cor(z, x1) and cor(z, x2) are typically unrestricted, if you have cor(x1 | z, x2 | z), then you can derive cor(x1, x2) from the partial correlation identity

cor(x1 | z, x2 | z) * sqrt( (1 - cor(z, x1)^2) * (1 - cor(z, x2)^2) ) + cor(z, x1) * cor(z, x2)) = cor(x1, x2)

So, now you have a 3x3 correlation matrix, which could be given an LKJ prior, which basically means cor(z, x1), cor(z, x2), and cor(x1 | z, x2 | z) have particular symmetric beta priors over the (-1,1) interval. And then you need to derive what that implies for coefficients and error variances. If you just had

x2 = b2 / d2 + b12 / d2 * x1 + g2 / d2 * z + e2 / d2

then you would be in `stan_lm()` territory. But the other equation plus the constraint on b12 makes things more complicated.

Thanks. I will report back if I come up with something sensible.