I am fitting a model of two simultaneously determined choices x1 and x2 as per the following equations.
d1 x1 = b1 + b12 x2 + g1 z + e1
d2 x2 = b2 + b12 x1 + g2 z + e2
with b12^2 < d1 * d2
x1 and x2 are observed data. z is an observed exogenous variable. d’s > 0, b’s, and g’s are variables to be estimated. e1 and e2 are assumed to be multi_normal distributed where the parameter I am interested in is rho the correlation between e1 and e2.
When x1 and x2 are positively correlated there are three possible explanations.
- b12 > 0
- rho > 0
- g1*g2 > 0
The main problem I have is that rho is not as well identified by the data as b12 and g1, g2 (see below) and when I put an equally informative prior on b12, rho, g1, and g2 there is more shrinkage towards 0 for rho. A consequence is that this has an effect on the posterior for b12 and g1 and g2. To make the equations identifiable I set d2 = 1/d1 which forces -1 < b12 < 1 and makes b12 just like rho a correlation coefficient. Studying up on the stan_lm() R2 prior gave me the idea that I might use a similar approach. I am also trying to decompose a correlation in 3 separate correlation coefficients (rho, beta12, and I need to do some extra work on g1*g2).
Q1: Does this approach make sense as modelling/prior specification strategy? At this point, I am less sure that it will deal with the problem of too much shrinkage for rho but the elegance of the R2 prior seems to fit well with my problem.
Or I misunderstand the R2 prior. I tried to simulate from the prior in R with the following function.
R2prior <- function(M = .25, K = 3){
# test M and K
u_raw <- rnorm(K, 0, 1)
u <- u_raw / sqrt(sum(u_raw ^ 2))
eta <- (1 - M)/M * K/2
R2 <- rbeta(1, K/2, eta)
rhos = sqrt(R2) * u
return(list(R2 = R2, rhos = rhos))
}
Q2: Does this function make sense? I basically cobbled it together based on the stan_lm() vignette and Stack Overflow.
There is a bit more to the model. First, I rewrite the original equations so that.
(d1 d2 * b12 ^ 2) x1 = d2 (b1 + g1 z + e1) + b12 (b2 + g2 z + e2)
(d1 d2 * b12 ^ 2) x1 = d1 (b2 + g2 z + e2) + b12 (b1 + g1 z + e1)
with d1 d2 > b12^2
And set d1 = 1/d2 which gives that b12 < 1.
I simulate data from this generative model as follows.
create_opt_data <- function(obs = 200, reps = 1,
sd_eps = c(0.5, 0.5), d = c(1, 1),
b = c(0, 0), b12 = 0,
g1 = c(0, 0), g2 = c(0, 0),
rho = 0, by = 0, sd_nu = 1,
prior = 1){
Sigma = matrix(c(sd_eps[1] ^ 2, sd_eps[1] * sd_eps[2] * rho,
sd_eps[1] * sd_eps[2] * rho, sd_eps[2] ^ 2),
nrow = 2)
x = matrix(rep(NA, 2 * obs * reps), ncol = 2)
effect = matrix(rep(NA, 2 * obs * reps), ncol = 2)
id <- rep(1:obs, reps)
w = rep(rnorm(obs, 0, 1), reps)
z = rnorm(reps * obs, 0, 1)
eps = MASS::mvrnorm(obs, c(0, 0), Sigma)
eps = apply(eps, 2, rep, reps)
denom = d[1] * d[2] - b12 ^ 2
for (i in 1:2){
effect[,i] = b[i] + g1[i] * z + g2[i] * w + eps[,i]
}
x[,1] = (d[2] * effect[,1] + b12 * effect[, 2]) / denom
x[,2] = (d[1] * effect[,2] + b12 * effect[, 1]) / denom
y = by + effect[, 1] * x[,1] + effect[, 2] * x[,2] + b12 * x[,1] * x[,2] +
.5 * d[1] * (x[,1] ^ 2) + .5 * d[2] * (x[, 2] ^ 2)
return(list(N = obs * reps, Nid = obs,
x1 = x[,1], x2 = x[,2], z = z, id = id, prior = prior, y = y,
b12 = b12, d = d, eps)
)
}
dl = create_opt_data(obs = 100, reps = 1, rho = 0.5, b12 = 0,
g1 = c(-.5, .5), b = c(0.5, 0), sd_eps = c(.5, .5))
The stan model with informative priors is as follows. I know there are potential opportunities to optimise the code but it works pretty well for my purposes at the moment.
data{
int<lower=1> N;
vector[N] x1;
vector[N] x2;
vector[N] z;
real prior;
}
transformed data{
vector[N] x1_2;
vector[N] x2_2;
vector[N] x12;
vector[2] x[N];
real<lower=0> scale;
x1_2 = x1 .* x1;
x2_2 = x2 .* x2;
x12 = x1 .* x2;
scale = 1;
for (i in 1:N){
x[i][1] = x1[i];
x[i][2] = x2[i];
}
}
parameters{
vector[2] beta;
vector[2] gamma;
real<lower=0> deltahat;
vector<lower=0>[2] sd_eta;
vector[2] br;
real<lower=0> lambda;
}
transformed parameters{
real<lower=0> denom;
real<lower=0> delta[2];
real<lower=-1, upper=1> beta12hat;
real<lower=-1, upper=1> rho;
real beta12;
beta12hat = 1 - 2 * inv_logit(br[1]);
rho = 1 - 2 * inv_logit(br[2]);
delta[1] = deltahat * scale;
delta[2] = 1/deltahat * scale;
beta12 = scale * beta12hat;
denom = delta[1] * delta[2] - beta12^2;
}
model{
matrix[2, 2] var_x;
matrix[2, 2] L_x;
// This is how the correlation is taken into account.
var_x[1,1] = (delta[2]^2 * sd_eta[1]^2
+ beta12^2 * sd_eta[2]^2
+ 2 * rho * delta[2] * sd_eta[1] * beta12 * sd_eta[2])
/ denom^2;
var_x[2,2] = (delta[1]^2 * sd_eta[2]^2
+ beta12^2 * sd_eta[1]^2
+ 2 * rho * delta[1] * beta12 * sd_eta[1] * sd_eta[2])
/ denom^2;
var_x[1,2] = (beta12 * (delta[2] * sd_eta[1]^2 + delta[1] * sd_eta[2]^2)
+ rho * (beta12^2 + delta[1] * delta[2]) * sd_eta[1] * sd_eta[2])
/ denom^2;
var_x[2,1] = var_x[1,2];
L_x = cholesky_decompose(var_x);
//priors
beta ~ normal(0, prior);
gamma ~ normal(0, prior);
br ~ normal(0, 1.2);
deltahat ~ normal(0, lambda);
// scale ~ lognormal(0, 1);
sd_eta[1] ~ normal(0, lambda);
sd_eta[2] ~ normal(0, 1/lambda);
lambda ~ lognormal(0, 1);
// demand
{
vector[2] mu;
for (i in 1:N){
mu[1] = (delta[2] * (beta[1] + gamma[1] * z[i])
+ beta12 * (beta[2] + gamma[2] * z[i])) / denom;
mu[2] = (delta[1] * (beta[2] + gamma[2] * z[i])
+ beta12 * (beta[1] + gamma[1] * z[i])) / denom;
x[i] ~ multi_normal_cholesky(mu, L_x);
}
}
}
generated quantities{
}