So I have two normal distributions, y_1 and y_2. I want to figure out their “average” distribution. I figure the best way to do that isn’t to average all the data (or the parameters), but to perform partial pooling in a multi-level model and figure out the distribution of their hyperparameters.
I know that sound a bit wacky, but it’s for a larger project. I need to figure out how to code this up first.
y_1 <- rnorm(30, 10, 2)
y_2 <- rnorm(25, 20, 3)
N_1 = length(y_1)
N_2 = length(y_2)
G = 2
ret <- stanc(file = "mlmnormal.stan")
ret_sm <- stan_model(stanc_ret = ret)
fit <- sampling(ret_sm, warmup=100, iter=600, seed=1, data=list(y_1=y_1,
y_2=y_2,
N_1=length(y_1),
N_2=length(y_2),
G = G))
Below is my model “mlmnormal.stan”
data {
int N_1;
int N_2;
vector[N_1] y_1;
vector[N_2] y_2;
int G; //number of groups
}
parameters {
real mu[G];
real<lower = 0> sigma[G];
//hyperparameters
real mu_h;
real<lower = 0>sigma_h;
}
model {
for (i in 1:G) {
mu[i] ~ normal(mu_h, sigma_h);
y_1 ~ normal(mu[1], sigma[1]);
y_2 ~ normal(mu[2], sigma[2]);
}
//hyperparameters
mu_h ~ normal(0, 1);
sigma_h ~ cauchy(0, 1);
}
Things are starting to look reasonable. The variance for the 2nd distribution is wider, but the mean for the 2nd distribution is smaller (it should be larger).
I’ve updated the code as I’m working through the problem. Any feedback is appreciated.