We have a model that consists of three variables U, X, and Y that have the dimension of is D \times L, D \times N, and D \times 1 respectively. Our model is as follows:
U := N_{U} \sim N(\mu, \Sigma_{UU})
X := U \alpha + N_{X}
Y := X \beta + U \eta + N_{Y}
Where N_X and N_y have a Normal distribution with mean zero and identity covariance matrix. Hence,
U \sim N(\mu,\Sigma_{UU})
X \sim N(\mu \alpha , \Sigma_{XX}) = N(\mu \alpha, \alpha^T \Sigma_{UU} \alpha + I_{N_X N_X})
Y \sim N(\mu \alpha \beta + \mu \eta, \Sigma_{YY})
Where \Sigma_{YY} is,
\beta^T \Sigma_{XX}\beta + \beta^T \alpha^T \Sigma_{UU} \eta + \eta^T \Sigma_{UU} \alpha \beta + \eta^T \Sigma_{UU} \eta + \eta^T \mu^T \mu \eta + I_{N_Y N_Y}.
For simplicity we assume that \Sigma_{UU} is a identity matrix. The dimensions of \mu, \alpha, \beta and \eta are 1 \times L, L \times N, N \times 1, and L \times 1 respectively.
Our goal is to learn the parameters \mu, \alpha, \beta and \eta. I have created the following Stan model. My problem is that I can’t learn \eta correctly but I can learn the rest of the parameters correctly using the optimizing
method. Here I post my data generation process in R and my Stan model. Any help would be greatly appreciated.
f_u
, f_x
, and f_y
are functions that generate data for u,x, y, \mu, \alpha,\beta and \eta.
f_u <- function(){
mu <- rnorm(L, 0, 10)
Sigma_uu = diag(L)
u <- matrix(0, nrow=D, ncol=L)
for(i in 1:D){
for(j in 1:L){
u[i, j] <- rnorm(1, mu[j], sqrt(Sigma_uu[j,j]))
}
}
return(list(u = u, mu = mu, Sigma_uu = Sigma_uu))
}
sim <- f_u()
mu<- sim$mu
u <- sim$u
Sigma_uu<- sim$Sigma_uu
f_x <- function(u, Sigma_uu){
alpha <- matrix(0, nrow = L, ncol = N)
for(i in 1:L){
for(j in 1:N){
alpha[i, j] <- rnorm(1, 0, 10)
}
}
linear_exp = u %*% alpha
Sigma_xx = t(alpha) %*% Sigma_uu %*% alpha + diag(N)
x <- matrix(0, nrow = D, ncol = N)
for(i in 1:D){
for(j in 1:N){
x[i, j] <- rnorm(1, linear_exp[i,j],sqrt(Sigma_xx[j,j]))
}
}
return(list(x = x, alpha = alpha, Sigma_xx = Sigma_xx))
}
sim_x <- f_x(u,Sigma_uu)
alpha <- sim_x$alpha
x <- sim_x$x
Sigma_xx <- sim_x$Sigma_xx
f_y <- function(x,Sigma_xx, u, Sigma_uu){
beta <- matrix(0, nrow = N, ncol = 1)
eta <- matrix(0, nrow = L, ncol = 1)
for (i in 1:N) {
beta[i,1] <- rnorm(1,0,10)
}
for (i in 1:L) {
eta[i,1] <- rnorm(1,0,10)
}
linear_exp = x %*% beta + u %*% eta
var_yy = t(beta) %*% Sigma_xx %*% beta + t(eta) %*% Sigma_uu %*% eta + 1
std_y = sqrt(var_yy[1,1])
y <- matrix(0, nrow = D, ncol = 1)
for(i in 1:D){
y[i, 1] <- rnorm(1, linear_exp[i,1],std_y)
}
return(list(y = y, beta = beta, eta = eta))
}
sim_y <- f_y(x, Sigma_xx, u, Sigma_uu)
beta<- sim_y$beta
eta <- sim_y$eta
y <- sim_y$y
The Stan model is:
model_str <- "
data {
int L;
int D;
int N;
matrix[D, L] u;
matrix[D, N] x;
matrix[D, 1] y;
matrix[L, L] sigma_uu;
}
parameters {
vector[L] mu;
matrix[L, N] alpha;
matrix[N, 1] beta;
matrix[L, 1] eta;
}
transformed parameters {
matrix[D, N] x_loc;
matrix[D, 1] y_loc;
vector[N] x_scale;
vector[1] y_scale;
x_scale = sqrt(diagonal(alpha'*sigma_uu*alpha) + rep_vector(1,N));
y_scale = sqrt(diagonal(beta'*(alpha'*sigma_uu*alpha + diag_matrix(rep_vector(1, N)))*beta + beta'*alpha'*sigma_uu*eta + eta'*sigma_uu*alpha*beta + eta'*sigma_uu*eta + eta'*(mu'*mu)*eta)+ rep_vector(1,1));
for (i in 1:D){
x_loc[i, ] = u[i, ] * alpha;
y_loc[i, ] = x[i, ] * beta + u[i, ] * eta;
}
}
model {
target += normal_lpdf(mu | 0, 10);
for (j in 1:L){
target += normal_lpdf(alpha[j, ] | 0, 10);
target += normal_lpdf(eta[j, ] | 0, 10);
}
for (j in 1:N){
target += normal_lpdf(beta[j, ] | 0, 10);
}
for (i in 1:D){
target += normal_lpdf(u[i, ] | mu, 1); // likelihood
target += normal_lpdf(x[i, ] | x_loc[i, ], x_scale);
target += normal_lpdf(y[i, ] | y_loc[i, ], y_scale);
}
}
"
I will compile the model as follows:
mod <- stan_model(model_code = model_str)
data_list <- list(u=u, L=5, D=1000, x = x, N =8, y = y, sigma_uu = diag(L))
optim_fit <- optimizing(mod, data=data_list)
Let’s compare the estimates of parameters with their actual values:
optim_fit$par[1:5] #estimated mu
mu #real mu
optim_fit$par[6:45] #estimated alpha
t(alpha) #real alpha
optim_fit$par[46:53] #estimated beta
t(beta) #real gamma
optim_fit$par[54:58]
t(eta)
All the parameters will be learned correctly except for \eta. I can’t figure out why. Can anyone please help me identify the issue?