I’m trying to learn how to fit a multivariate outcomes model. I read section 9.15 from the stan manual, and I have a question about how to include different covariates for different outcomes. For example, suppose my two outcomes are y1
and y2
and the underlying data generating process is the following:
Generate some fake data
set.seed(212018)
rbvn <- function (n, m1, s1, m2, s2, rho) {
X1 <- rnorm(n, mu1, s1)
X2 <- rnorm(n, mu2 + (s2/s1) * rho *
(X1 - mu1), sqrt((1 - rho^2)*s2^2))
cbind(X1, X2)
}
N <- 100
mu1 <- 0
s1 <- 1
mu2 <- 0
s2 <- 1
rho <- 0.5
e <- rbvn(N,mu1,s1,mu2,s2,rho)
x1 <- rnorm(N,0,1)
x2 <- rnorm(N,0,1)
x3 <- rnorm(N,0,1)
t <- sample(x = c(0,1), size=N, replace = T)
y1 <- 0.5 + 2.12*x1 + 5.78*x3 + 9.15*t + e[,1]
y2 <- 2 + 7.82*x1 + 4.21*x2 + 3.57*t + e[,2]
Note that y2
is not a function of x3
and y1
is not a function of x2
. If I ignore this fact, I can fit this model using the example code from the manual:
data {
int<lower=1> K;
int<lower=1> J;
int<lower=0> N;
vector[J] x[N];
vector[K] y[N];
}
parameters{
matrix[K,J] beta;
cov_matrix[K] Sigma;
}
model {
vector[K] mu[N];
for (n in 1:N)
mu[n] = beta * x[n];
y ~ multi_normal(mu,Sigma);
}
I’m trying to change the code to incorporate the fact that the two outcomes should have a different set of covariates. I tried this:
data {
int<lower=1> J;
int<lower=0> N;
vector[J] x[N];
vector[1] xtwo[N];
vector[1] xtrhee[N];
vector[2] y[N];
}
parameters{
vector[J] beta_one;
vector[J] beta_two;
vector[1] gamma_one;
vector[1] gamma_two;
cov_matrix[2] Sigma;
}
model {
vector[2] mu[N];
for(i in 1:N){
mu[1,i] = dot_product(x[i], beta_one)+ dot_product(xtrhee[i], gamma_one);
mu[2,i] = dot_product(x[i], beta_two)+ dot_product(xtwo[i], gamma_two);
}
y ~ multi_normal(mu,Sigma);
}
But when I try to fit it I get an error:
library(rstan)
df <- data.frame(x1=x1,x2=x2,x3=x3,t=t,y1=y1,y2=y2)
fml <- as.formula(y1 ~ x1+x2+x3+t)
model_data <- model.matrix(data = df, fml)
outcomes <- data.frame(y1=y1,y2=y2)
stan_dat <- list(J = ncol(model_data[,c(1,2,5)]),
N = 100,
x = as.matrix(model_data[,c(1,2,5)]),
xtwo = as.matrix(df$x2),
xtrhee = as.matrix(df$x3),
y = as.matrix(outcomes))
fit2 <- sampling(model2, data = stan_dat)
print(fit2)
SAMPLING FOR MODEL 'stan-5f8a22861036' NOW (CHAIN 1).
Unrecoverable error evaluating the log probability at the initial value.
Exception: []: accessing element out of range. index 3 out of range; expecting index to be between 1 and 2; index position = 2mu (in 'model5f8a1432a18b_stan_5f8a22861036' at line 22)
[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] " Exception: []: accessing element out of range. index 3 out of range; expecting index to be between 1 and 2; index position = 2mu (in 'model5f8a1432a18b_stan_5f8a22861036' at line 22)"
error occurred during calling the sampler; sampling not done
My guess is that my syntax for mu
is wrong, but i tried a few different things that did not work either.
What is the right way of doing this?
Thanks!