Multi-level multivariate normal model

Hi all.
I am working with multivariate outcomes and, being very new to Stan, I’ve started with the manual’s example model in section 9.15. Note that I’m sticking with the less efficient version for now, as I find it easier to understand. I would like to extend this model to a multi-level model, but I’m at a loss as to how to go about it and was hoping someone could help. I have made some toy data to model as an example.

First, simulate students clustered in classes, with multiple outcomes per student:

library(simstudy)
library(data.table)  # for setDF()

set.seed(45)

gen.class <- defData(varname = "c0", dist = "normal", formula = 0, variance = 2, id = "idClass")
gen.class <- defData(gen.class, varname = "nStudents", dist = "noZeroPoisson", formula = 20)
dtClass <- genData(8, gen.class)

gen.student <- defDataAdd(varname = "Gender", dist = "binary", formula = 0.5)
gen.student <- defDataAdd(gen.student, varname = "age", dist = "uniform", formula = "9.5; 10.5")
gen.student <- defDataAdd(gen.student, varname = "y1", dist = "normal", 
                          formula = "50 - Gender + c0", variance = 2)
gen.student <- defDataAdd(gen.student, varname = "y2", dist = "normal", 
                          formula = "50 + age + c0", variance = 2)
gen.student <- defDataAdd(gen.student, varname = "y3", dist = "normal", 
                          formula = "50 - (age*Gender) + c0", variance = 2)
students <- genCluster(dtClass, cLevelVar = "idClass", numIndsVar = "nStudents", 
                        level1ID = "idChild")

students <- addColumns(gen.student, students)
setDF(students)
dim(students)
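
As a quick sanity check on the toy data, the class-level means of the outcomes should roughly track the class effect c0:

# quick check: class means of y1-y3 should move with c0
aggregate(cbind(y1, y2, y3, c0) ~ idClass, data = students, FUN = mean)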

Define the Stan multivariate model:

model <- '
data {
    int<lower=1> K;     // num of outcomes
    int<lower=1> J;     // num of explanatory variables
    int<lower=0> N;     // num of obs
    vector[J] x[N];     // x is array of size N containing vectors of J elements
    vector[K] y[N];     // y is array of size N containing vectors of K elements
}
parameters {
    matrix[K, J] beta;     // K x J matrix
    cov_matrix[K] Sigma;    // K x K covariance matrix of the outcomes
}

model {
    vector[K] mu[N];

    for (n in 1:N){
        mu[n] = beta * x[n];
    }
    y ~ multi_normal(mu, Sigma);
}'

Set up the data for Stan and run the model:

library(rstan)
stan_dat <- list(J = 3, K = 3,
                 N = dim(students)[1],
                 x = as.matrix(data.frame(x0 = 1,
                                          x1 = students$Gender,
                                          x2 = students$age)),
                 y = as.matrix(students[, c("y1", "y2", "y3")]))  # pass a numeric matrix, not a data.frame

stan1 <- stan(model_code = model, data = stan_dat, chains = 2, cores=2 , iter=1000,
              pars = c("beta"))

##### Examine results
library(reshape2)  # for melt()
library(ggplot2)

summary(stan1)

muc <- rstan::extract(stan1, par = "beta[2,2]", permuted = FALSE, inc_warmup = TRUE)
mdf <- melt(muc)
ggplot(mdf, aes(x = iterations, y = value, color = chains)) +
  geom_line() + ylab(mdf$parameters[1])
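
Side note: rstan’s built-in trace plot should give much the same picture:

# optional alternative to the manual melt/ggplot step above
rstan::traceplot(stan1, pars = "beta[2,2]", inc_warmup = TRUE)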

I have made an attempt or two to extend the model but got confused and abandoned ship. (I am used to using indices in JAGS; the matrix form of the equation in the Stan example adds an extra level of confusion for me!) I would appreciate any help folks might be able to offer.
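
If I understand it correctly, the matrix form is just shorthand for the element-wise loops I would write in JAGS, i.e. something like this sketch (the matrix version just being more efficient):

// sketch only: element-wise equivalent of  mu[n] = beta * x[n]
for (n in 1:N)
  for (k in 1:K) {
    mu[n][k] = 0;
    for (j in 1:J)
      mu[n][k] += beta[k, j] * x[n][j];
  }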

I seem to have made some progress in implementing partial pooling:

model <- '
data {
    int<lower=1> K;     // num of outcomes
    int<lower=1> J;     // num of explanatory variables
    int<lower=0> N;     // num of obs

    int nCL;            // num of groups
    int CL[N];          // group of each individual

    vector[J] x[N];     // x is array of size N containing vectors of J elements
    vector[K] y[N];     // y is array of size N containing vectors of K elements
}
parameters {
    matrix[K, J] beta[ nCL ];     // array of nCL matrices, one K x J matrix per group
    cov_matrix[K] Sigma;          // K x K covariance matrix of the outcomes
}

transformed parameters {
    vector[K] mu[N];

    for (n in 1:N){
        mu[n] = beta[ CL[n] ] * x[n];
    }
}

model {
    y ~ multi_normal(mu, Sigma);
}'

This model compiles and runs successfully, although it gives me more parameters than I necessarily want: I’m now getting “outcomes x explanatory_vars x groups” beta terms, when what I want is a beta for each outcome x explanatory variable, pooled across groups. I’m unsure whether I’ve gone wrong or just need to add more to the model.

You did not specify any (hierarchical) priors in your model, and thus your betas won’t be partially pooled. Such a prior is usually most efficient when using the so-called “non-centered” parameterization. You can read more about it in the Stan manual.
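
For example, keeping your data and transformed parameters blocks as they are, the parameters and model blocks could look something like the sketch below (mu_beta and tau_beta are just names I made up, and the prior scales are arbitrary):

parameters {
    matrix[K, J] beta[nCL];      // group-level slope matrices, as in your model
    matrix[K, J] mu_beta;        // population-level slopes
    real<lower=0> tau_beta;      // between-group sd of the slopes
    cov_matrix[K] Sigma;
}

model {
    to_vector(mu_beta) ~ normal(0, 5);
    tau_beta ~ cauchy(0, 2.5);
    for (l in 1:nCL)
        to_vector(beta[l]) ~ normal(to_vector(mu_beta), tau_beta);   // partial pooling towards mu_beta
    y ~ multi_normal(mu, Sigma);
}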

If you are calling Stan from R, you can also let the brms package generate the Stan code of such a model for you. You can read more about multivariate models in brms by running vignette("brms_multivariate").
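
For instance, something along these lines (untested, and the formula is only a rough guess at your toy data’s structure) lets you inspect the code brms would generate:

library(brms)
# three outcomes with residual correlations, fixed effects for Gender and age,
# and a varying intercept per class
bform <- bf(mvbind(y1, y2, y3) ~ Gender + age + (1 | idClass)) + set_rescor(TRUE)
make_stancode(bform, data = students)  # prints the generated Stan code
# fit <- brm(bform, data = students, chains = 2, cores = 2)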


Thanks Paul. Yes, I suspected I needed to specify priors as well. Thanks for the tip about brms: it turns out I had an older version without that vignette and so had missed it, but I’ve updated now!

So I finally got back to this. I think I’ve now specified a multivariate linear mixed model:

model <- '
data {
    int<lower=1> K;     // num of outcomes
    int<lower=1> J;     // num of explanatory variables
    int<lower=0> N;     // num of obs

    int nCL;            // num of groups
    int CL[N];          // group of each individual

    vector[J] x[N];     // x is array of size N containing vectors of J elements
    vector[K] y[N];     // y is array of size N containing vectors of K elements
}
parameters {
    matrix[K, J] beta;      // K x J matrix of slopes, shared across groups
    vector[K] alpha[nCL];   // varying intercepts, one vector of K per group
    cov_matrix[K] Sigma;    // K x K residual covariance of the outcomes
    real a0[nCL];           // means of the varying intercepts
    real b0;                // mean of the slopes
    real<lower=0> sigma_cl;     // sd of the varying intercepts
    real<lower=0> sigma_slope;  // sd of the slopes
}

transformed parameters {
    vector[K] mu[N];

    for (n in 1:N){
        mu[n] = alpha[ CL[n] ] + beta * x[n];
    }
}

model {
    // Priors
    for (l in 1:nCL)
        alpha[l] ~ normal( a0[l], sigma_cl );
    to_vector(beta) ~ normal(b0, sigma_slope);   // stated once, outside the group loop
    a0 ~ normal(0, 1);
    b0 ~ normal(0, 1);
    sigma_cl ~ cauchy(0, 10);
    sigma_slope ~ cauchy(0, 10);

    // Likelihood
    y ~ multi_normal(mu, Sigma);
}'

I remain confused about priors, however, both in terms of when to use a weak or a strong prior and also regarding the non-centred parameterisation. I think I understand the reasons why it is better, but as often with these things I’m having trouble applying it to my model. Any tips appreciated. Thanks all.
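
Edit: for what it’s worth, my current (untested) guess at a non-centred version of the varying intercepts is below; alpha_raw is just a name I made up, and everything else is meant to stay as above. Happy to be told whether this is on the right track.

model_nc <- '
data {
    int<lower=1> K;     // num of outcomes
    int<lower=1> J;     // num of explanatory variables
    int<lower=0> N;     // num of obs

    int nCL;            // num of groups
    int CL[N];          // group of each individual

    vector[J] x[N];     // x is array of size N containing vectors of J elements
    vector[K] y[N];     // y is array of size N containing vectors of K elements
}
parameters {
    matrix[K, J] beta;          // K x J matrix of slopes, shared across groups
    vector[K] alpha_raw[nCL];   // standard-normal building blocks for the intercepts
    cov_matrix[K] Sigma;
    real a0[nCL];
    real b0;
    real<lower=0> sigma_cl;
    real<lower=0> sigma_slope;
}

transformed parameters {
    vector[K] alpha[nCL];
    vector[K] mu[N];

    for (l in 1:nCL)
        alpha[l] = a0[l] + sigma_cl * alpha_raw[l];   // implies alpha[l] ~ normal(a0[l], sigma_cl)
    for (n in 1:N)
        mu[n] = alpha[ CL[n] ] + beta * x[n];
}

model {
    // Priors
    for (l in 1:nCL)
        alpha_raw[l] ~ normal(0, 1);    // non-centred: prior on the raw scale
    to_vector(beta) ~ normal(b0, sigma_slope);
    a0 ~ normal(0, 1);
    b0 ~ normal(0, 1);
    sigma_cl ~ cauchy(0, 10);
    sigma_slope ~ cauchy(0, 10);

    // Likelihood
    y ~ multi_normal(mu, Sigma);
}'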