Divergent transitions in a multivariate multilevel regression model

Dear Stan users/developers.

I am trying to write a code for a multivariate, multilevel regression. To start, I am simulating data from only two dependent (y1, y2) and two independent variables (x1, x2) for 40 participants, with 20 measurements each I attach herewith the Stan code and the code for simulation of data. I’m having divergent transitions, although I’ve used multinormal_cholesky for the multivariate outcome variable. Note also that to simplify things I assume that the coefficients for each outcome are independent, such that multivariate normal distributions are not used for level-1 parameters. Any help with eliminating the divergent transitions would be appreciated!

Thanks,
Isaac

data{
   int<lower=0> Nvars; //2 independent and 2 dependent variables in our simulation
   int<lower=0> N; //Total number of data points
   int<lower=0> Nsubj; //Number of subjects
   int subj[N];
   matrix[N,Nvars] x;
   vector[Nvars] y [N];
}

parameters{
    cholesky_factor_corr[Nvars] L_Omega; //For the correlations between outcome variables 
    vector<lower=0>[Nvars] L_sigma; //Residual SD of outcome variables
    matrix[Nvars,Nsubj] beta0; //Intercepts at subject level, separately for each outcome variable
    matrix[Nvars,Nsubj] beta [Nvars]; //Slopes at the subject level, for each outcome and predictor variables (i.e. same variable, with the former being at time t and the latter at time t-1)
   vector[Nvars] Beta0; //Level-two intercepts
   vector[Nvars] Beta [Nvars]; //Level-two slopes
   vector<lower=0>[Nvars] Tau0; //Random effect of intercepts
   matrix<lower=0> [Nvars,Nvars] Tau; //Random effect of slopes
}

transformed parameters{
    vector[Nvars] mu[N];
    matrix [Nvars,Nvars] Sigma;

    Sigma = diag_pre_multiply(L_sigma, L_Omega);

for (i in 1:N){
  for (v in 1:Nvars){
    mu[i,v]=beta0[v,subj[i]]+dot_product(to_vector(beta[1:Nvars,v,subj[i]]),x[i,1:Nvars]);
  }
}

}

model{
    //Priors
    L_Omega~lkj_corr_cholesky(1);
    L_sigma~cauchy(0,5);
    Tau0~cauchy(0,5);
    to_vector(Tau)~cauchy(0,5);
    to_vector(beta0)~normal(0,5);
    for (v in 1:Nvars){to_vector(beta[v,,])~normal(0,5);}


for (i in 1:N){
    y[i,1:Nvars]~multi_normal_cholesky(mu[i,1:Nvars],Sigma);
}

for (s in 1:Nsubj){
for (v in 1:Nvars){
    beta0[v,s]~normal(Beta0[v],Tau0[v]);
    beta[1:Nvars,v,s]~normal(Beta[1:Nvars,v],Tau[1:Nvars,v]);}
}


}
'

And here is the code for the simulations:


sigma=matrix(c(1,0.5,0.3,0.4,
               0.5,1,0.6,0.7,
               0.3,0.6,1,0,
               0.4,0.7,0,1),nrow=4)
colnames(sigma)<-c("y1","y2","x1","x2")
rownames(sigma)<-c("y1","y2","x1","x2")

library(MASS)
Data=mvrnorm(n = 20,mu = c(10,20,0,0),Sigma = sigma,empirical = F)
for (i in 2:40){
  Data<-rbind.data.frame(Data,mvrnorm(n = 20,mu = c(10,20,0,0),Sigma = sigma,empirical = F))
}

Data$subj<-rep(1:40,each=20)

#Data<-Data[sample(rownames(Data),900),]

y=Data[,1:2]
x=Data[,3:4]
subj=Data$subj


datalist=list(y=y,
              x=x,
              N=nrow(x),
              Nvars=2,
              Nsubj=max(subj),
              subj=subj)

1 Like

Hi @Isaac_Fradkin. I’m happy to see your post and model because I tried to build something similar before and I got lost in tracking the incides - so thank you for figuring that out!

I have run the model and I got 315 divergences. Something I noticed however - you have not set priors for Beta0 and Beta, so perhaps the reasons for the divergences are because the model is trying to explore the parameter space of these two variables out to infinity? So I added priors for these two variables as follows:

    L_Omega ~ lkj_corr_cholesky(1);
    L_sigma ~ cauchy(0, 5);
    Tau0 ~ cauchy(0, 5);
    to_vector(Tau) ~ cauchy(0, 5);
    to_vector(beta0) ~ normal(0, 5);
    Beta0 ~ normal(0, 5);

    for (v in 1:Nvars) {
        to_vector(beta[v, , ]) ~ normal(0, 5);
        Beta[v] ~ normal(0, 5);
    }

I re-ran the model and still had some divergences, but this time only 138.

So - I’m not sure where to look next (other than trying to tighten some of the priors perhaps).
I was wondering however - do you need both random intercepts and random slopes at the individual level. Perhaps random intercepts is enough and the model would be a bit simpler to optimise?

Perhaps, you might want to look into non-centred parameterisation for specifying the model (Stan manual section 28.6). Alas, this model is too complex for me to specify this way as I dont’ fully understand the non-centered parameterisation well enough yet, but maybe you are able? Perhaps others with more expertise can advise here.

Edit: Oh by the way one thing confused me - are you using Nvars to represent both number of x vars and number of y vars ? I might be clearer/easier to expand if you use a different count variable for x’s and y’s

Hi @jroon

Thanks for your interest in my model. Actually your advice of using non-central parmeterization suddenly made sense to me in a way I haven’t thought of before. So what I did is to have non-central parameterization for the level-1 intercepts and slopes, and there are no more divergent transitions for now :-). Now just have to fit it real rather than simulated data, which could be more complicated. I attach the code herewith. Please note that the model is meant for a multivariate autoregressive model, where the number of dv’s and iv’s is equal (Nvars in the model), so if you case is different - you’ll need to adjust only this part. Also - most likely the code can be more efficient by introducing more vectorization. But I’ll leave that for later.


modelstring4='
data{
int<lower=0> Nvars;
int<lower=0> N;
int<lower=0> Nsubj;
int subj[N];
matrix[N,Nvars] x;
vector[Nvars] y [N];

}
parameters{
cholesky_factor_corr[Nvars] L_Omega;
vector<lower=0>[Nvars] L_sigma;
matrix[Nvars,Nsubj] Zbeta0;
matrix[Nvars,Nsubj] Zbeta [Nvars];
vector[Nvars] Beta0;
vector[Nvars] Beta [Nvars];
vector<lower=0>[Nvars] Tau0;
//vector<lower=0>[Nvars] Tau [Nvars];
matrix<lower=0> [Nvars,Nvars] Tau;
}

transformed parameters{
vector[Nvars] mu[N];
matrix [Nvars,Nvars] L_Sigma;
matrix[Nvars,Nsubj] beta0;
matrix[Nvars,Nsubj] beta [Nvars];


L_Sigma = diag_pre_multiply(L_sigma, L_Omega);


for (s in 1:Nsubj){
  for (dv in 1:Nvars){
    beta0[dv,s]=Zbeta0[dv,s]*Tau0[dv]+Beta0[dv];
    for (iv in 1:Nvars){
      beta[dv,iv,s]=Zbeta[dv,iv,s]*Tau[dv,iv]+Beta[dv,iv];
    }
  }
}

for (i in 1:N){
  for (dv in 1:Nvars){
    mu[i,dv]=beta0[dv,subj[i]]+dot_product(to_vector(beta[dv,1:Nvars,subj[i]]),x[i,1:Nvars]);
  }
}


}

model{

//Priors
L_Omega~lkj_corr_cholesky(1);
L_sigma~cauchy(0,5);
Tau0~cauchy(0,5);
to_vector(Tau)~cauchy(0,5);
//to_vector(Beta0)~normal(0,5);
//for (v in 1:Nvars){to_vector(Beta[v,])~normal(0,5);}


for (i in 1:N){
y[i,1:Nvars]~multi_normal_cholesky(mu[i,1:Nvars],L_Sigma);
}

for (s in 1:Nsubj){
  for (dv in 1:Nvars){
    Zbeta0[dv,s]~normal(0,1);
    Zbeta[dv,1:Nvars,s]~normal(0,1);}
  }
}

generated quantities{
  matrix[Nvars,Nvars] Omega;
  matrix[Nvars,Nvars] Sigma;
  Omega <- multiply_lower_tri_self_transpose(L_Omega);
  Sigma <- quad_form_diag(L_Omega, L_sigma);
}
'

1 Like

Great - glad I was of some help!
Thanks for sharing the code - potentially useful for me - much obliged. And yes I take you point about Nvars and the model being autoregressive - I forgot that part when commenting!
Good look with application to your real data.

Hi again @Isaac_Fradkin.

I ran the non-centered model on your test data and while there were no divergences, when I plotted the traceplot some of them were simply flatline:

traceplot(fit, pars = c(“L_Omega”))

traceplot(fit, pars = c(“L_Sigma”))

I don’t know why this is!

L_Omega is a cholesky factor for a correlation matrix which means all it’s upper diagonal elements are fixed at 0 (L_Omega[i,j] = 0 for i < j ) and the first diagonal element is fixed at 1.

These traceplots are perfectly normal.

1 Like

Insert faceplam gif here.

of course that makes perfect sense I should have spotted that! Thanks

1 Like

@Isaac_Fradkin - I’ve been trying to generalise the model to cases where number of x’s and y’s are different, but I’m getting tied in knots with the dimensions of the vectors/matrices. I was hoping you might be able to help me straighten things out please?

I modified the gen_dat.r file to generate a suitable test dataset. gen_data_y2_x3.R (792 Bytes)

Ok and below is my model. When I run this I get the error:

[1] “Error in sampler$call_sampler(args_list[[i]]) : "
[2] " Exception: []: accessing element out of range. index 3 out of range; expecting index to be between 1 and 2; index position = 2beta (in ‘model16dc521883c9_multi_out_noncent_expanded’ at line 34)”

I’ve tried some variations but I’m mostly just becoming more confused!

Model:

data{
    int<lower=0> NvarsX; // num independent variables
    int<lower=0> NvarsY; // num dependent variables
    int<lower=0> N;     //Total number of rows
    int<lower=0> Nsubj; //Number of subjects
    int subj[N];
    matrix[N, NvarsX] x;
    vector[NvarsY] y [N];
}

parameters{
    cholesky_factor_corr[NvarsY] L_Omega; //For the correlations between outcome variables
    vector<lower=0>[NvarsY] L_sigma; //Residual SD of outcome variables
    matrix[NvarsY, Nsubj] Zbeta0; //Intercepts at subject level, separately for each outcome variable
    matrix[NvarsY, Nsubj] Zbeta [NvarsX]; //Slopes at the subject level
    vector[NvarsY] Beta0;        //Level-two intercepts
    vector[NvarsY] Beta [NvarsX]; //Level-two slopes
    vector<lower=0> [NvarsY] Tau0; //Random effect of intercepts
    matrix<lower=0> [NvarsY, NvarsX] Tau; //Random effect of slopes
}

transformed parameters{
    vector[NvarsY] mu[N];
    matrix[NvarsY, NvarsY] L_Sigma;
    matrix[NvarsY, Nsubj] beta0;
    matrix[NvarsY, Nsubj] beta [NvarsX];

    L_Sigma = diag_pre_multiply(L_sigma, L_Omega);

    for (s in 1:Nsubj){
      for (dv in 1:NvarsY){
        beta0[dv, s] = Zbeta0[dv, s] * Tau0[dv] + Beta0[dv];
        for (iv in 1:NvarsX){
            beta[dv, iv, s]= Zbeta[dv, iv, s] * Tau[dv, iv] + Beta[dv, iv];
        }
      }
    }

    for (i in 1:N){
        for (dv in 1:NvarsY){
            mu[i, dv] = beta0[dv, subj[i]] + dot_product(to_vector(beta[dv, 1:NvarsX, subj[i]]), x[i, 1:NvarsX]);
        }
    }
}

model{
    //Priors
    L_Omega ~ lkj_corr_cholesky(1);
    L_sigma ~ cauchy(0,5);
    Tau0 ~ cauchy(0,5);
    to_vector(Tau) ~ cauchy(0,5);
    //to_vector(Beta0)~normal(0,5);
    //for (v in 1:Nvars){to_vector(Beta[v,])~normal(0,5);}

    for (i in 1:N){
        y[i, 1:NvarsY] ~ multi_normal_cholesky(mu[i, 1:NvarsX], L_Sigma);
    }

    for (s in 1:Nsubj){
        for (dv in 1:NvarsY){
            Zbeta0[dv,s] ~ normal(0,1);
            Zbeta[dv, 1:NvarsX,s] ~ normal(0,1);
        }
    }
}

generated quantities{
    matrix[NvarsY, NvarsX] Omega;
    matrix[NvarsY, NvarsX] Sigma;
    Omega = multiply_lower_tri_self_transpose(L_Omega);
    Sigma = quad_form_diag(L_Omega, L_sigma);
}

beta[dv, iv, s]= Zbeta[dv, iv, s] * Tau[dv, iv] + Beta[dv, iv];

The index got messed up. beta[dv, iv, s] should be beta[iv, dv, s] and Beta … same.

1 Like

I tried that didn’t help. Also those indices are from @Isaac_Fradkin 's code - I’d say its more likely I mis-defined a vector or matrix but I can’t find it

Dear @Jroon.

I would suggest replacing these:

matrix[NvarsY, Nsubj] Zbeta [NvarsX];

vector[NvarsY] Beta [NvarsX];

matrix[NvarsY, Nsubj] beta [NvarsX];

with

matrix[NvarsX, Nsubj] Zbeta [NvarsY];

vector[NvarsX] Beta [NvarsY];

matrix[NvarsX, Nsubj] beta [NvarsY];

(because the first index should be the dv).

And this:

y[i, 1:NvarsY] ~ multi_normal_cholesky(mu[i, 1:NvarsX], L_Sigma);

with:

y[i, 1:NvarsY] ~ multi_normal_cholesky(mu[i, 1:NvarsY], L_Sigma);

(because we have a mu for each dv not for each iv).

1 Like

Thank you @Isaac_Fradkin. I think I was mixed up how the indexing work when you have matrix[a, b] J[c]

That solved the error I was having but I had another error in the generated quantities that I figured out myself. This version of the model ran without complaint:
multi_out_noncent_expanded2.stan (2.1 KB)

Thank you for your help with that. One last thing that is bothering me is that the recovered parameters don’t seem to match the simulated data matrix which was:

    y1  y2  x1  x2  x3
y1 1.0 0.5 0.3 0.4 0.2
y2 0.5 1.0 0.6 0.7 0.3
x1 0.3 0.6 1.0 0.0 0.1
x2 0.4 0.7 0.5 1.0 0.2
x3 0.2 0.3 0.1 0.2 1.0

Looking at Betas from the model I’m seeing:

format( colMeans(params$Beta[,1,]), digits=2)
[1] “0.14” “0.29” “0.12”
format( colMeans(params$Beta[,2,]), digits=2)
[1] “0.34” “0.51” “0.14”

Any ideas?

Hi.

First, it seems from your results that the second indicator of Beta is the dv, whereas I think it should be the iv, no?
Second, what was the sample size of the simulated data?

Hi. So my own difficulty with this and the reason I couldn’t make such models myself is because I get tied up in tracking the dimensions and indicators.
But if I look at the class and dimensions of Beta I see:

class(params$Beta)
[1] “array”
dim(params$Beta)
[1] 4000 2 3

So I reasoned that 4000 implies dimensions x = 1 row per posterior sample, y = 2 => dependent vars, z = 3 => independent vars. Am I wrong ?

Simulated data was same as yours just with an extra x variable - 40 subjects with 20 repeated measures each

Edit: Sorry I forgot to say how I got params:

params = rstan::extract(fit)

No no. Its my mistake. The first indicator is indeed for the MCMC samples, so it seems right. I would try increasing the sample size to see if it’s just sampling error. You can also add an empirical=T for the mvrnorm in R to remove sampling errors altogether and check then. Then I would test to see whether the parameters have a symmetric distribution. Maybe the median would work better here?

So I set empirical = T, and then increased the iterations to 4000 and I’m using the median:

format( sapply(1:3, function(x) median(params$Beta[, 1, x]) ), digits=2)
[1] “0.19” “0.28” “0.14”
format( sapply(1:3, function(x) median(params$Beta[, 2, x]) ), digits=2)
[1] “0.31” “0.51” “0.15”

Getting the same answers. Could it be because of the non-centered parameterisation somehow? If we are transforming beta does that have a knock on effect on Beta?

I see.

I think your problem is that you didn’t set the correlations beween x1, x2 and x3 to be zero. This way, the coefficients for each x (for each y) do not represent the correlations, but rather the relationship when controlling for the other x’s (e.g., semi-partial correlations).

1 Like

Once again - I should have spotted that. That solved it.
Many thanks!

Edit: @Isaac_Fradkin - I’ve now run it on real data with which I’ve previously run similar JAGS models, and the results are making sense!