Multilevel Models with Varying Intercepts and Slopes with Covariance

Dear Stan Users and Experts,

I am trying to estimate a simple two-level linear model with varying intercepts and slopes, where the random effects (u_{0g}, u_{1g}) have a covariance matrix, i.e., they are correlated.

y_{ig} = \beta_{0g} + x_{1ig}\beta_{1g}+\varepsilon_{ig}
\beta_{0g}=\gamma_{00}+u_{0g}
\beta_{1g}=\gamma_{01}+u_{1g}

data {
  int N; // number of observations
  int P; // number of groups
  int K; // number of predictors + intercept
  int prov[N]; // group vector
  matrix[N, K] x; // matrix of predictors
  real y[N]; // y vector
}

parameters {
  vector[K] betaP[P]; // intercept and slope coefficients by group
  vector<lower=0>[K] sigmaP; // sd for intercept and slope
  vector[K] beta; // intercept and slope hyper-priors
  corr_matrix[K] Omega; // correlation matrix
  real<lower=0> sigma; // population sigma
}

model {
  vector[N] mu;  
  // priors
  beta ~ normal(0, 1);
  Omega ~ lkj_corr(2);
  sigmaP ~ exponential(1);
  sigma ~ exponential(1);
  betaP ~ multi_normal(beta, quad_form_diag(Omega, sigmaP));
  // likelihood
  for(i in 1:N) {
    mu[i] = x[i] * betaP[prov[i]]; // row vector times vector yields a scalar
  }
  y ~ normal(mu, sigma);
}

I closely followed the code in the following blog post, which was suggested on Andrew’s blog:

https://statmodeling.stat.columbia.edu/2024/02/27/tutorial-on-varying-intercept-varying-slope-multilevel-models-in-stan-from-will-hipson/

I feel that I only made cosmetic changes to the original code to suit my model and data-naming preferences. I estimated the same model using lmer and stan_glmer without any problem, so there shouldn’t be an issue with the data structure. But I am getting the following error message:

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: Unrecoverable error evaluating the log probability at the initial value.
Chain 1: Exception: multiply: Columns of rv (2) and Rows of v (7305519684788891252) must match in size (in 'string', line 29, column 4 to column 36)
[1] "Error : Exception: multiply: Columns of rv (2) and Rows of v (7305519684788891252) must match in size (in 'string', line 29, column 4 to column 36)"
[1] "error occurred during calling the sampler; sampling not done"

Does anyone know how to remedy this issue? I suspect it might be related to how some of the data or parameter vectors/matrices are declared. Any help would be very much appreciated. What I am trying to do is present a simple HLM model using the LKJ prior for my students.

Jun Xu, PhD
Professor
Department of Sociology
Data Science & Analytics Program
Ball State University
Web: Jun Xu

Dear all,

Now, I am posting both data and R codes (with Stan codes embedded) for replication purposes. Any help would be much appreciated.

I also just found out that I wasn’t paying attention even to my own work :( I posted a very similar question in April 2022, but for a more complex setup (a two-level binary logit with group-level predictors and a covariance matrix for the random effects). With help from the Stan team and other experts, I was able to figure out that code (still without arrays). I will try to work out an annotated example and post it to a webpage, as Will Hipson did, but in a much simpler fashion (only the intercept-and-slope-as-outcome example with covariance, not the other examples). The current example I am having issues with is different from the one posted in 2022. My question now is why it doesn’t work. Clearly, the two are coded somewhat differently, but it’s confusing to me why the simpler one doesn’t work. Any help would be appreciated.
mydta.RData (55.7 KB)
bayesHLM-LKJpriorStanPost20240605.R (2.5 KB)

Hi. I do not understand the model you are trying to implement. Given the error message, I would suggest printing both x[i] and betaP[prov[i]], or their respective sizes, to see if the dimensions are compatible and yield the required scalar for mu[i]. This can be achieved through print(dims(x[i])) and print(dims(betaP[prov[i]])).

Hi Garren,

Thanks for your suggestion. I did as you suggested, and got the following results:

...
...
...
[1,2][2,1]
[1,2][2,1]
[1,2][2,1]
[1,2][2,1]
[1,2][1937682752,1]

So the last print gives me this huge number (row number 1937682752), which causes the multiplication incompatibility of the two vectors. But what’s causing it? I can’t figure out how I got such a large row number for the last betaP[prov[i]]. Any thoughts?

@junxu2018 Your variable provN is 28, but your province variable has 31 levels. I can’t recreate your exact error code, but try setting your data list as:

dataList = list(
  x = x , 
  y = as.vector( y ) , # BUGS does not treat 1-column mat as vector
  K =  nPredictors, # intercept + slope
  N = nData,
  prov = as.numeric(mydta$province),
  P = max(as.numeric(mydta$province))
)

You got me. I have to apologize to the community: I should’ve checked my data more carefully. The province variable does run from 1–31, but it only has 28 levels/values, missing 14, 20, and 25. The prov item in dataList should run from 1–28 so that the Stan loops have no gaps to skip. Looks like it’s a confirmation of the statistical motto that 95% of statistical analysis is data management.

So I added a new line right below my Stan code and above the R data-management code:

mydta <- transform(mydta, ID = as.numeric(factor(province)))

Then it all worked. Of course, my MCMC parameters were set minimally just to get the model going; they should be set to much larger values to get well-mixed chains. Thank you all!
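For anyone following along in another language, the relabeling that `as.numeric(factor(province))` performs in R — mapping raw group codes with gaps onto consecutive 1-based indices, as Stan’s indexing requires — can be sketched in plain Python (the `provinces` values below are hypothetical):

```python
# Map raw group codes (with gaps, e.g. 14, 20, 25 missing)
# to consecutive 1-based indices, as Stan's indexing requires.
def relabel(codes):
    levels = sorted(set(codes))                        # distinct values, in order
    index = {v: i + 1 for i, v in enumerate(levels)}   # value -> 1..P
    return [index[v] for v in codes]

provinces = [1, 2, 15, 31, 15, 2]   # hypothetical raw codes with gaps
print(relabel(provinces))           # -> [1, 2, 3, 4, 3, 2]
```

The key property is that the largest relabeled index equals the number of distinct groups, so `prov` never indexes past `P` elements of `betaP`.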

Jun

I also need to revise the “prov = …” line in the dataList block:

...
...
prov = mydta$ID

So the dataList block becomes

dataList = list(
  x = x , 
  y = as.vector( y ) , 
  K =  nPredictors, # intercept + slope
  N = nData,
  prov = mydta$ID,
  P = provN
)

Here, I want to ask a related question. A more general model setup would actually look like the following:

y_{ig}=\beta_{0g}+x_{1ig}\beta_{1g}+x_{2ig}\beta_{2g}+x_{3ig}\beta_{3g}+x_{4ig}\beta_{4g}+\varepsilon_{ig}
\beta_{0g}=\gamma_{00}+u_{0g}
\beta_{1g}=\gamma_{01}+u_{1g}
\beta_{2g}=\gamma_{02}+u_{2g}
\beta_{3g}=\gamma_{03}+u_{3g}
\beta_{4g}=\gamma_{04}+u_{4g}

Then only \beta_{0g}, \beta_{1g}, and \beta_{2g} have correlated error terms; in other words, these three parameters are correlated, whereas \beta_{3g} and \beta_{4g} vary across groups but are correlated neither with the other parameters (\beta_{0g}, \beta_{1g}, \beta_{2g}) nor between themselves. Another thing I want to do in my code is to separate out the intercept for more flexible control.
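In matrix form, the covariance structure I intend for the group-level effects is block-diagonal (my notation: \Omega is the LKJ-distributed correlation matrix for the correlated block, and the \sigma’s are the component standard deviations):

```latex
(u_{0g}, u_{1g}, u_{2g}, u_{3g}, u_{4g})^\top \sim N(0, \Sigma), \qquad
\Sigma =
\begin{pmatrix}
\Sigma_{012} & 0 \\
0 & \operatorname{diag}(\sigma_3^2, \sigma_4^2)
\end{pmatrix},
\qquad
\Sigma_{012} = \operatorname{diag}(\sigma_0, \sigma_1, \sigma_2)\,\Omega\,\operatorname{diag}(\sigma_0, \sigma_1, \sigma_2)
```

So only the 3×3 block \Sigma_{012} carries covariances; the last two effects contribute variances only.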

Below is my model setup,

data {
  int N; // number of observations
  int G; // number of groups
  int K; // number of total predictors without intercept
  int b1N; // number of predictors with correlation
  int b2N; // number of predictors without correlation
  int prov[N]; // group vector
  matrix[N, b1N] x1; // matrix of predictors with correlation 
  matrix[N, b2N] x2; // matrix of predictors without correlation
  real y[N]; // y vector
}
parameters {
  // Vectors in Stan are column vectors
  real b0g[G]; // intercept coefs by group
  real<lower=0> b0g_sig; // sd for intercept
  real b0; // intercept hyper-prior
  
  vector[b1N] b1g[G]; // slopes correlated with intercept
  vector<lower=0>[b1N] b1g_sig; // sd for these slopes
  vector[b1N] b1; // slope hyper-priors
  
  vector[b2N] b2g[G]; // slopes uncorrelated with intercept
  vector<lower=0>[b2N] b2g_sig; // sd for these slopes
  vector[b2N] b2; // slope hyper-priors  
  
  corr_matrix[b1N+1] Omega; // correlation matrix: +1 for intercept
  real<lower=0> sigma; // population sigma
}
transformed parameters {
  vector[b1N+1] coef_g[G]; // b1 + intercept
  vector[b1N+1] coef; // b1 + intercept
  vector[b1N+1] sig_g; // b1 + intercept sigma
  coef_g <- append_row(b0g, b1g);
  coef <- append_row(b0, b1);
  sig_g <- append_row(b0g_sig, b1g_sig);
}
model {
  vector[N] mu;
  b0 ~ normal(0, 1);
  b1 ~ normal(0, 1);
  b2 ~ normal(0, 1);
  b0g_sig ~ exponential(1);
  b1g_sig ~ exponential(1);
  b2g_sig ~ exponential(1);
  sigma ~ exponential(1);
  Omega ~ lkj_corr(2);  
  coef_g ~ multi_normal(coef, quad_form_diag(Omega, sig_g));
  for(i in 1:N) {   
    mu[i] = b0g[prov[i]] + x1[i] * b1g[prov[i]] + x2[i] * b2g[prov[i]]; // row vector times vector yields a scalar
  }
  y ~ normal(mu, sigma);
}

And I got the following message,

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  0

Semantic error in 'string', line 33, column 12 to column 32:

Ill-typed arguments supplied to function 'append_row'. Available signatures: 
(real, vector) => vector
(vector, real) => vector
(vector, vector) => vector
(row_vector, row_vector) => matrix
(row_vector, matrix) => matrix
(matrix, row_vector) => matrix
(matrix, matrix) => matrix
Instead supplied arguments of incompatible type: real[], vector[].

I think it means that the following line (and probably the next two lines) is causing the error,

  coef_g <- append_row(b0g, b1g);

But I don’t see why this line doesn’t work, since the append_row function allows combining a real with a vector. Any thoughts and input would be very much appreciated. I am trying to produce a generalized example.

Jun

As the error message suggests, the datatypes of b0g and b1g, i.e. arrays, are not accepted as valid argument types for the append_row function. You will have to use a loop for the assignment of coef_g.

You can also use the to_vector function here, so your arguments to append_row match.

Thank you both for your suggestions,

I tried

to_vector(coef_g) = append_row(b0g, b1g);

Then

coef_g = append_row(to_vector(b0g), to_vector(b1g));

But both failed. Probably I haven’t spent enough time with the Stan user’s guide, but it’s puzzling to me, since coef_g is declared as a multidimensional vector, b_{0g} is of type real, and b_{1g} is of type vector; both can also be viewed as arrays. I think I must’ve missed something here. I don’t quite get how to set up the loop. Any pointer will be greatly appreciated.

Jun

What I am trying to do is combine real b0g[G] and vector[b1N] b1g[G] to form a correlated set of intercepts and slopes (which involves a covariance matrix with an LKJ prior), while leaving the slopes in vector[b2N] b2g[G] with variances but no covariances. Any quick pointer would be much appreciated.

Jun

As I mentioned above, the line

  coef_g <- append_row(b0g, b1g);

will have to change to a loop:

for (i in 1:G) {
  coef_g[i] = append_row(b0g[i], b1g[i]);
}

Stan does not handle matrix and array concatenation in the same manner. If you had defined b0g as a vector and b1g as a matrix, you would be able to just use append_row. Since you have arrays of mixed datatypes (i.e. an array of reals and an array of vectors), you cannot perform this in a single line.
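For intuition, here is the same per-group concatenation sketched in plain Python (hypothetical values; list concatenation stands in for append_row):

```python
# Hypothetical shapes mirroring the Stan declarations:
# real b0g[G] -> list of G scalars; vector[b1N] b1g[G] -> list of G lists.
G, b1N = 3, 2
b0g = [0.1, -0.3, 0.5]
b1g = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Per-group concatenation, analogous to the Stan loop over append_row:
coef_g = [[b0g[g]] + b1g[g] for g in range(G)]
print(coef_g[0])   # -> [0.1, 1.0, 2.0]
```

Each group’s coefficient vector gets its scalar intercept prepended to its slope vector, which is exactly what the Stan loop does one group at a time.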

Dear Garren,

Thanks a lot. The code ran through. I am checking the results and making sure the Stan code estimates what it is supposed to. Once I feel that I have solid results, I will post the code here and put an annotated webpage on my website to share with the community. Always annotate :) I will plunge into it for the next few days. Thanks a lot to the community members.

Jun

I just created a webpage for the two examples of Stan code that we discussed previously:

  1. a two-level linear regression model, with a group-specific intercept and one or more group-specific predictors, where the intercept and all slopes are correlated;
  2. a two-level linear regression model with three sets of parameters: a group-specific intercept, one set of group-specific predictors that are correlated with the intercept and amongst themselves, and another set of group-specific predictors that vary across groups but are correlated neither with the other parameters nor amongst themselves.

My description of the models was done cursorily (though the Stan code was carefully checked) and may have many rough edges. I will update these pages as errors are discovered or reported, and I will post more difficult (for me) examples over time.

Thank you all very much for all your help!

Jun