Improving efficiency in regression model with varying intercepts (vectorization)

Hello everyone,

I am trying to optimize my model code for a (multilevel) regression with varying intercepts.
I am testing the code on simulated data, consisting of a matrix of predictors (test_matrix) and a group variable with four levels (test_brands).

library(dplyr)

test_matrix <- cbind(rnorm(1000, 4, 7),
                     rnorm(1000, 0, 1))

test_vector <- c(2, 4)

test_brands <- data.frame(brand = rep(1:4, 250))
test_dummies <- test_brands %>%
  mutate(brand_1 = as.numeric(brand == 1),
         brand_2 = as.numeric(brand == 2),
         brand_3 = as.numeric(brand == 3),
         brand_4 = as.numeric(brand == 4)) %>%
  dplyr::select(-brand) %>%
  as.matrix()

test_vector_dummies <- c(1,2,3,4)


test_outcome <- 100 + test_matrix %*% test_vector + test_dummies %*% test_vector_dummies + rnorm(1000, 0, 1)

test_data <- data.frame(test_outcome,
                        test_matrix,
                        test_brands)

I am able to estimate the coefficients (test_vector) for test_matrix with the following Bayesian model, which I based on descriptions in the rstan manual:

data {
  int<lower=0> N;  // number of observations
  int<lower=0> J;  // number of brands
  int<lower=0> K;  // number of iv's (columns in the predictors matrix)
  vector[N] y;  //  outcome variable
  row_vector[K] x[N];  // matrix (array) of predictors
  int<lower=1,upper=J> brand[N]; // brands for N observations
}

parameters {
  real alpha[J];
  vector[K] beta;
  real<lower=0> sigma;
}

model {
  for (i in 1:N) {
    y[i] ~ normal(alpha[brand[i]] + x[i] * beta, sigma);
  }
}

With this model I do get the desired results. However, I am wondering if there is some way to make the code more efficient (e.g. by vectorizing the model), as I ultimately want to apply it to rather big datasets. I didn’t find any way to vectorize it here, and I’m not even sure there is one (because I have only varying intercepts and not varying slopes) … at least none of my attempts worked. I am certainly not an expert on rstan, though, as I have only just started working with it.

Therefore, I would really appreciate if someone could help me!

Thank you very much in advance,
Alexandra


Hi Alexandra,

If you change x to a matrix[N,K], then you can use the normal_id_glm distribution (linear regression):

data {
  int<lower=0> N;  // number of observations
  int<lower=0> J;  // number of brands
  int<lower=0> K;  // number of iv's (columns in the predictors matrix)
  vector[N] y;  //  outcome variable
  matrix[N,K] x;  // matrix of predictors
  int<lower=1,upper=J> brand[N]; // brands for N observations
}

parameters {
  real alpha[J];
  vector[K] beta;
  real<lower=0> sigma;
}

model {
  y ~ normal_id_glm(x, alpha[brand], beta, sigma);
}

The normal_id_glm approach has the most efficient construction of the likelihood (and gradients), and also allows for GPU acceleration with large datasets.
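For reference, here is a sketch of how the same likelihood can be vectorized without the glm function, keeping alpha as an array:

model {
  // alpha[brand] indexes out a real[N]; to_vector turns it into a vector[N]
  // so it can be added to the vector x * beta
  y ~ normal(to_vector(alpha[brand]) + x * beta, sigma);
}

This already removes the loop, but normal_id_glm is still the faster option because it builds the linear predictor and its gradients in a single call.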

Hope that helps!


Hi Andrew,

thank you very much for your fast response! I just tried the code you sent and got the following error message:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  vector ~ normal_id_glm(matrix, real[ ], vector, real)

Available argument signatures for normal_id_glm:

  vector ~ normal_id_glm(matrix, real, vector, real)
  vector ~ normal_id_glm(matrix, vector, vector, real)

Real return type required for probability function.
 error in 'model119034e48bc_test_matrix_re_vectorization' at line 18, column 52
  -------------------------------------------------
    16: 
    17: model {
    18:     y ~ normal_id_glm(x, alpha[brand], beta, sigma);
                                                           ^
    19: }
  -------------------------------------------------

As far as I understand, the problem is alpha[brand], but I don’t know how to solve it. I also tried to construct the intercept term outside the sampling statement, as in:

transformed parameters {
  real alpha_b[J];
  alpha_b = alpha[brand]; 
}

model {

    y ~ normal_id_glm(x, alpha_b, beta, sigma);
}

But that doesn’t help and yields the same error.

Also: my real data will have more than one grouping factor, so I will have varying intercepts over several factors. Right now that model looks like this:

 for (i in 1:N)  y[i] ~ normal(alpha_b[brand[i]] + alpha_s[store[i]] + alpha_w[week[i]] + x[i] * beta, sigma);

Is there any way I can implement multiple grouping factors in the normal_id_glm function as well? I couldn’t find anything on that in the function description.

Thanks for your help,
Alexandra

Change alpha to a vector:

parameters {
  vector[J] alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
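With alpha declared as a vector, alpha[brand] is a vector[N], which matches the second normal_id_glm signature from the parser message, so the sampling statement from the earlier post works unchanged:

model {
  // alpha[brand] is now a vector[N] of per-observation intercepts
  y ~ normal_id_glm(x, alpha[brand], beta, sigma);
}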

Yep, you just include the same construction in the place of alpha[brand]:

y ~ normal_id_glm(x, alpha_b[brand] + alpha_s[store] + alpha_w[week], beta, sigma);
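Spelled out, a minimal sketch of the declarations this assumes (S and W are hypothetical names for the numbers of stores and weeks, passed in data just like J):

data {
  int<lower=0> N;
  int<lower=0> J;                 // number of brands
  int<lower=0> S;                 // number of stores (assumed)
  int<lower=0> W;                 // number of weeks (assumed)
  int<lower=0> K;
  vector[N] y;
  matrix[N, K] x;
  int<lower=1,upper=J> brand[N];
  int<lower=1,upper=S> store[N];
  int<lower=1,upper=W> week[N];
}

parameters {
  vector[J] alpha_b;
  vector[S] alpha_s;
  vector[W] alpha_w;
  vector[K] beta;
  real<lower=0> sigma;
}

model {
  // each alpha_*[index array] is a vector[N], so the sum is the per-observation intercept
  y ~ normal_id_glm(x, alpha_b[brand] + alpha_s[store] + alpha_w[week], beta, sigma);
}

With several additive intercepts like this you will usually also want priors on the alphas, since a constant could otherwise shift freely between them.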

Hi Niko,

that solved my problem, now it works perfectly fine! Thank you very much :)

Okay, that seemed to work. Thank you so much for your help! This really helped a lot!


Hello again,

I have one more question regarding the normal_id_glm function:

I am now trying to extend my model by one more predictor matrix. In the simple form it looks like this:

for (i in 1:N)  y[i] ~ normal(alpha_b[brand[i]] + alpha_s[store[i]] + alpha_w[week[i]] + x_1[i] * beta + x_2[i] * gamma, sigma);

I tried to implement this into the normal_id_glm()-function the same way as for alpha:

y ~ normal_id_glm(x_1 + x_2, alpha_b[brand] + alpha_s[store] + alpha_w[week], beta + gamma, sigma);

Although this model doesn’t throw any error, the resulting numbers are very wrong. I expect this happens because the two matrices are actually added together element-wise?!

So the easy solution would probably be to combine the two matrices into one, but if there is a way to keep them apart, that would be much better for me (for analysing the results later).

Thanks again very much for your help in advance,
Alexandra

This

normal_id_glm(x_1 + x_2, ..., beta + gamma, sigma);

gives a term like

\left(x_1 + x_2\right)\left(\beta + \gamma\right) = x_1\beta + x_2\beta + x_1\gamma + x_2\gamma

but you need \left(x_1\beta + x_2\gamma\right), which you can get by concatenating (not adding): append the predictor matrices column-wise and the coefficient vectors row-wise

normal_id_glm(append_col(x_1, x_2), ..., append_row(beta, gamma), sigma);
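Spelled out, a sketch of the extra declarations this assumes, with K1 and K2 as hypothetical names for the two column counts (the rest of the data and parameters blocks stay as before):

data {
  int<lower=0> K1;                // columns of x_1 (hypothetical name)
  int<lower=0> K2;                // columns of x_2 (hypothetical name)
  matrix[N, K1] x_1;
  matrix[N, K2] x_2;
}

parameters {
  vector[K1] beta;
  vector[K2] gamma;
}

model {
  // append_col(x_1, x_2) is a matrix[N, K1 + K2] and append_row(beta, gamma)
  // a vector[K1 + K2], so their product is exactly x_1 * beta + x_2 * gamma
  y ~ normal_id_glm(append_col(x_1, x_2),
                    alpha_b[brand] + alpha_s[store] + alpha_w[week],
                    append_row(beta, gamma), sigma);
}

Because beta and gamma remain separate parameters, you can still analyse their posteriors separately afterwards.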

Perfect… thank you for the explanation, that makes sense!

Now the model works perfectly fine!

Is it possible to do this with brand represented as an indicator variable matrix?

Something like the following:

data{
  matrix[N, J] z;             // indicator variable matrix for brand
}
parameters{
  vector[J] alpha;
}
model{
  y ~ normal_id_glm(x, z * alpha, beta, sigma);
}

I’m not entirely sure what you mean by an indicator variable matrix, but as long as z * alpha returns a vector with the desired intercept for each individual, your code is fine.

Ok, looks like it will work. What I meant was, for example: instead of brand being represented as this:

brand
1 1 1 2 2 2 3 3 3 4 4 4 5 5 5

It’s represented as this:

 brand1 brand2 brand3 brand4 brand5
      1      0      0      0      0
      1      0      0      0      0
      1      0      0      0      0
      0      1      0      0      0
      0      1      0      0      0
      0      1      0      0      0
      0      0      1      0      0
      0      0      1      0      0
      0      0      1      0      0
      0      0      0      1      0
      0      0      0      1      0
      0      0      0      1      0
      0      0      0      0      1
      0      0      0      0      1
      0      0      0      0      1

FYI, other terms for that kind of contrast coding are “one-hot” and “indicator contrasts”. If you use such contrasts, you have to remove the intercept from the model; so if the indicator columns go into the predictor matrix of normal_id_glm, the alpha argument needs to be zero.
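A minimal sketch of that zero-intercept variant, with beta_full a hypothetical name for the vector stacking the brand effects on top of the slopes (the remaining declarations as before):

data {
  matrix[N, J] z;                 // one-hot brand indicators
  matrix[N, K] x;                 // other predictors
}

parameters {
  vector[J + K] beta_full;        // brand effects stacked on top of the slopes
  real<lower=0> sigma;
}

model {
  // the one-hot columns sit inside the design matrix, so the intercept argument is 0
  y ~ normal_id_glm(append_col(z, x), 0.0, beta_full, sigma);
}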