Improving efficiency in regression model with varying intercepts (vectorization)

Hello everyone,

I am trying to optimize my model code for a (multilevel) regression with varying intercepts.
I am testing the code on simulated data, consisting of a matrix of predictors (test_matrix) and a group variable with four levels (test_brands).

library(dplyr)

test_matrix <- cbind(rnorm(1000, 4, 7),
                     rnorm(1000, 0, 1))

test_vector <- c(2, 4)

test_brands <- data.frame(brand = rep(1:4, 250))
test_dummies <- test_brands %>%
  mutate(brand_1 = as.numeric(brand == 1),
         brand_2 = as.numeric(brand == 2),
         brand_3 = as.numeric(brand == 3),
         brand_4 = as.numeric(brand == 4)) %>%
  dplyr::select(-brand) %>%
  as.matrix()

test_vector_dummies <- c(1,2,3,4)


test_outcome <- 100 + test_matrix %*% test_vector + test_dummies %*% test_vector_dummies + rnorm(1000, 0, 1)

test_data <- data.frame(test_outcome,
                        test_matrix,
                        test_brands)

I am able to estimate the coefficients (test_vector) for test_matrix with the following Bayesian model, which I based on descriptions in the rstan manual:

data {
  int<lower=0> N;  // number of observations
  int<lower=0> J;  // number of brands
  int<lower=0> K;  // number of iv's (columns in the predictors matrix)
  vector[N] y;  //  outcome variable
  row_vector[K] x[N];  // matrix (array) of predictors
  int<lower=1,upper=J> brand[N]; // brands for N observations
}

parameters {
  real alpha[J];
  vector[K] beta;
  real<lower=0> sigma;
}

model {
  for (i in 1:N) {
    y[i] ~ normal(alpha[brand[i]] + x[i] * beta, sigma);
  }
}

With this model I do get the desired results. However, I am wondering if there is some way to make the code more efficient (e.g. by vectorizing the model), as I ultimately want to apply it to rather big datasets. I didn’t find any way to vectorize it here, and I’m not even sure there is one (because I have only varying intercepts and not varying slopes) … at least none of my attempts worked. I am certainly not an expert on rstan, though, as I have only just started working with it.

Therefore, I would really appreciate if someone could help me!

Thank you very much in advance,
Alexandra


Hi Alexandra,

If you change x to a matrix[N,K], then you can use the normal_id_glm distribution (linear regression):

data {
  int<lower=0> N;  // number of observations
  int<lower=0> J;  // number of brands
  int<lower=0> K;  // number of iv's (columns in the predictors matrix)
  vector[N] y;  //  outcome variable
  matrix[N,K] x;  // matrix of predictors
  int<lower=1,upper=J> brand[N]; // brands for N observations
}

parameters {
  real alpha[J];
  vector[K] beta;
  real<lower=0> sigma;
}

model {
  y ~ normal_id_glm(x, alpha[brand], beta, sigma);
}

The normal_id_glm approach has the most efficient construction of the likelihood (and gradients), and also allows for GPU acceleration with large datasets.
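For reference, here is a sketch of how the same likelihood can be vectorized without the glm function, keeping alpha as an array:

model {
  // alpha[brand] indexes out a real[N]; to_vector turns it into a vector[N]
  // so it can be added to the vector x * beta
  y ~ normal(to_vector(alpha[brand]) + x * beta, sigma);
}

This already removes the loop, but normal_id_glm is still the faster option because it builds the linear predictor and its gradients in a single call.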

Hope that helps!


Hi Andrew,

thank you very much for your fast response! I just tried the code you sent and got the following error message:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  vector ~ normal_id_glm(matrix, real[ ], vector, real)

Available argument signatures for normal_id_glm:

  vector ~ normal_id_glm(matrix, real, vector, real)
  vector ~ normal_id_glm(matrix, vector, vector, real)

Real return type required for probability function.
 error in 'model119034e48bc_test_matrix_re_vectorization' at line 18, column 52
  -------------------------------------------------
    16: 
    17: model {
    18:     y ~ normal_id_glm(x, alpha[brand], beta, sigma);
                                                           ^
    19: }
  -------------------------------------------------

As far as I understand, the problem is alpha[brand], but I don’t know how to solve it. I also tried to construct the intercept term outside the sampling statement, as in:

transformed parameters {
  real alpha_b[J];
  alpha_b = alpha[brand]; 
}

model {

    y ~ normal_id_glm(x, alpha_b, beta, sigma);
}

But that doesn’t help and yields the same error.

Also: my real data will have more than one grouping factor, so I will have varying intercepts over several factors. Right now that model looks like this:

 for (i in 1:N)  y[i] ~ normal(alpha_b[brand[i]] + alpha_s[store[i]] + alpha_w[week[i]] + x[i] * beta, sigma);

Is there any way I can implement multiple grouping factors in the normal_id_glm function as well? I couldn’t find anything on that in the function description.

Thanks for your help,
Alexandra

Change alpha to a vector:

parameters {
  vector[J] alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
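With alpha declared as a vector, alpha[brand] is a vector[N], which matches the second normal_id_glm signature from the parser message, so the sampling statement from the earlier post works unchanged:

model {
  // alpha[brand] is now a vector[N] of per-observation intercepts
  y ~ normal_id_glm(x, alpha[brand], beta, sigma);
}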

Yep, you just include the same construction in the place of alpha[brand]:

y ~ normal_id_glm(x, alpha_b[brand] + alpha_s[store] + alpha_w[week], beta, sigma);
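Spelled out, a minimal sketch of the declarations this assumes (S and W are hypothetical names for the numbers of stores and weeks, passed in data just like J):

data {
  int<lower=0> N;
  int<lower=0> J;                 // number of brands
  int<lower=0> S;                 // number of stores (assumed)
  int<lower=0> W;                 // number of weeks (assumed)
  int<lower=0> K;
  vector[N] y;
  matrix[N, K] x;
  int<lower=1,upper=J> brand[N];
  int<lower=1,upper=S> store[N];
  int<lower=1,upper=W> week[N];
}

parameters {
  vector[J] alpha_b;
  vector[S] alpha_s;
  vector[W] alpha_w;
  vector[K] beta;
  real<lower=0> sigma;
}

model {
  // each alpha_*[index array] is a vector[N], so the sum is the per-observation intercept
  y ~ normal_id_glm(x, alpha_b[brand] + alpha_s[store] + alpha_w[week], beta, sigma);
}

With several additive intercepts like this you will usually also want priors on the alphas, since a constant could otherwise shift freely between them.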

Hi Niko,

that solved my problem, now it works perfectly fine! Thank you very much :)

Okay, that seemed to work. Thank you so much for your help! This really helped a lot!


Hello again,

I have one more question regarding the normal_id_glm function:

I am now trying to extend my model by one more predictor matrix. In the simple form it looks like this:

for (i in 1:N)  y[i] ~ normal(alpha_b[brand[i]] + alpha_s[store[i]] + alpha_w[week[i]] + x_1[i] * beta + x_2[i] * gamma, sigma);

I tried to implement this into the normal_id_glm()-function the same way as for alpha:

y ~ normal_id_glm(x_1 + x_2, alpha_b[brand] + alpha_s[store] + alpha_w[week], beta + gamma, sigma);

Although this model doesn’t throw any error, the resulting numbers are very wrong. I expect this happens because the two matrices are actually added together element-wise?!

So the easy solution would probably be to combine the two matrices into one, but if there is a way to keep them apart, that would be much better for me (for analysing the results later).

Thanks again very much for your help in advance,
Alexandra

This

normal_id_glm(x_1 + x_2, ..., beta + gamma, sigma);

gives a term like

\left(x_1 + x_2\right)\left(\beta + \gamma\right) = x_1\beta + x_2\beta + x_1\gamma + x_2\gamma

but you need \left(x_1\beta + x_2\gamma\right), which you can get by concatenating (not adding): append the predictor matrices column-wise and the coefficient vectors row-wise

normal_id_glm(append_col(x_1, x_2), ..., append_row(beta, gamma), sigma);
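Spelled out, a sketch of the extra declarations this assumes, with K1 and K2 as hypothetical names for the two column counts (the rest of the data and parameters blocks stay as before):

data {
  int<lower=0> K1;                // columns of x_1 (hypothetical name)
  int<lower=0> K2;                // columns of x_2 (hypothetical name)
  matrix[N, K1] x_1;
  matrix[N, K2] x_2;
}

parameters {
  vector[K1] beta;
  vector[K2] gamma;
}

model {
  // append_col(x_1, x_2) is a matrix[N, K1 + K2] and append_row(beta, gamma)
  // a vector[K1 + K2], so their product is exactly x_1 * beta + x_2 * gamma
  y ~ normal_id_glm(append_col(x_1, x_2),
                    alpha_b[brand] + alpha_s[store] + alpha_w[week],
                    append_row(beta, gamma), sigma);
}

Because beta and gamma remain separate parameters, you can still analyse their posteriors separately afterwards.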

Perfect… thank you for the explanation, that makes sense!

Now the model works perfectly fine!

Is it possible to do this with brand represented as an indicator variable matrix?

Something like the following:

data{
  matrix[N, J] z;             // indicator variable matrix for brand
}
parameters{
  vector[J] alpha;
}
model{
  y ~ normal_id_glm(x, z * alpha, beta, sigma);
}

I’m not entirely sure what you mean by an indicator variable matrix, but as long as z * alpha returns a vector with the desired intercept for each individual, your code is fine.

Ok, looks like it will work. What I meant was, for example: instead of brand being represented as this:

brand
1 1 1 2 2 2 3 3 3 4 4 4 5 5 5

It’s represented as this:

 brand1 brand2 brand3 brand4 brand5
      1      0      0      0      0
      1      0      0      0      0
      1      0      0      0      0
      0      1      0      0      0
      0      1      0      0      0
      0      1      0      0      0
      0      0      1      0      0
      0      0      1      0      0
      0      0      1      0      0
      0      0      0      1      0
      0      0      0      1      0
      0      0      0      1      0
      0      0      0      0      1
      0      0      0      0      1
      0      0      0      0      1

FYI, other terms for that kind of contrast coding are “one-hot” and “indicator contrasts”. If you use such contrasts, you have to remove the intercept from the model; so if the indicator columns go into the predictor matrix of normal_id_glm, the alpha argument needs to be zero.
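A minimal sketch of that zero-intercept variant, with beta_full a hypothetical name for the vector stacking the brand effects on top of the slopes (the remaining declarations as before):

data {
  matrix[N, J] z;                 // one-hot brand indicators
  matrix[N, K] x;                 // other predictors
}

parameters {
  vector[J + K] beta_full;        // brand effects stacked on top of the slopes
  real<lower=0> sigma;
}

model {
  // the one-hot columns sit inside the design matrix, so the intercept argument is 0
  y ~ normal_id_glm(append_col(z, x), 0.0, beta_full, sigma);
}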