How to incorporate differences across countries in a multilevel model for panel data

Currently, I am analyzing a panel data set where the dependent variable is a count variable. The dependent variable denotes the number of cars of a specific model in a month, with roughly 60 months.
Therefore, I constructed a hierarchical negative binomial model. My Stan code, for now with just one explanatory variable, looks like this:

data {
  int<lower=1> N;          
  int<lower=0> p;
  int<lower=0> activations[N];              
  //matrix[N, p] X;    

  int<lower=1> K;
  int<lower=1> J;
  int<lower=1> ID[N];
  matrix[J,K] multilevel_data;
  
  int<lower=1> p_group; //number of individual specific variables
  matrix[N, p_group] X_group;
}
parameters {
  real<lower=0> inv_phi;   // 1/phi (easier to think about prior for 1/phi instead of phi)
  vector<lower = 0>[p_group] sigma_groups;
  vector[p_group] mu_groups;
  vector[J] offset[p_group]; 
  
  vector[K] kappa[p_group];    
}
transformed parameters {
  real phi = inv(inv_phi);
  // non-centered parameterization
  vector[J] alpha = mu_groups[1] + multilevel_data * kappa[1] + sigma_groups[1] * offset[1];
  vector[J] zeta = mu_groups[2] + multilevel_data * kappa[2] + sigma_groups[2] * offset[2];
}
model {
  offset[1] ~ normal(0, 1);
  sigma_groups[1] ~ normal(0, 1);
  mu_groups[1] ~ normal(log(4), 1);
  
  offset[2] ~ normal(0, 1);
  sigma_groups[2] ~ normal(0, 1);
  mu_groups[2] ~ normal(-0.25, 1);
  
  kappa[1] ~ normal(0, 1);  
  kappa[2] ~ normal(0, 1);  
  inv_phi ~ normal(0, 1);
  activations ~ neg_binomial_2_log(alpha[ID] + zeta[ID] .* col(X_group, 2), phi);
} 
generated quantities {
  int y_rep[N];
  vector[N] log_lik;
  
  for (n in 1:N) {
    real eta_n = alpha[ID[n]] + zeta[ID[n]] * X_group[n,2];
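    // neg_binomial_2_log_safe_rng is not a built-in and is not defined in this program,
    // so the built-in RNG is used here (it can throw an error for very large eta_n)
    y_rep[n] = neg_binomial_2_log_rng(eta_n, phi);
    log_lik[n] = neg_binomial_2_log_lpmf(activations[n] | eta_n, phi);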
  }
}

This models the total number of activations summed over the different countries. However, I also have the data for each country separately, so my data look like this:

Model    Count  Month    Country
audi TT  34     02-2012  UK
audi TT  23     02-2012  DE
audi TT  20     02-2012  FR
etc.

My ultimate goal is to make country-specific forecasts for the number of cars of a specific model in a specific month. Does anyone know how I could incorporate this in my multilevel model? I am not sure how I can do this by adding additional levels.

Hi

I would like to understand your hierarchy: Is it Country / Then Month or is it simply Country?
For the Country / Month structure to work, you need another level available. If not, your hierarchy would have just one level. Do you have manufacturer (say GM) and brands within manufacturer (Chevrolet, Buick, Cadillac, etc.)?

You can take a look at this nice code example by Julian Faraway on using Stan for multilevel models.

You can also examine the additional example on Prof. Anderson’s blog on multilevel models with the brms package.

Sree

Thanks for your reply!

My hierarchy is indeed Country then Month. Could you maybe also explain how I can add such a level to my existing code? My dependent variable will then change from y_{it} (number of cars of model i at time t) to y_{ijt} (number of cars of model i in country j at time t), and I don’t see how that will work if I add a level with country-specific variables.
In addition, I also have the manufacturer (GM) and the brands within manufacturer (Chevrolet etc.). So an additional step will be to incorporate this into the model, but that will be easier once I know how to do it with countries first.
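
The change from y_{it} to y_{ijt} can be written out as a rough sketch. Here \gamma_j, \sigma_\gamma and z_j are hypothetical names for a country-level intercept that does not appear in the code above, and x_{ijt} is assumed to be the single covariate, now recorded per country. \alpha_i and \zeta_i are the model-level intercept and slope from the transformed parameters block, and \phi is the same overdispersion parameter:

```latex
\begin{aligned}
y_{ijt} &\sim \mathrm{NegBinomial2}\bigl(\exp(\eta_{ijt}),\ \phi\bigr) \\
\eta_{ijt} &= \alpha_i + \gamma_j + \zeta_i \, x_{ijt} \\
\gamma_j &= \sigma_\gamma \, z_j, \qquad z_j \sim \mathrm{Normal}(0, 1)
\end{aligned}
```

Because \alpha_i already contains the overall intercept mu_groups[1], the country effects \gamma_j are centered at zero in this sketch. Country-specific variables, if available, would enter \gamma_j in the same way that multilevel_data enters alpha.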

Thanks for the help!

Hi @c5v

Please follow the link to examine the setup of a 3-level multilevel model with RStan. The unit of observation is the student, the next level is the class, and the highest level is the school ID. Since I’m at work, I can only take a look at your specific question much later. Hopefully this link will help you in the meantime.
3 Level Model with RStan

Another suggestion is to use the excellent tutorial by Paul Bliese on the R package multilevel with nlme to understand your data better. The link is a PDF file with excellent coverage of multilevel modeling, the traditional way.

With the traditional parameters estimated, you would be in a better position to estimate your models using Stan. Hope these links are useful for you.

Sree

@c5v

I forgot to include this link from the Stan forums on a 3-level multilevel model: Example of a 3 level multilevel model specification in Stan

Sree


Thanks for the help! Really appreciated!

@sree_d

If I understand the example correctly, I should add another level in my code on top of the one I already have (in the alpha and zeta parameters, where I currently have one level with the variables in multilevel_data)? And if I want this level to denote the country level, I should add country-specific variables?

Sorry for all my questions but I am completely new to Stan.

@c5v

Yes, that is correct. I would ask you to proceed slowly if you are using Stan for the first time. Have you already examined the data in R with the multilevel package and nlme? If not, I would definitely recommend it.
Take a small sample of your data where you can define 2 levels. This would be Country at the top and Manufacturers below (counts aggregated at the Manufacturer level by summing across months). Build the model with Stan (this forum has examples for two-level models). Then, in a similar dataset, add brand as a third level and work it out in Stan. In effect, I’m saying take the time dimension out.
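
As a starting point for that first step, here is a minimal sketch of a two-level model with the time dimension removed. It assumes one row per country-manufacturer pair with counts already summed over months. The names y, country, z_country and sigma_country are hypothetical, the priors are placeholders rather than recommendations, and the declarations use the array[...] syntax of recent Stan versions instead of the older style in the original post.

```stan
data {
  int<lower=1> N;                          // rows: one per country-manufacturer pair
  int<lower=1> J;                          // number of countries
  array[N] int<lower=1, upper=J> country;  // country index of each row
  array[N] int<lower=0> y;                 // counts summed over months
}
parameters {
  real mu;                                 // overall mean on the log scale
  real<lower=0> sigma_country;             // between-country standard deviation
  vector[J] z_country;                     // standardized country effects
  real<lower=0> inv_phi;                   // 1/phi, as in the original model
}
transformed parameters {
  real phi = inv(inv_phi);
  // non-centered parameterization of the country intercepts
  vector[J] alpha = mu + sigma_country * z_country;
}
model {
  mu ~ normal(0, 5);                       // placeholder prior (assumption)
  sigma_country ~ normal(0, 1);
  z_country ~ normal(0, 1);
  inv_phi ~ normal(0, 1);
  y ~ neg_binomial_2_log(alpha[country], phi);
}
```

Adding brand as a third level would follow the same pattern: a second index array in the data block and a second set of standardized effects with its own scale parameter.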

After you have figured out the structure and code without the time dimension, then bring it back. Another alternative is to work with all levels at the same time with a subset of the months, say 6 months. Select 2 Countries (US and Canada would work well), 2 common manufacturers, 2 common Brands and 6 months of data for each brand for a total of 48 records. Work out the structure and code.

Then setting up the full data will be child’s play. If you are not in a hurry, I’m trying to set up an example panel dataset that has 4 levels like you do and set up the RStan code. It will take me at least a week or so to set it up. If that will help you let me know and I will help you.

Sree

@sree_d

Thanks for the help, I will do it the way you said.

I will be doing the multilevel analysis over the coming month, so it would definitely help me if you could post an example. Keep me updated!

@c5v

You are welcome. All the best with your efforts. As soon as I have a working script, I will post it here and ping you.

Sree
