Adding levels to a hierarchical Dirichlet-Multinomial model

Hello I’m building a hierarchical Dirichlet-Multinomial model to predict budget allocation splits across K alternatives.

The allocation split is a function of the total budget level, and this varies by several categories in a hierarchical structure (for example country, sector).

Currently the model runs ok with the betas and concentration parameter (kappa) varying by country (M parameter below), but I want to add more layers to the hierarchy (e.g. by country, by sector).

My questions are:

  • How can I optimise the current model model? Is this the best way to structure it for efficient sampling?

  • How can I extend the model hierarchy to include variation at additional multiple levels (e.g. by country and by sector)? It feels like a similar problem to the ‘seemingly unrelated regression’ in section 5.15 (Multivariate Outcomes) in the Stan manual. I’ve made a start but not sure how to handle the parameter correlations

All help and ideas much appreciated!

// ======================================================================
// ======================================================================

functions {

// for likelihood estimation
  real dirichlet_multinomial_lpmf(int[] y, vector alpha) {
    real alpha_plus = sum(alpha);
    return lgamma(alpha_plus) + sum(lgamma(alpha + to_vector(y)))
                - lgamma(alpha_plus+sum(y)) - sum(lgamma(alpha));


data {
  int<lower = 1> K;                 // num alternatives
  int<lower = 1> N;                 // num observations
  int<lower = 1> M;                 // num countries
  int<lower = 0> y[N, K];           // allocation on each choice
  int<lower = 1> market[N];         // markets
  int<lower = 0> budget[N];         // total spend
  vector[N] log_budget;             // log spend
  int<lower = 1> N_new;             // new num observations
  int<lower = 1> budget_new[N_new]; // new budget for simulation 
  vector[N_new] log_budget_new;     // log new budget

parameters {
  matrix<lower = 0>[M,K] beta_0;
  matrix<lower = 0>[M,K] beta_1;
  vector<lower = 1>[M] kappa;

  vector<lower = 0>[K] mu_beta_0;
  vector<lower = 0>[K] mu_beta_1;
  vector<lower = 0>[M] mu_kappa;

  vector<lower = 0>[K] sigma_beta_0;
  vector<lower = 0>[K] sigma_beta_1;
  vector<lower = 0>[M] sigma_kappa;


transformed parameters {
  simplex[K] theta[N]; 
  vector[K] x_beta[N];

  // create XB index
  for (n in 1:N) {
      for(k in 1:K){
        x_beta[n,k] = beta_0[market[n],k] + beta_1[market[n],k]*log_budget[n];
      theta[n] = softmax(x_beta[n]);

model {

// priors
mu_beta_0 ~ lognormal(0,0.1);  
mu_beta_1 ~ lognormal(0,0.1);
mu_kappa ~ lognormal(0,1);
sigma_beta_0 ~ lognormal(0,0.1);
sigma_beta_1 ~ lognormal(0,0.1);
sigma_kappa ~ lognormal(0,0.1);

for (m in 1:M){
  to_vector(beta_0[m]) ~ gamma(mu_beta_0,sigma_beta_0);
  to_vector(beta_1[m]) ~ gamma(mu_beta_1,sigma_beta_1);
  to_vector(kappa) ~ lognormal(mu_kappa,sigma_kappa);

for (n in 1:N) {
      y[n] ~ dirichlet_multinomial(kappa[market[n]]*theta[n]);

generated quantities {
int<lower=0> channel_sim[M,N_new,K]; 
vector[K] theta_new[M,N_new];

  vector[K] alpha_new[N_new];
  int channel_sim_tmp[N_new,K];
  for (m in 1:M){
    for (n in 1:N_new){
      for(k in 1:K){
        alpha_new[n,k]   = beta_0[m,k] + beta_1[m,k]*log_budget_new[n];
      theta_new[m,n]     = kappa[m]*softmax(to_vector(alpha_new[n]));
      channel_sim_tmp[n] = dirichlet_multinomial_rng(theta_new[m,n],budget_new[n]);
channel_sim[m] = channel_sim_tmp;

1 Like

some (incomplete) suggestions, hope they will lead you in the right direction!

At first glance, there does not seem to be much to optimise. How fast is the model and big is the dataset? Some minor suggestions:

for (m in 1:M){
  to_vector(kappa) ~ lognormal(mu_kappa,sigma_kappa);

This is probably a mistake as you are adding the same kappa term M times - you probably should just put that statement out of the loop.

  to_vector(beta_0) ~ gamma(mu_beta_0,sigma_beta_0);
  to_vector(beta_1) ~ gamma(mu_beta_1,sigma_beta_1);
  to_vector(kappa) ~ lognormal(mu_kappa,sigma_kappa);

Also in transformed parameters

 for(k in 1:K){
        x_beta[n,k] = beta_0[market[n],k] + beta_1[market[n],k]*log_budget[n];

could be replaced by

x_beta[n] = to_vector(beta_0[market[n],] + beta_1[market[n],]*log_budget[n]);

for a minor efficiency gain thanks to vectorization.

I am not sure what are you aiming for in here, but it does not look like multivariate outcomes to me. I can see two alternatives

  1. you have per-sector-and-country data on budget totals - then it would make sense to simply predict splits per-sector-and-country with country as one of the predictors (possibly with a hierarchical prior) and the split by country is implied by summing all of the sectors
  2. you want to model per-sector-and-country data as parameters depending on the per-country split. In this way you would just use the predicted splits per-country (including the observation model you have) and add a new split per-sector which just multiplies the per-country split by per-sector split. The per-country-and-sector prediction that would have its own observation model.

Hope that helps.

Thank you Martin - really appreciate your comments. Will digest them see where I get to!

Thanks again for your response to this @martinmodrak .

On the last point - I think I may have worded my question poorly (or am not fully understanding your answer). All I really want to know is how should I structure my code to add another level onto the hierarchy.

To put it in the context of a more simple example, if we were modelling the impact of a continuous variable X on another continuous variable Y (e.g. sales), with panel data across subjects eg. states, we could could impose a hierarchical structure by putting a distribution on the betas for X.

What I want to know is, if we also had data on another categorical variable eg. product type, and also want our betas to vary at this level, what is the right way to incorporate that extra non-nested level? Do we put a further distribution on the means of the state distributions? Are there better ways?

Any further help much appreciated!