Multi-indexing to summarize data across groups

I am struggling to calculate the mean of a parameter, epsilon, estimated for a group P, for each level of another group W in which P is nested. I suspect I need some kind of multi-indexing strategy but I haven’t figured it out so far. Here is an obviously failed attempt, but I hope it provides an idea of what I am trying to do.

data {
  int<lower=0> N;                    // number of observations
  int<lower=0> Nw;                   // number of levels in W
  int<lower=0> Np;                   // number of levels in P
  array[N] int<lower=1, upper=Nw> W; // vector of W idx
  array[N] int<lower=1, upper=Np> P; // vector of P idx (P nested in W)
  vector[N] y;                       // vector of observations
}

parameters {
  vector[Np] epsilon; // epsilon is a parameter for P
  // ...
}

model {
  epsilon ~ normal(0, sigma_epsilon);
  // ...
}

generated quantities {
  // Here I am trying to calculate epsilon_bar, the average of 
  // epsilon for each W (while epsilon was defined for Np)...
  vector[Nw] epsilon_bar;
  for(s in 1:Nw){
    epsilon_bar[s] = mean(epsilon[W[s]]);
  }
  // ...
}

I am actually trying to convert the following JAGS code, and I am not sure how to convert JAGS’ epsilon[P, W] syntax to Stan:

model{
    for(i in 1:N){
      y[i] ~ dnorm(mu[i], tau)
      mu[i] <- b0 + epsilon[P[i], W[i]] + gamma[W[i]]
    }
    for(i in 1:Nw){
      # Mean P-level random effects within each W:
      epsilon_bar[i] <- mean(epsilon[,i])
    }
    # ...
  }

The easiest way that comes to mind is to calculate (or pass) the number of P groups per W group. Then, rather than index all of the Ps simultaneously, you can just add to a running mean, dividing epsilon by the appropriate denominator.

data {
  int<lower=0> N;                    // number of observations
  int<lower=0> Nw;                   // number of levels in W
  int<lower=0> Np;                   // number of levels in P

  array[N] int<lower=1, upper=Nw> W; // vector of W idx
  array[N] int<lower=1, upper=Np> P; // vector of P idx (P nested in W)
  vector[N] y;                       // vector of observations

  array[Np] int<lower = 1, upper = Nw> Wp; // Which group W does group P fall in?
}
transformed data{
   vector[Nw] num_p = rep_vector(0, Nw); // Number of groups P in group W

  for(p in 1:Np){
     num_p[Wp[p]] += 1.0;
  }
}

parameters {
  vector[Np] epsilon; // epsilon is a parameter for P
  // ...
}

model {
  epsilon ~ normal(0, sigma_epsilon);
  // ...
}

generated quantities {
  // Here I am trying to calculate epsilon_bar, the average of 
  // epsilon for each W (while epsilon was defined for Np)...
  vector[Nw] epsilon_bar = rep_vector(0, Nw);
  for(p in 1:Np){
    epsilon_bar[ Wp[p] ] += epsilon[p] / num_p[p];
  }
  // ...
}

You could do the same thing with matrix multiplication, setting \bar{\epsilon} = M \epsilon where M is a N_w \times N_p matrix where the elements are the reciprocals of the number of elements in each group W.

In other words, there are the same number of groups P per group W? In that case, you can specify \epsilon as a matrix. Then average across the rows.

parameters{
   matrix[Np, Nw] epsilon;
}
transformed parameters{
   vector[N] mu;
   for(n in 1:N){
      mu[n] = epsilon[P[n], W[n]];
   }
}
generated quantities{
   vector[Nw] epsilon_bar;
   for(w in 1:Nw){
      epsilon_bar[w] = mean(epsilon[,w]);
   }
}
1 Like

Thank you @simonbrauer! Working on your first suggestion, your code was giving me an “out of index error” (index growing out of bond) for the loop on \overline{\epsilon} in the generated quantities block. I think epsilon_bar[ Wp[p] ] += epsilon[p] / num_p[p]; should be epsilon_bar[ Wp[p] ] += epsilon[p] / num_p[ Wp[p] ]; as num_p has Nw dimensions (not Np), am I correct?

1 Like