# Multi-indexing to summarize data across groups

I am struggling to calculate the mean of a parameter, epsilon, estimated for a group P, for each level of another group W in which P is nested. I suspect I need some kind of multi-indexing strategy but I haven’t figured it out so far. Here is an obviously failed attempt, but I hope it provides an idea of what I am trying to do.

data {
int<lower=0> N;                    // number of observations
int<lower=0> Nw;                   // number of levels in W
int<lower=0> Np;                   // number of levels in P
array[N] int<lower=1, upper=Nw> W; // vector of W idx
array[N] int<lower=1, upper=Np> P; // vector of P idx (P nested in W)
vector[N] y;                       // vector of observations
}

parameters {
vector[Np] epsilon; // epsilon is a parameter for P
// ...
}

model {
epsilon ~ normal(0, sigma_epsilon);
// ...
}

generated quantities {
// Here I am trying to calculate epsilon_bar, the average of
// epsilon for each W (while epsilon was defined for Np)...
vector[Nw] epsilon_bar;
for(s in 1:Nw){
epsilon_bar[s] = mean(epsilon[W[s]]);
}
// ...
}


I am actually trying to convert the following JAGS code, and I am not sure how to convert JAGS’ epsilon[P, W] syntax to Stan:

model{
for(i in 1:N){
y[i] ~ dnorm(mu[i], tau)
mu[i] <- b0 + epsilon[P[i], W[i]] + gamma[W[i]]
}
for(i in 1:Nw){
# Mean P-level random effects within each W:
epsilon_bar[i] <- mean(epsilon[,i])
}
# ...
}


The easiest way that comes to mind is to calculate (or pass) the number of P groups per W group. Then, rather than index all of the Ps simultaneously, you can just add to a running mean, dividing epsilon by the appropriate denominator.

data {
int<lower=0> N;                    // number of observations
int<lower=0> Nw;                   // number of levels in W
int<lower=0> Np;                   // number of levels in P

array[N] int<lower=1, upper=Nw> W; // vector of W idx
array[N] int<lower=1, upper=Np> P; // vector of P idx (P nested in W)
vector[N] y;                       // vector of observations

array[Np] int<lower = 1, upper = Nw> Wp; // Which group W does group P fall in?
}
transformed data{
vector[Nw] num_p = rep_vector(0, Nw); // Number of groups P in group W

for(p in 1:Np){
num_p[Wp[p]] += 1.0;
}
}

parameters {
vector[Np] epsilon; // epsilon is a parameter for P
// ...
}

model {
epsilon ~ normal(0, sigma_epsilon);
// ...
}

generated quantities {
// Here I am trying to calculate epsilon_bar, the average of
// epsilon for each W (while epsilon was defined for Np)...
vector[Nw] epsilon_bar = rep_vector(0, Nw);
for(p in 1:Np){
epsilon_bar[ Wp[p] ] += epsilon[p] / num_p[p];
}
// ...
}



You could do the same thing with matrix multiplication, setting \bar{\epsilon} = M \epsilon where M is a N_w \times N_p matrix where the elements are the reciprocals of the number of elements in each group W.

In other words, there are the same number of groups P per group W? In that case, you can specify \epsilon as a matrix. Then average across the rows.

parameters{
matrix[Np, Nw] epsilon;
}
transformed parameters{
vector[N] mu;
for(n in 1:N){
mu[n] = epsilon[P[n], W[n]];
}
}
generated quantities{
vector[Nw] epsilon_bar;
for(w in 1:Nw){
epsilon_bar[w] = mean(epsilon[,w]);
}
}
1 Like

Thank you @simonbrauer! Working on your first suggestion, your code was giving me an “out of index error” (index growing out of bond) for the loop on \overline{\epsilon} in the generated quantities block. I think  epsilon_bar[ Wp[p] ] += epsilon[p] / num_p[p]; should be  epsilon_bar[ Wp[p] ] += epsilon[p] / num_p[ Wp[p] ]; as num_p has Nw dimensions (not Np), am I correct?

1 Like