Calculate variance of parameter by Group

Hi all!

I am trying to fit a model where I calculate a parameter for each J unit and then use the variance of that parameter in each C group as a prior for another parameter. I am attaching a minimum version of the model that gets at the problem I am facing in calculating the variance in the parameter for each group. But I give the 1st parameter as data in the example as to not complicate the model. I have also included some R code to generate the data to run this example.

The problem I am having is in getting the units into their indicated groups to calculate the variance.

PS: This is my first post so I apologize if the formatting is incorrect in any way.

data{
  int<lower=0> J;                       // number of units
  int<lower=1> D;                       // number of dimensions
  int<lower=0> C;                       // Number of groups
  int<lower=1, upper=C> Q[J];    // unit group indicator

  matrix[J,D] position;                 //position for unit j in dimension d
}

parameters {
  vector[C] zeta;
}

transformed parameters {
  matrix[C,D] variance_position;

  for (c in 1:C){
    for (d in 1:D){
        variance_position[c,d] = variance(position[Q[c],d]); // calculate variace of position for each group
    }
  }


}

model{
  for (c in 1:C){
    target+= normal_lpdf(zeta|variance_position[c,1]+variance_position[c,2],1);
  }
}

R code to create the data
D <-2
J <- 30
C <- 5

c<- rep(1:C,3)

position <- matrix(data=rnorm(60,0,1),nrow=J,ncol=D)

data_d <- data.frame(Country = c, Group = c(1:J), position)

var1 <- data_d %>% group_by(Country) %>% summarise(var(X1))
var2 <- data_d %>% group_by(Country) %>% summarise(var(X2))

data_d %>% summarise(var(X2))

stan <- list(
J = J,
D = D,
Q = as.integer(data_d$Country),
C = C,
position = position)

1 Like

Hi, Leo,

welcome to the Stan discourse. Could you elaborate on what the structure of the model is, are there more than two levels in the hierarchy and are those reflected in the priors?

Also, are you getting any kind of error? It’s not clear to me how zeta has dimension C but you are looping over C and using the whole vector with a scalar that is the sum of the first 2 dimensions of variance_position(what would happen if D=1, and if D>2).

Could you give some clarification on these two points?

Hi Caetano,

To your first question there is only 2 levels in the hierarchy in the model. In the full model it is organizations within countries.

To your second point, in the full model, zeta should actually be a matrix of CxD, and the prior mean is the variance_position[c,d] . So if D = 1 or > 2, the model should just use that point as the prior mean. I did not include that in the example because I thought that it was not very relevant to the problem I am facing.

I do not get any errors running the code, however the variance_position is not calculating the variance correctly, I am assuming that there was something off with the way I am indexing in the transformed parameters block.

I hope that clarified things, let me know if that’s not the case.

Thank you

I’m still confused about whether variance_position are variance values, because they are arguments for the mean in the normal_lpdf function.

If zeta is a vector[C] (or rather I think it should be real zeta[C] array) you can have the mean and variance arguments both be arrays, be an array and scalar, or scalar and array, in this case you seem to have always the same vector zeta, changing means (variance_position[c,1]+variance_position[c,2]), and constant variance 1. Doesn’t sound like what you would want in any case.

I assume your data would have multiple observations for each zeta[c,d] entry (if it’s a two-dimensional array), but I’m not sure I can visualize the whole picture still.

The variance_position should be the variance of the position values for each c group in each d dimension. Think of zeta as a weight on each j unit d dimension that is the same for all units in the same group group, where the prior mean of that weight is given by the variance of that position in each group in each dimension.

Check out the documentation on the normal distribution, if zeta is a parameter it seems like it should be the second argument, and the variance the 3rd, after all the parameters are transformed. The first argument should be your data.

I realize this is an example minimal model, but it seems to me that this is off.