Hi all!

I have a question on how to calculate the finite population standard deviation. In the chapter 21 and 22 of the book“Data analysis using regression and multilevel/hierarchical models”.the finite population SD was calculated simply by

sd(α_j), where α_j is the estimated value for each level of a factor. Therefore, if a factor just have two level, the finite population SD is sd(α_1,α_2). But if I want to evaluated the contribution of this factor to the variation of the response variable Y ( as shown in Figure 22.4 and 22.5), should we put α_j back to the original data set to calculate the finite population SD?

For example if the data set have 10 rows, and a factor X only have two levels, the first level have 6 rows, and the second level have 4 row, then should we calculate the finite population SD as sd(α_1,α_1,α_1,α_1,α_1,α_1,α_2,α_2, α_2,α_2), or just by sd(α_1,α_2)? I think sd(α_1,α_2) might be inappropriate, because the sd_error was calculated by 10 value :

ey[i]=(y[i]-y_ha[i]), and sd_error=sd(ey[i]),

Therefore, put α_j back to the original dataset might by more appropriate for calculating the finite population SD and the contribution of this factor to variation of the response variable. Do you think so?

I raise this question because I see on a web site Tutorial 8.2b - Comparing two populations (Bayesian), the finite population SD was calculated as sd(α_1,α_1,α_1,α_1,α_1,α_1,α_2, α_2, α_2, α_2), it is apparent that the two methods will get different results.

In the attachment, I uploaded an example of the two methods with stan code and data, using the example of Figure 22.4 in your book.

I feel quite confuse about this question and any one can help me?

see the difference of the two methods in the following example model

```
data {
int<lower=0> J;
int<lower=0> N;
int<lower=1,upper=J> county[N];
vector[N] y;
}
parameters {
vector[J] a;
real mu_a;
real<lower=0,upper=100> sigma_a;
real<lower=0,upper=100> sigma_y;
}
transformed parameters {
vector[N] y_hat;
for (i in 1:N){
y_hat[i] = a[county[i]];
}
}
model {
mu_a ~ normal(0, 1);
a ~ normal(10 * mu_a, sigma_a);
y ~ normal(y_hat, sigma_y);
}
generated quantities {
real<lower=0> s_a;
real<lower=0> s_a_2;
real<lower=0> s_y;
vector[N] e_y;
vector[N] est_county;
for (i in 1:N){
est_county[i] = a[county[i]];
}
e_y = y - y_hat;
s_a = sd(a);// the sd of 85 values
s_a_2=sd(est_county);// the sd of 919 vlaues
s_y = sd(e_y);// the sd of 919 vlaues
```