That R list discussion got off to a vague start with this question:
Anyone knows how to generate a vector of Normal distributed values (for example N(0,0.5)), but with a sum-to-zero constraint??
Did they want the marginals to be normal(0, 0.5)? Someone should have clarified. Under a sum-to-zero constraint the components can’t be independent, so you wind up with correlation by definition.
When you write the following Stan program:
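parameters {
  vector[3] mu_prefix;  // the three free components
}
transformed parameters {
  // the fourth component is fixed by the sum-to-zero constraint
  vector[4] mu = append_row(mu_prefix, -sum(mu_prefix));
}
model {
  mu ~ normal(0, 1);  // applies to all four components
}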
you make the density on the constrained (sum-to-zero) space proportional to this product of four normal densities, but it’s really a density over only the first three variables, because the fourth is determined by them. The resulting marginals aren’t standard normal; they’re a bit narrower, with standard deviations below 1:
        mean  se_mean    sd
mu[1]   0.03     0.02  0.86
mu[2]   0.01     0.02  0.89
mu[3]  -0.03     0.02  0.87
mu[4]   0.00     0.01  0.86
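As a rough cross-check outside Stan, here is a minimal Python sketch of a rejection sampler for the same target. It proposes three independent standard normals (the first three factors) and accepts with probability equal to the unnormalized standard normal density of the implied fourth component. The function names are made up for illustration, and the draw count and seed are arbitrary choices.

```python
import math
import random
import statistics


def sample_sum_to_zero(n_draws, seed=0):
    """Rejection-sample the density that mu ~ normal(0, 1) puts on mu_prefix."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n_draws:
        # Proposal: three independent standard normals (the first three factors).
        prefix = [rng.gauss(0.0, 1.0) for _ in range(3)]
        last = -sum(prefix)
        # Accept with probability exp(-last^2 / 2), the unnormalized
        # standard normal density of the fourth component.
        if rng.random() < math.exp(-0.5 * last * last):
            draws.append(prefix + [last])
    return draws


def marginal_sds(draws):
    """Sample standard deviation of each component across the draws."""
    return [statistics.stdev(column) for column in zip(*draws)]


if __name__ == "__main__":
    print(marginal_sds(sample_sum_to_zero(20_000)))
```

If the sketch is right, all four standard deviations should come out below 1 and close to one another, in line with the sd column above.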
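If you just put the distribution on the first three components, then they’re marginally standard normal and the fourth, being minus the sum of three independent standard normals, has a much wider distribution (its standard deviation should be sqrt(3), about 1.73):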
        mean  se_mean    sd
mu[1]  -0.02     0.02  1.00
mu[2]   0.00     0.02  0.98
mu[3]   0.00     0.02  1.01
mu[4]   0.02     0.03  1.75
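This second case is easy to simulate directly, since the first three components are just independent standard normals. The sketch below is plain Python rather than Stan; the function name, draw count, and seed are illustrative assumptions.

```python
import random
import statistics


def sample_prefix_only(n_draws, seed=0):
    """Three independent standard normals plus a fourth set to minus their sum."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        prefix = [rng.gauss(0.0, 1.0) for _ in range(3)]
        draws.append(prefix + [-sum(prefix)])
    return draws


if __name__ == "__main__":
    draws = sample_prefix_only(20_000)
    # One sample standard deviation per component, mu[1] through mu[4].
    print([statistics.stdev(column) for column in zip(*draws)])
```

The fourth component is a sum of three independent unit-variance terms, so its variance is 3, which is where the roughly 1.75 in the table comes from.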
To get everything to be marginally standard normal, you’d have to work out what the scale needs to be in the sampling statement to get the marginals to work out. Real statisticians know how to solve these things, but I’d get stuck after laying out the integrals.
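For this particular all-Gaussian case, a bit of linear algebra seems to sidestep the integrals. The sketch below assumes K components and a scale sigma in the sampling statement (K, sigma, and the all-ones vector 1 are notation introduced here, not taken from the original discussion), with mu on the right-hand side of the first line denoting the vector of the K - 1 free components.

```latex
\begin{align*}
p(\mu_1, \dots, \mu_{K-1})
  &\propto \exp\!\left(-\frac{1}{2\sigma^2}
     \left[\sum_{k=1}^{K-1} \mu_k^2
           + \Big(\sum_{k=1}^{K-1} \mu_k\Big)^{2}\right]\right)
   = \exp\!\left(-\tfrac{1}{2}\,
       \mu^\top \frac{I + \mathbf{1}\mathbf{1}^\top}{\sigma^2}\, \mu\right), \\
\Sigma
  &= \sigma^2 \left(I + \mathbf{1}\mathbf{1}^\top\right)^{-1}
   = \sigma^2 \left(I - \tfrac{1}{K}\,\mathbf{1}\mathbf{1}^\top\right)
   \quad \text{(Sherman--Morrison)}, \\
\operatorname{var}(\mu_k)
  &= \sigma^2\,\frac{K-1}{K}
  \quad \text{for } k = 1, \dots, K
  \;\;\Longrightarrow\;\;
  \sigma = \sqrt{\frac{K}{K-1}} \text{ gives unit marginal variance.}
\end{align*}
```

With K = 4 and sigma = 1 this gives a marginal standard deviation of sqrt(3/4), about 0.87, which is consistent with the first table. If the algebra holds, a scale of sqrt(4/3), about 1.15, in the sampling statement should make all four marginals standard normal.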
I went through this kind of reasoning before with ordering, where declaring an ordered variable and giving it a normal distribution yields the same distribution as drawing independent normals and sorting them.
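The ordering claim can be spot-checked by simulation. The Python sketch below assumes that "giving an ordered variable a normal distribution" means a density proportional to the product of standard normal densities, restricted to the region where the components are increasing. It samples that density by rejection (keeping only proposals that are already in order) and compares component means against sorted independent draws. The function names and the choice of three components are illustrative.

```python
import random


def sorted_normals(n_draws, k=3, seed=0):
    """Draw k independent standard normals and sort each draw."""
    rng = random.Random(seed)
    return [sorted(rng.gauss(0.0, 1.0) for _ in range(k))
            for _ in range(n_draws)]


def ordered_normals(n_draws, k=3, seed=1):
    """Rejection-sample the product of normals restricted to x[0] < ... < x[k-1]."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n_draws:
        x = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Keep only proposals that already satisfy the ordering constraint.
        if all(a < b for a, b in zip(x, x[1:])):
            draws.append(x)
    return draws


def column_means(draws):
    """Mean of each component across the draws."""
    return [sum(column) / len(column) for column in zip(*draws)]


if __name__ == "__main__":
    print(column_means(sorted_normals(20_000)))
    print(column_means(ordered_normals(20_000)))
```

Matching component means is only a partial check, not a proof that the two distributions coincide, but the two printed rows should agree up to Monte Carlo error.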