I’d like to infer the mean and standard deviation for four separate groups; these groups are partitioned by demand and experiment cell (e.g., test-high, test-low, ctrl-high, ctrl-low).
I believe the model block should contain a single likelihood statement rather than four separate ones. To make this change, I would need to consolidate every vector in the data section into a single vector, then use indices to keep track of which data relate to which parameters. (This is the part that I need help with.)
How should I accomplish this?
normal_model = """
real test_ratio = mu_test_high / mu_test_low;
real ctrl_ratio = mu_ctrl_high / mu_ctrl_low;
real delta_ratio = (mu_test_high / mu_test_low) - (mu_ctrl_high / mu_ctrl_low);
mu_test_high ~ normal(0, sqrt(1000));
mu_test_low ~ normal(0, sqrt(1000));
mu_ctrl_high ~ normal(0, sqrt(1000));
mu_ctrl_low ~ normal(0, sqrt(1000));
X_test_high ~ normal(mu_test_high, sigma_test_high);
X_test_low ~ normal(mu_test_low, sigma_test_low);
X_ctrl_high ~ normal(mu_ctrl_high, sigma_ctrl_high);
X_ctrl_low ~ normal(mu_ctrl_low, sigma_ctrl_low);
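So the general idea is that you want to go from four separate data vectors (X_test_high, X_test_low, X_ctrl_high, X_ctrl_low) to a single stacked vector y, plus an integer array group of the same length that records which group (1 to N_groups) each observation belongs to.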
With this data structure, your Stan model then becomes:
array[N] int group;
mu ~ normal(0, sqrt(1000));
sigma ~ inv_gamma(3,2);
y ~ normal(mu[group], sigma[group]);
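For anyone preparing the inputs from Python (PyStan comes up later in the thread), here is a minimal sketch of the stacking step. The observation values and the `cells` dictionary are made up for illustration. The keys `N`, `N_groups`, `y`, and `group` are assumed to match the data block of the stacked model.

```python
# Hypothetical observations for the four cells; replace with the real data.
cells = {
    "test_high": [12.1, 9.8, 11.4],
    "test_low": [4.2, 5.1],
    "ctrl_high": [10.3, 10.9, 9.5, 11.0],
    "ctrl_low": [4.8, 5.5, 4.1],
}

# Fix the group order once so that index k always refers to the same cell.
group_names = list(cells)

y = []      # all observations, stacked into one vector
group = []  # 1-based group index for each observation (Stan indexes from 1)
for k, name in enumerate(group_names, start=1):
    y.extend(cells[name])
    group.extend([k] * len(cells[name]))

# Dictionary in the shape the stacked model's data block is assumed to expect.
stan_data = {
    "N": len(y),
    "N_groups": len(group_names),
    "y": y,
    "group": group,
}
```

Keeping `group_names` around makes it easy to map posterior draws of `mu[k]` and `sigma[k]` back to the cell they belong to.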
Hi, thank you for the response! If I understand the below correctly:
1. y and group are vectors/arrays of the same length and capture the value and group assignment of each given y_i, respectively.
2. The statement mu ~ normal(0, sqrt(1000)) applies the same sampling statement to each element in the vector mu but does not use pooling.
Am I correct in 1 & 2 above?
Yep, correct on both counts
I encountered the error below, but the swap was simple:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
error in '' at line 6, column 2
4: int<lower=1> N_groups;
5: vector<lower=0>[N] y;
6: array[N] int group;
PARSER EXPECTED: <one of the following:
a variable declaration, beginning with type,
(int, real, vector, row_vector, matrix, unit_vector,
simplex, ordered, positive_ordered,
or '}' to close variable declarations>
Ah, I’m guessing you’re using rstan then? The new array syntax isn’t yet available in the version of Stan in rstan. With the older declaration syntax, that line is written as int group[N]; instead.
These two versions are equivalent, right?
I often end up in the first situation with multiple sampling statements involving the data: e.g., using multiple data sets to inform the likelihood. I’ve played around with “stacking” the data like this but, in my case, it often seems like more work and also somewhat less clear to read.
I don’t mean that this model is less clear either way. I just want to double check my understanding for my own purposes.
Yep, the two versions represent identical models. The second (“stacked”) version will generally be more efficient, since a single vectorised likelihood can be applied to all observations.
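As a quick numerical check of that equivalence, the sketch below evaluates the normal log-likelihood both ways in plain Python: once per group, and once over the stacked vector with a group index. The data and parameter values are hypothetical, and `normal_lpdf` is a hand-written helper, not a Stan or PyStan call.

```python
import math

def normal_lpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

# Hypothetical data for four groups and one hypothetical parameter draw.
x_by_group = [
    [12.1, 9.8, 11.4],
    [4.2, 5.1],
    [10.3, 10.9, 9.5, 11.0],
    [4.8, 5.5, 4.1],
]
mu = [11.0, 4.5, 10.5, 4.9]
sigma = [1.2, 0.6, 0.8, 0.7]

# Version 1: one sampling statement per group.
separate = sum(
    normal_lpdf(x, mu[k], sigma[k])
    for k, xs in enumerate(x_by_group)
    for x in xs
)

# Version 2: stacked data with a 1-based group index, as in the Stan model.
y = [x for xs in x_by_group for x in xs]
group = [k for k, xs in enumerate(x_by_group, start=1) for _ in xs]
stacked = sum(
    normal_lpdf(yi, mu[g - 1], sigma[g - 1])  # g - 1: Python lists are 0-based
    for yi, g in zip(y, group)
)

# Both versions add up the same terms, so the totals agree.
assert math.isclose(separate, stacked)
```

This only checks that the two formulations give the same log density. Any speed-up from vectorisation comes from how Stan evaluates the single statement, which the sketch does not measure.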
Thanks! That makes sense. I think my cases often involve different densities (if that’s the word) for different data sets so I’d need to break up that loop anyway. But I can see how if it was uniform enough among the groups, stacking would be an efficiency win.
PyStan, but it probably functions much the same as RStan :)