I’d like to infer the mean and standard deviation for four separate groups, partitioned by demand and experiment cell (e.g., test-high, test-low, ctrl-high, ctrl-low).
In the model block, I believe there should be a single likelihood statement rather than four separate ones. To accomplish this, I would need to consolidate the vectors in the data block into a single vector and use indices to keep track of which data relate to which parameters. (This is the part I need help with.)
How should I accomplish this?
normal_model = """
data {
int<lower=1> n_test_high;
int<lower=1> n_test_low;
int<lower=1> n_ctrl_high;
int<lower=1> n_ctrl_low;
vector<lower=0>[n_test_high] X_test_high;
vector<lower=0>[n_test_low] X_test_low;
vector<lower=0>[n_ctrl_high] X_ctrl_high;
vector<lower=0>[n_ctrl_low] X_ctrl_low;
}
parameters {
real mu_test_high;
real<lower=0> sigma_test_high;
real mu_test_low;
real<lower=0> sigma_test_low;
real mu_ctrl_high;
real<lower=0> sigma_ctrl_high;
real mu_ctrl_low;
real<lower=0> sigma_ctrl_low;
}
transformed parameters {
real test_ratio = mu_test_high / mu_test_low;
real ctrl_ratio = mu_ctrl_high / mu_ctrl_low;
real delta_ratio = (mu_test_high / mu_test_low) - (mu_ctrl_high / mu_ctrl_low);
}
model {
// Priors
mu_test_high ~ normal(0, sqrt(1000));
sigma_test_high ~ inv_gamma(3, 2);
mu_test_low ~ normal(0, sqrt(1000));
sigma_test_low ~ inv_gamma(3, 2);
mu_ctrl_high ~ normal(0, sqrt(1000));
sigma_ctrl_high ~ inv_gamma(3, 2);
mu_ctrl_low ~ normal(0, sqrt(1000));
sigma_ctrl_low ~ inv_gamma(3, 2);
// Likelihood
X_test_high ~ normal(mu_test_high, sigma_test_high);
X_test_low ~ normal(mu_test_low, sigma_test_low);
X_ctrl_high ~ normal(mu_ctrl_high, sigma_ctrl_high);
X_ctrl_low ~ normal(mu_ctrl_low, sigma_ctrl_low);
}
"""
So the general idea is that you want to go from:
| Group 1 | Group 2 | Group 3 | Group 4 |
|---------|---------|---------|---------|
| y_1     | y_3     | y_5     | y_7     |
| y_2     | y_4     | y_6     | y_8     |

To:

| y   | Group |
|-----|-------|
| y_1 | 1     |
| y_2 | 1     |
| y_3 | 2     |
| y_4 | 2     |
| y_5 | 3     |
| y_6 | 3     |
| y_7 | 4     |
| y_8 | 4     |
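In code, this reshaping amounts to concatenating the four vectors and building a parallel index array. A minimal NumPy sketch with made-up values, assuming two observations per group:

```python
import numpy as np

# Hypothetical example data for the four groups (values made up).
X_test_high = [1.2, 0.8]
X_test_low = [0.5, 0.6]
X_ctrl_high = [1.1, 0.9]
X_ctrl_low = [0.4, 0.7]

groups = [X_test_high, X_test_low, X_ctrl_high, X_ctrl_low]

# Stack all observations into one vector...
y = np.concatenate(groups)

# ...and record each observation's group as a 1-based index
# (Stan indexes from 1).
group = np.concatenate([np.full(len(g), i + 1) for i, g in enumerate(groups)])

print(y)      # [1.2 0.8 0.5 0.6 1.1 0.9 0.4 0.7]
print(group)  # [1 1 2 2 3 3 4 4]
```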
With this data structure, your Stan model then becomes:
data {
int<lower=1> N;
int<lower=1> N_groups;
vector<lower=0>[N] y;
array[N] int group;
}
parameters {
vector[N_groups] mu;
vector<lower=0>[N_groups] sigma;
}
model {
mu ~ normal(0, sqrt(1000));
sigma ~ inv_gamma(3, 2);
y ~ normal(mu[group], sigma[group]);
}
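To feed this model, the data just needs the stacked vector, the matching index array, and the two counts. A hypothetical Python sketch (values made up; PyStan accepts data in this dictionary form):

```python
import numpy as np

# Hypothetical stacked data: 8 observations across 4 groups.
y = np.array([1.2, 0.8, 0.5, 0.6, 1.1, 0.9, 0.4, 0.7])
group = np.array([1, 1, 2, 2, 3, 3, 4, 4])  # 1-based group indices

# Data dictionary matching the names in the Stan data block.
stan_data = {
    "N": len(y),
    "N_groups": int(group.max()),
    "y": y,
    "group": group,
}

print(stan_data["N"], stan_data["N_groups"])  # 8 4
```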
Hi, thank you for the response! If I understand the below correctly:

1. `y` and `group` are vectors/arrays of the same length and capture the corresponding value and group assignment of each given y_i, respectively.
2. The statement `mu ~ normal(0, sqrt(1000))` applies the same sampling statement to each element in the vector `mu` but does not use pooling.

Am I correct in 1 & 2 above?
Cheers!
Yep, correct on both counts
Encountered the error below, but the fix was a simple swap to `int group[N];`
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
error in '' at line 6, column 2
-------------------------------------------------
4: int<lower=1> N_groups;
5: vector<lower=0>[N] y;
6: array[N] int group;
^
7: }
-------------------------------------------------
PARSER EXPECTED: <one of the following:
a variable declaration, beginning with type,
(int, real, vector, row_vector, matrix, unit_vector,
simplex, ordered, positive_ordered,
corr_matrix, cov_matrix,
cholesky_corr, cholesky_cov
or '}' to close variable declarations>
Ah, I’m guessing you’re using rstan then? The new `array` syntax isn’t yet available in the version of Stan in rstan.
These two versions are equivalent, right?
I often end up in the first situation, with multiple sampling statements involving the data: e.g., using multiple data sets to inform the likelihood. I’ve played around with “stacking” the data like this but, in my case, it often seems like more work and also somewhat less clear to read.
I don’t mean that this model is less clear either way. I just want to double-check my understanding for my own purposes.
Yep, the two versions represent identical models. The second (“stacked”) version will generally be more efficient, since a single vectorised likelihood can be applied to all observations.
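The equivalence is easy to check numerically: summing four per-group normal log-densities gives the same total as one indexed, vectorised term. A sketch with made-up numbers, computing the normal log-density by hand in NumPy:

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    # Log density of Normal(mu, sigma), elementwise.
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

# Hypothetical data and parameter values for four groups.
ys = [np.array([1.2, 0.8]), np.array([0.5, 0.6]),
      np.array([1.1, 0.9]), np.array([0.4, 0.7])]
mu = np.array([1.0, 0.5, 1.0, 0.5])
sigma = np.array([0.3, 0.2, 0.3, 0.2])

# Version 1: four separate likelihood terms.
lp_separate = sum(normal_logpdf(ys[g], mu[g], sigma[g]).sum() for g in range(4))

# Version 2: one stacked, indexed likelihood term.
y = np.concatenate(ys)
group = np.repeat(np.arange(4), 2)  # 0-based here, unlike Stan
lp_stacked = normal_logpdf(y, mu[group], sigma[group]).sum()

print(np.isclose(lp_separate, lp_stacked))  # True
```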
Thanks! That makes sense. I think my cases often involve different densities (if that’s the word) for different data sets, so I’d need to break up that loop anyway. But I can see how, if it were uniform enough among the groups, stacking would be an efficiency win.
PyStan, but it probably functions much the same as Rstan :)