# Unsure what the prior is doing!

I am trying to implement a stan model presented in “Statistical Rethinking” in Section 14.2, pg 438, R code 14.13. I’ve pasted the example code below:

``````model_code <- '
data{
int N;
int nc_num_missing;
vector[N] kcal;
real neocortex[N];
vector[N] logmass;
int nc_missing[N];
}
parameters{
real alpha;
real<lower=0> sigma;
real bN;
real bM;
vector[nc_num_missing] nc_impute;
real mu_nc;
real<lower=0> sigma_nc;
}
model{
vector[N] mu;
vector[N] nc_merged;
alpha ~ normal(0,10);
bN ~ normal(0,10);
bM ~ normal(0,10);
mu_nc ~ normal(0.5,1);
sigma ~ cauchy(0,1);
sigma_nc ~ cauchy(0,1);
// merge missing and observed
for ( i in 1:N ) {
nc_merged[i] <- neocortex[i];
if ( nc_missing[i] > 0 ) nc_merged[i] <- nc_impute[nc_missing[i]];
}
// imputation
nc_merged ~ normal( mu_nc , sigma_nc );
// regression
mu <- alpha + bN*nc_merged + bM*logmass;
kcal ~ normal( mu , sigma );
}'
``````

My question specifically pertains to the instantiation of `nc_merged` and how the prior is assigned. My understanding is that `mc_merged` is an object which stores data points from `neocortex` if they exist, and then assigns an entry from `nc_inpute` to those entries. Thus, my guess is that the next line assigns a prior over the terms in `nc_merged` which are empty, and does nothing to the entries whose fields are present.

If I were correct, I would assume the following model definition would give identical results:

``````model{
vector[N] mu;
vector[N] nc_merged;
alpha ~ normal(0,10);
bN ~ normal(0,10);
bM ~ normal(0,10);
mu_nc ~ normal(0.5,1);
sigma ~ cauchy(0,1);
sigma_nc ~ cauchy(0,1);

// imputation
nc_impute ~ normal( mu_nc , sigma_nc );

// merge missing and observed
for ( i in 1:N ) {
nc_merged[i] <- neocortex[i];
if ( nc_missing[i] > 0 ) nc_merged[i] <- nc_impute[nc_missing[i]];
}

// regression
mu <- alpha + bN * nc_merged + bM * logmass;
kcal ~ normal( mu , sigma );
}'
``````

where I assign a prior to `nc_impute` before constructing the `nc_merged` vector. However, the two models give me different inference results, and so I presume I am misunderstanding what the example in Statistical Rethinking is doing.

Any help clarifying this discrepancy would be much appreciated!

The “entries whose fields are present”, eg. observed data together with their priors estimate mu_nc, and sigma_nc. In the second model, these parameters are determined exclusively by the priors.

Ah, so would it be correct to say that the posterior values of `mu_nc` and `sigma_nc` which are calculated on the observed values of `neocortex` are used to impute the missing values in `nc_inpute`?

So in a pseudocode workflow it’d be something like:

``````neocortex ~ normal( mu_nc, sigma_nc);
nc_inpute ~ normal( mu_nc, sigma_nc);
``````

where the first line specifies the likelihood between neocortex and the params `mu_nc, sigma_nc`, and the second line conducts the imputation?

Since nc_impute is a parameter, `nc_inpute ~ normal( mu_nc, sigma_nc);`
creates a sample using the specified piors. And therefore it can be used for
imputation.

Take a look at the User manual:
https://mc-stan.org/docs/stan-users-guide/missing-data.html#missing-data

2 Likes