I’m trying to add coordinates and dimensions to an arviz inference.data object after sampling with cmdstanpy.
I need to do this in order to be able to subset the samples easily and feed them to arviz posterior plots and predictive checks.
I can successfully add one single dimension per group but no more than that. Here is some code to reproduce the problem. I’m using the eight school example: imagine that each school belongs to district and implements a certain curriculum, both indexed as either 1 or 2.
I want to be able to slice the inference.data object according to either district or curriculum.
Starting with code to run a model
from cmdstanpy import CmdStanModel
import numpy as np
import arviz as az
import pandas as pd
# Model specification and compilation
model_code = """
data {
int<lower=0> N;
vector[N] y;
}
parameters {
real<lower=0> theta;
}
model {
theta ~ normal(0, 1);
y ~ normal(theta, 1);
}
generated quantities{
vector[N] log_lik;
vector[N] y_hat;
for (j in 1:N) {
log_lik[j] = normal_lpdf(y[j] | theta, 1);
y_hat[j] = normal_rng(theta, 1);
}
}
"""
with open("./eight_school_check.stan", "w") as f:
print(model_code, file=f)
stan_file = "./eight_school_check.stan"
stan_model = CmdStanModel(
stan_file=stan_file,
cpp_options={'STAN_THREADS': 'TRUE'}
)
stan_model.compile()
# Data
eight_school_data = {
"N": 8,
"y": [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
}
# Model fit
stan_fit = stan_model.sample(data=eight_school_data)
Then, if I try to convert the fitted results and add coordinates and dimensions
additional_indexes = {
"ind_district" : np.array([0, 0, 0, 1, 1, 1, 0, 1]),
"ind_curriculum" : np.array([0, 1, 0, 1, 0, 1, 0, 1]),
}
inference_data = az.from_cmdstanpy(
posterior=stan_fit,
posterior_predictive='y_hat',
observed_data={'y' : eight_school_data['y']},
log_likelihood='log_lik',
coords={
'district': additional_indexes['ind_district'],
'curriculum': additional_indexes['ind_curriculum'],
},
dims={
'y' : ['district', 'curriculum'],
'y_hat' : ['district', 'curriculum'],
'log_lik' : ['district', 'curriculum'],
}
)
which gives me the following error
different number of dimensions on data and dims: 3 vs 4
However, I am able to add one of the two dimensions using:
inference_data = az.from_cmdstanpy(
posterior=stan_fit,
posterior_predictive='y_hat',
observed_data={'y' : eight_school_data['y']},
log_likelihood='log_lik',
coords={
'district': additional_indexes['ind_district'],
'curriculum': additional_indexes['ind_curriculum'],
},
dims={
'y' : ['district'],
'y_hat' : ['curriculum'],
'log_lik' : ['district'],
}
)
this runs with no error, and I can successfully run commands like
inference_data.sel(distict=[1]) # See coords under posterior_predictive
inference_data.sel(curriculum=[1]) # See coords under log_likelihood
again, ideally I want to be able to run a command such as
inference_data.sel(curriculum=[1], district=[2])
I hope I was able to explain myself clearly enough. Thanks a lot in advance!