Posterior predictive simulation by subgroup

stan_beginer · November 16, 2020, 4:10pm

Hi,

I have been running posterior predictive simulations for certain subgroups of the original data, but my code is inefficient. Suppose we have a hierarchical logistic regression model for 5 districts, and we would like to do posterior predictive simulation for each district separately. Since I don’t know whether Stan has a data structure like R's list (I am an R user), my current strategy is to split the original data into five pieces with their corresponding predictors. For example, I have n data points in total and the predictor X has length n, so I split the data into n1, x1, …, n5, x5, pass them as separate inputs in the ‘data’ block, define five y_pred vectors of length n1, n2, n3, n4 and n5, and run the posterior predictive simulation separately for each of the five y_pred.

However, this approach becomes inefficient and messy with more subgroups. Does Stan have a more efficient way to accomplish this?

Thx!

bbbales2 · November 18, 2020, 12:42pm

The way to do this in Stan is to lump the data from all five groups into one vector and then include an array of integers indicating which group each element of the vector belongs to.

So you might originally have:

vector[2] v1 = [ 1.0, 2.0 ]';
vector[2] v2 = [ 3.0, 4.0 ]';

And you can recode that like:

vector[4] v12 = [ 1.0, 2.0, 3.0, 4.0 ]';
int group[4] = { 1, 1, 2, 2 };

So like the first two elements of v12 belong to group 1 and the next two to group 2.
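Applied to the district question, the group-index idea might look like the sketch below. It is only an illustration under assumptions: the names `district`, `alpha`, `beta`, `n_obs` and `pred_rate`, the single predictor, the varying-intercept structure and the priors are all made up here, not taken from the original model. It also assumes every district has at least one observation, and it uses the `array[N] int` declaration syntax of newer Stan releases (older releases write `int district[N]`).

```stan
data {
  int<lower=1> N;                           // total number of observations
  int<lower=1> J;                           // number of districts (5 in the question)
  array[N] int<lower=1, upper=J> district;  // district index of each observation
  vector[N] x;                              // one predictor (assumption)
  array[N] int<lower=0, upper=1> y;         // binary outcome
}
transformed data {
  // Count the observations in each district once, up front.
  vector[J] n_obs = rep_vector(0, J);
  for (n in 1:N) {
    n_obs[district[n]] += 1;
  }
}
parameters {
  real mu_alpha;
  real<lower=0> sigma_alpha;
  vector[J] alpha;                          // district-level intercepts
  real beta;
}
model {
  // Placeholder priors, chosen only to make the sketch complete.
  mu_alpha ~ normal(0, 2);
  sigma_alpha ~ normal(0, 1);
  beta ~ normal(0, 2);
  alpha ~ normal(mu_alpha, sigma_alpha);
  // alpha[district] picks the right intercept for every observation.
  y ~ bernoulli_logit(alpha[district] + beta * x);
}
generated quantities {
  // One y_pred for all districts instead of five separate ones.
  array[N] int y_pred = bernoulli_logit_rng(alpha[district] + beta * x);
  // Per-district summary: share of simulated successes in each district.
  vector[J] pred_rate = rep_vector(0, J);
  for (n in 1:N) {
    pred_rate[district[n]] += y_pred[n];
  }
  pred_rate = pred_rate ./ n_obs;
}
```

Adding a district then only changes `J` and the values in `district`; the program itself stays the same. Back in R, the draws of `y_pred` can also be split by the same `district` vector if other per-district summaries are needed.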

You can also do a ragged array sorta encoding, where again you store things in one big vector but instead of having an array of which value belongs to which thing you write down the start position of each group (this requires that your values be organized in groups and not scattered randomly).

Example here: "Binomial_lpmf: Probability parameter[1] is 1, but must be in the interval [0, 1]" (post #12 by bbbales2), and docs here: "8.2 Ragged Data Structures" in the Stan User’s Guide.
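A sketch of the ragged encoding for the same kind of model could look like this. Again, this is an illustration under assumptions rather than the code from the linked post: it passes hypothetical group sizes `s` and derives each start position from a running counter `pos`, it requires the observations to be sorted by district with `sum(s)` equal to `N`, and the model structure and priors are placeholders. It uses the newer `array[...]` declaration syntax.

```stan
data {
  int<lower=1> N;                    // total number of observations
  int<lower=1> J;                    // number of districts
  array[J] int<lower=1> s;           // district sizes; must sum to N
  vector[N] x;                       // predictor, sorted by district
  array[N] int<lower=0, upper=1> y;  // outcome, sorted the same way
}
parameters {
  real mu_alpha;
  real<lower=0> sigma_alpha;
  vector[J] alpha;                   // district-level intercepts
  real beta;
}
model {
  int pos = 1;                       // start position of the current district
  // Placeholder priors, chosen only to make the sketch complete.
  mu_alpha ~ normal(0, 2);
  sigma_alpha ~ normal(0, 1);
  beta ~ normal(0, 2);
  alpha ~ normal(mu_alpha, sigma_alpha);
  for (j in 1:J) {
    // segment(v, pos, n) returns the n elements of v starting at pos.
    segment(y, pos, s[j]) ~ bernoulli_logit(alpha[j] + beta * segment(x, pos, s[j]));
    pos += s[j];
  }
}
generated quantities {
  array[N] int y_pred;               // simulations for all districts, in data order
  vector[J] pred_rate;               // share of simulated successes per district
  {
    int pos = 1;
    for (j in 1:J) {
      array[s[j]] int y_j
          = bernoulli_logit_rng(alpha[j] + beta * segment(x, pos, s[j]));
      y_pred[pos:(pos + s[j] - 1)] = y_j;
      pred_rate[j] = mean(to_vector(y_j));
      pos += s[j];
    }
  }
}
```

Compared with the index-array version, this handles one district at a time, which is convenient when each district needs its own summary, at the cost of requiring sorted data.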

stan_beginer · November 19, 2020, 5:03am

Thanks a lot!
