Hierarchical Modeling with Aggregated Data


I am attempting to build a hierarchical model in STAN. My hierarchy is national → businesses → states → locations. I have data at the location-day level and it seems impossible for my code to run (for a number of reasons I’m sure). I’m been playing around with the idea of aggregating my data up to the state level and weighting my regression based on the number of locations in those states. I want to be able to leverage the data on locations but also have code that runs quickly. Do we think there is a better way to accomplish what I’ve set out to do in the code below?

data {
    int N_obs; // Number of observations
    int nG; // Number of States
    int nR; // Number of businesses
    array[N_obs] int G; // State ID for each observation
    array[N_obs] int R; // Business ID for each observation
    int nX_direct; // Number of constrained predictors
    int nX_unconstrained; // Number of unconstrained predictors
    matrix[N_obs, nX_direct] x_con; // Constrained predictors matrix
    matrix[N_obs, nX_unconstrained] x_uncon; // Unconstrained predictors matrix
    array[N_obs] real y; // Dependent variable vector
    vector[N_obs] weights; // Weights for each observation - number of locations in a state!

parameters {
    array[nR] vector[nX_unconstrained] beta_r_uncon; // Unconstrained coefficients at business level
    array[nG] vector<lower=0>[nX_direct] beta_r_con; // Constrained coefficients at business level
    array[nG] vector[nX_unconstrained] beta_g_uncon; // Unconstrained coefficients at state level
    array[nG] vector<lower=0>[nX_direct] beta_g_con; // Constrained coefficients at state level

    vector<lower=0>[nX_direct] sigma_g_con; // SDs for state-specific constrained coefficients
    vector<lower=0>[nX_unconstrained] sigma_g_uncon; // SDs for state-specific unconstrained coefficients

    cholesky_factor_corr[nX_direct] L_r_con; // Cholesky factor of correlation matrix for business constrained coefficients
    cholesky_factor_corr[nX_unconstrained] L_r_uncon; // Cholesky factor for business unconstrained coefficients

    real<lower=0> sigma; // Population-level SD for sales

transformed parameters {
    vector[N_obs] mu;
    for (i in 1:N_obs) {
        mu[i] = dot_product(x_con_[i], beta_g_con[G[i]]) + dot_product(x_uncon[i], beta_g_uncon[G[i]]);

model {
    // Priors
    for (r in 1:nR) {
        beta_r_con[r] ~ normal(0, 1); // Retailer-level constrained predictors
        beta_r_uncon[r] ~ normal(0, 1); // Retailer-level unconstrained predictors
    L_r_con ~ lkj_corr_cholesky(2.0);
    L_r_uncon ~ lkj_corr_cholesky(2.0);

    sigma_g_con ~ exponential(1);
    sigma_g_uncon ~ exponential(1);
    sigma ~ exponential(1);

    // stat-level effects conditional on retailer effects
    for (g in 1:nG) {
        beta_g_con[g] ~ multi_normal_cholesky(beta_r_con[R[g]], diag_pre_multiply(sigma_g_con, L_r_con));
        beta_g_uncon[g] ~ normal(beta_r_uncon[R[g]], sigma_g_uncon); // Assuming independence across predictors for simplicity

    // Readjusting by weights
    for (i in 1:N_obs) {
        target += weights[i] * normal_lpdf(y[i] | mu[i], sigma / sqrt(weights[i]));