Building a Binomial Model on Sparse Survey Data

Hello all, first post here and relatively new to Stan. I've built a few models using this framework now, and this forum has been a fantastic resource, but I haven't been able to find an analog to what I'm working on now.

I have built the following model, which is designed to track "awareness" from survey data. The input data are the number of surveys completed in each time period (counts) and the number of respondents who reported being aware (awares). In addition to varying in time, the data are further separated by the geographic region of each survey respondent, so each array is sized T x G, where T is the number of time periods and G is the number of geographic regions.

The issue I am trying to address is that in some cells there are very few surveys and/or very few "aware" respondents (awareness is generally low in this use case). So I am turning to a hierarchical binomial framework, where the geo-level binomial "success"/"aware" probability is given a prior centered on a global success probability, which is itself time-varying as an AR(1) process. The whole model is supplied below:

data {
    int<lower=1> T;  // number of time-steps
    int<lower=1> G;  // number of geos
    
    array[T, G] int<lower=0> counts;  // number of survey responses
    array[T, G] int<lower=0> awares;  // number of aware responses
}

parameters {
    matrix<lower=0, upper=1>[T, G] p_geo;  // probability in each geo of awareness

    // time variation inherited from p_global, with scale sigma
    real<lower=0, upper=1> sigma;  // sd of p_geo around p_global (bounded to match the beta prior below)
    real delta;  // AR(1) coefficient on the previous value of p_global
    real mu;     // AR(1) intercept
}

transformed parameters {
    vector<lower=0, upper=1>[T] p_global;  // time-varying global probability of awareness
    // AR(1) recursion for p_global, started from a fixed initial value
    p_global[1] = 0.1;
    
    for (t in 2 : T) {
        p_global[t] = mu + p_global[t-1] * delta;  // AR(1) model
    }
}

model {

    for (t in 1 : T) {
        p_geo[t, :] ~ normal(p_global[t], sigma);

        for (g in 1 : G) {
            // Binomial likelihood relating the modeled awareness probability to the
            // number of aware respondents we see empirically, i.e.
            // P(awares | counts, p) = choose(counts, awares) * p^awares * (1 - p)^(counts - awares)
            awares[t, g] ~ binomial(counts[t, g], p_geo[t, g]);
        }
    }
    
    sigma ~ beta(3,3);
    delta ~ normal(0, 1);
    mu ~ normal(0, 0.5);
}

This specification seems to run fine; however, in my posterior draws I see plenty of variation in p_geo but very little in p_global. I'm afraid there isn't a lot of time variation in my raw signals either, so I am thinking this may be due to autocorrelation in the aggregate data, but I am looking for advice. Mixing for these parameters, and a few others, is also unacceptably poor (sigma: Rhat ≈ 1.5).

Further, I only have about 1,500 total time/geo survey data points to feed into the model, and I am afraid this structure might be overparameterized for that amount of data. I could eliminate the geo level of the model, which would reduce insight but would also drastically reduce complexity. Is this advisable?
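For concreteness, the pooled version I have in mind would look roughly like the sketch below. Here total_counts and total_awares are hypothetical names for the row sums of counts and awares across geos, and the AR(1) structure on p_global is carried over unchanged from the model above:

```stan
data {
    int<lower=1> T;  // number of time-steps
    array[T] int<lower=0> total_counts;  // survey responses per period, summed over geos
    array[T] int<lower=0> total_awares;  // aware responses per period, summed over geos
}

parameters {
    real delta;  // AR(1) coefficient on the previous value of p_global
    real mu;     // AR(1) intercept
}

transformed parameters {
    vector<lower=0, upper=1>[T] p_global;  // time-varying global probability of awareness
    p_global[1] = 0.1;  // fixed initial value, as in the full model
    for (t in 2 : T) {
        p_global[t] = mu + p_global[t - 1] * delta;
    }
}

model {
    delta ~ normal(0, 1);
    mu ~ normal(0, 0.5);
    for (t in 1 : T) {
        // binomial likelihood directly on the pooled counts
        total_awares[t] ~ binomial(total_counts[t], p_global[t]);
    }
}
```

This drops p_geo and sigma entirely, so the only free parameters are mu and delta.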

Thank you for reading this far, and thanks in advance for any advice you can offer.