Building a Binomial Model on Sparse Survey Data

Hello all, first post here and relatively new to Stan. I’ve built a few models using this framework now and this forum has been a fantastic resource, but I haven’t been able to find an analog to what I’m working on now…

I have built the following model, which is designed to track “awareness” from survey data. The input data are the number of surveys completed in each time period (counts) and the number of respondents who said they were aware (awares). In addition to varying in time, I have further separated the data into geographic regions for each survey respondent, so the data is sized T x G, where T is time and G is the number of geographic regions.

The issue I am trying to address is that in some instances there are very few surveys and/or very few respondents are “aware” (generally awareness is low in this use case). So, I am turning to a hierarchical binomial framework, where the geo-level binomial “success/aware” probability is tied back to a global success probability that acts as a prior and is itself time-varying via an AR(1) model. The whole model is supplied below:

data {
    int<lower=1> T;  // number of time-steps
    int<lower=1> G;  // number of geos
    int<lower=0> counts[T, G];  // number of survey responses
    int<lower=0> awares[T, G];  // number of aware responses
}

parameters {
    matrix<lower=0, upper=1>[T, G] p_geo;  // probability of awareness in each geo

    // time variation inherited from p_global, with scale sigma
    real<lower=0> sigma;  // sd of p_geo around p_global
    real delta;  // autoregressive coefficient on the previous value
    real mu;     // AR(1) intercept
}

transformed parameters {
    vector<lower=0, upper=1>[T] p_global;  // time-varying global probability of awareness
    // give p_global a random-walk element
    p_global[1] = 0.1;
    for (t in 2:T) {
        p_global[t] = mu + p_global[t - 1] * delta;  // AR(1) model
    }
}

model {
    sigma ~ beta(3, 3);
    delta ~ normal(0, 1);
    mu ~ normal(0, 0.5);

    for (t in 1:T) {
        p_geo[t] ~ normal(p_global[t], sigma);

        for (g in 1:G) {
            // Binomial pmf: choose(counts, awares) * p^awares * (1-p)^(counts-awares)
            // Here we relate the modeled awareness probability to the number of
            // aware individuals we see empirically
            awares[t, g] ~ binomial(counts[t, g], p_geo[t, g]);
        }
    }
}

This specification seems to run fine; however, in my posterior draws I see plenty of effective samples in p_geo, but very few in p_global. I’m afraid there isn’t a lot of time variation in my raw signals either, so I am thinking this may be due to autocorrelation in the aggregate data, but I am looking for advice. Mixing for these parameters, and a few others (sigma: Rhat ~= 1.5), is also unacceptably poor.

Further, I only have about 1,500 total time/geo data points to feed into the model, and I am afraid this structure might be overparameterized for this data set. I could eliminate the geo-specification of the model, which would reduce insight but would also reduce complexity drastically. Is this advisable?

Thank you for reading this far, and thanks in advance for any advice you can offer.

It seems like you have a hierarchical structure, where the group-level priors come from p_{global} (which is in turn constrained by hyperpriors on \mu and \delta). If you didn’t have that, you’d effectively be fitting T \times G independent binomials to your data, so trying that is a good starting point to get an expectation for ESS and \hat{R}.
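As a concrete sketch of that baseline (reusing the data block and variable names from the model above), each time/geo cell would get its own success probability with no pooling at all:

```stan
// Baseline sketch: T x G independent binomials, no pooling across time or geos.
data {
    int<lower=1> T;
    int<lower=1> G;
    int<lower=0> counts[T, G];
    int<lower=0> awares[T, G];
}
parameters {
    matrix<lower=0, upper=1>[T, G] p_geo;  // one independent probability per cell
}
model {
    for (t in 1:T) {
        for (g in 1:G) {
            p_geo[t, g] ~ beta(1, 1);  // flat prior; each cell is fit on its own
            awares[t, g] ~ binomial(counts[t, g], p_geo[t, g]);
        }
    }
}
```

This should sample very easily, so its ESS and \hat{R} give you a ceiling to compare the hierarchical model against.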

If p_{geo}(t) \sim \mathcal{N}(p_{global}(t),\sigma) is a good model (i.e. there’s reason to think the mean of p_{geo} is correlated in time, and the parameter itself is drawn from that normal at any given time point), it will reduce the total number of parameters you need and help fit the whole data set; so if independent binomials don’t perform too poorly, this shouldn’t in principle be overparameterized. If it’s not a good model, however, the samples may go all over the place trying to fit parts of the data, and the increased uncertainty will degrade the convergence metrics.

Assuming independent binomials aren’t terrible and the model can improve on that, there are a couple of things that may be worth looking at: \sigma ~ beta(a,b) restricts the standard deviation to at most 1, which may or may not be what you intend. Another potential issue is fixing p_global[1] = 0.1; if you have a good reason for that choice it will help, otherwise it will work against the model, so you may want to make the initial condition another parameter and place a prior on it (e.g. beta(1,1), or something less vague).
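Concretely, the initial-condition fix might look something like this (a sketch against the original model; p0 is a name I’m introducing here, and the beta(2, 20) prior is just an example of something weakly informative for low awareness):

```stan
parameters {
    // ... existing parameters ...
    real<lower=0, upper=1> p0;  // initial global awareness, now estimated
}
transformed parameters {
    vector<lower=0, upper=1>[T] p_global;
    p_global[1] = p0;  // was hard-coded to 0.1
    for (t in 2:T) {
        p_global[t] = mu + p_global[t - 1] * delta;
    }
}
model {
    p0 ~ beta(2, 20);  // concentrates mass at low awareness; use beta(1, 1) to stay vague
    // ... rest of the model unchanged ...
}
```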

To be clear, I’m not an expert in AR models, so I don’t have a whole lot of intuition for your specific model, but I hope this helps to some extent.

Hi there, thank you for your reply and your suggestions. After posting this summary, I implemented a variation of your idea – instead of T times G independent binomials, I removed the AR(1) relationship but kept the hierarchical p_\textrm{geo}(t) \sim \mathcal{N}(p_\textrm{global}(t), \sigma). Thus the parameterization is approximately the same, but at each time t the global prior is allowed to float independent of its previous values. This model performed great in terms of convergence properties, but it does lack the smoothing characteristics I was attempting to capture with the time-series modeling of p_\textrm{global}.

I am not an expert in time-series modeling, but I am beginning to suspect that the problem lies in introducing the AR(1) relationship and modeling it against data with very little variation in the samples. For example, in my case I have 3-4 years of monthly survey data where the binomial success probability is consistently close to about 5% throughout the entire dataset (even across most geographic regions). In a hand-waving summary, I suspect my data set might contain too much colinearity and what variation there is is explained sufficiently by the binomial framework. I’m not sure if I have explained that well enough and I don’t know how to prove that to be the case, however. In the mean time I may run some experiments to introduce the time-series modeling back in, and I will attempt to address the issues you outlined regarding the initialization of p_global[1] and sigma. Thank you.


I am not an expert in time series, but why not try a random walk? The probability at t+1 would be equal to the one at t, plus a value drawn from a normal distribution with mean zero and an estimated sd.

It seems to me this would be an intermediate step between independent estimates and the AR(1).
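In Stan, that intermediate model might be sketched as below. Note that I’ve moved the walk to the logit scale so that it stays unbounded (a walk directly on the [0,1] probability scale can step outside the constraint and cause rejections); theta, tau, and eps are names I’m introducing, and the priors are placeholders:

```stan
parameters {
    real theta0;                           // initial logit-scale awareness
    real<lower=0> tau;                     // sd of the random-walk steps
    vector[T - 1] eps;                     // standardized innovations (non-centered)
    matrix<lower=0, upper=1>[T, G] p_geo;  // per-geo awareness probabilities
    real<lower=0> sigma;                   // sd of p_geo around p_global
}
transformed parameters {
    vector[T] theta;
    vector<lower=0, upper=1>[T] p_global;
    theta[1] = theta0;
    for (t in 2:T) {
        theta[t] = theta[t - 1] + tau * eps[t - 1];  // random walk on the logit scale
    }
    p_global = inv_logit(theta);  // map back to probabilities
}
model {
    theta0 ~ normal(-3, 1);  // inv_logit(-3) is roughly 5% awareness a priori
    tau ~ normal(0, 0.5);
    eps ~ std_normal();
    sigma ~ normal(0, 0.1);
    for (t in 1:T) {
        p_geo[t] ~ normal(p_global[t], sigma);
        for (g in 1:G) {
            awares[t, g] ~ binomial(counts[t, g], p_geo[t, g]);
        }
    }
}
```

The non-centered eps parameterization is a common trick to improve mixing when the walk’s scale tau is itself being estimated.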


Thanks! I like this idea too – I will give it a shot!

That may well be a better alternative than fully independent binomials.

With an AR model, in theory the worst that could happen is there being very little autocorrelation, which would be reflected in a small delta that allows the mean to fluctuate a lot between successive time points; that may throw off convergence. Alternatively, the model may be too constrained and require the other parameters to vary too wildly. But hopefully, between those alternatives and your implementation, you should be able to find a suitable solution.