Inv cloglog link (prior issue?) and sampling challenges (multi-species occupancy model)


I’ve been struggling to get Stan to sample the joint posterior of a multi-species occupancy model where the state process follows an inverse cloglog link function and the detection process follows a logit link. The tl;dr question is: are there good rules of thumb for defining (log-scale) intercept/coefficient priors when the response of interest (and the domain knowledge) is on the probability scale?

Full backdrop is that this is intended [foolishly hoped?] to be one component of some multi-species integrated distribution model that has also exhibited severe issues; after breaking it apart piecemeal, the occupancy component seemed to be the source.

The issue manifests in standard (but very loud) ways: divergent transitions on nearly every iteration, and poor mixing by every diagnostic (n_eff, R-hat) and visual assessment. I’m semi-confident the underlying cause is the link being used: after testing different combinations of centered/non-centered parameterizations, adapt_delta values, hyper-prior tweaks, and link functions, the big difference-maker is that any model where variation in occupancy follows a logit link samples just fine, and any model where it doesn’t complains loudly.

I’d like to use an inverse cloglog approach because it eases integration with other germane data types, presence-background data primarily. I suspect (or hope) that part of the issue relates to setting reasonable priors, or priors that correspond to domain knowledge on the probability scale; we’re really more interested in the probability of occupancy than in E(N). The full code is below. Note that I’d been using a normal prior on the mean log-intercept, N(-1.75, 0.75), partly because there’s an area offset in the model of log(4) and this combination roughly captures a range of plausible values. I’ve tried fiddling with the priors, but no luck so far. Is there some other (left-skewed?) distribution that is more sensible, or any general rules of thumb for defining priors for these models?
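As a quick check of what that prior implies on the probability scale (a Python sketch, separate from the Stan model below), pushing N(-1.75, 0.75) plus the log(4) offset through the inverse cloglog gives a median occupancy near 0.5:

```python
# Pushforward of the intercept prior through the inverse cloglog,
# psi = 1 - exp(-exp(eta)), with the log(4) area offset included.
import numpy as np

rng = np.random.default_rng(1)
eta = rng.normal(-1.75, 0.75, size=100_000) + np.log(4)  # prior draws of the linear predictor
psi = 1 - np.exp(-np.exp(eta))                            # implied occupancy probability

q = np.quantile(psi, [0.05, 0.5, 0.95])  # 90% prior interval and median on the psi scale
print(q)
```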

Appreciate any thoughts!


data {
  int<lower=1> nSpec;
  int<lower=1> nLocs;
  int<lower=1> nCellsOcc;
  int<lower=1> nCovs;
  int<lower=0, upper=1> ObsOcc[nCellsOcc, nSpec]; // 1 if species ever detected in cell
  int<lower=0> yBBBPA[nLocs, nSpec];              // detection counts per location
  matrix[nCellsOcc, nCovs] X3;
  int<lower=1> From[nCellsOcc];                   // first location index for each cell
  int<lower=1> To[nCellsOcc];                     // last location index for each cell
  int<lower=1> K[nLocs];                          // visits per location
  real area;                                      // log-area offset, here log(4)
}
parameters {
  matrix[nCovs, nSpec] alpha;       // point-process intensity coefficients
  vector[nSpec] beta;               // logit-scale detection intercepts (p for Occ/Det)
  vector[nCovs] mu_alpha;
  vector<lower=0>[nCovs] sig_alpha;
  real mu_beta;
  real<lower=0> sig_beta;
}
transformed parameters {
  matrix<lower=0>[nCellsOcc, nSpec] mu;
  matrix<lower=0, upper=1>[nCellsOcc, nSpec] psi;
  vector<lower=0, upper=1>[nSpec] ppa; // detection probability
  for (s in 1:nSpec) {
    mu[, s] = exp(X3 * alpha[, s] + area); // expected intensity per cell
    psi[, s] = 1 - exp(-mu[, s]);          // inverse cloglog: Pr(occupied)
  }
  ppa = inv_logit(beta);
}
model {
  mu_beta ~ logistic(0, 1);
  sig_beta ~ normal(0, 1);
  mu_alpha[1] ~ normal(-1.75, 0.75);
  mu_alpha[2:nCovs] ~ normal(0, 1);
  sig_alpha ~ normal(0, 1);
  beta ~ normal(mu_beta, sig_beta);

  for (s in 1:nSpec) {
    alpha[, s] ~ normal(mu_alpha, sig_alpha);
  }

  for (s in 1:nSpec) {
    for (i in 1:nCellsOcc) {
      if (ObsOcc[i, s] == 1) {
        // detected at least once: cell is occupied with certainty
        target += bernoulli_lpmf(1 | psi[i, s])
                  + binomial_lpmf(yBBBPA[From[i]:To[i], s] | K[From[i]:To[i]], ppa[s]);
      } else {
        // never detected: marginalize over occupied-but-missed vs. unoccupied
        target += log_sum_exp(bernoulli_lpmf(1 | psi[i, s])
                              + binomial_lpmf(yBBBPA[From[i]:To[i], s] | K[From[i]:To[i]], ppa[s]),
                              bernoulli_lpmf(0 | psi[i, s]));
      }
    }
  }
}

I suspect the issue isn’t with priors but rather with the cloglog link itself (but I’ll try to discuss priors below). The cloglog has a weird asymmetric shape and in particular the inverse cloglog rockets off to one very very quickly as the linear predictor grows positive. So if the linear predictor needs to fit some substantial variation in occupancy on the low end of the scale, it might also be forced to rocket off to one at some combinations of covariate values. This could lead to misspecification (in some models it could also be the correct specification, but the fact that logit is working for you but cloglog blows up suggests that’s not the case here).
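To put rough numbers on that asymmetry (a quick sketch, nothing model-specific): at the same linear-predictor values, the inverse cloglog saturates toward 1 far faster than the inverse logit.

```python
# Compare the two inverse links over a small range of linear-predictor values.
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def inv_cloglog(x):
    # inverse of the complementary log-log link
    return 1 - math.exp(-math.exp(x))

for x in (-2.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.0f}  inv_logit={inv_logit(x):.3f}  inv_cloglog={inv_cloglog(x):.3f}")
```

At x = 2 the inverse logit is still below 0.9 while the inverse cloglog is already above 0.99.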

With that said, the best way to choose priors in these regressions is to examine the implied pushforward distributions on the probability scale. In general, if you’re uncertain about coefficient values, you will get pushforwards that concentrate heavily towards probabilities of zero and one. My strategy is first to moan and despair about how annoying it is to set good priors on logit-scale (or cloglog scale) parameters, and then to set about tweaking things until the concentration towards zero and one is sufficiently reduced that I don’t feel the priors are implying something ridiculous. In many applications, I see concentration towards one as much less desirable than concentration towards zero, and so I often end up with a prior on the intercept that is not zero-centered, instead favoring negative values. And then I really try to tighten priors on coefficients and random-effect sds until they are only a little bit more permissive than domain knowledge. I think about domain knowledge in terms of what I think the maximum change in probability should be per unit predictor in a context where the occupancy probability is otherwise 0.5.
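Here’s a rough sketch of that pushforward workflow, with made-up prior scales and a single standardized covariate (none of these numbers come from the model in the thread): wide priors pile prior mass into the extremes of the probability scale, and tightening them pulls it back.

```python
# Prior pushforward check on the cloglog scale: how much prior mass lands
# at psi < 0.01 or psi > 0.99 under candidate intercept/coefficient priors?
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
x = rng.normal(size=n)  # standardized covariate values

def extreme_mass(int_sd, coef_sd):
    eta = rng.normal(0, int_sd, n) + rng.normal(0, coef_sd, n) * x
    psi = 1 - np.exp(-np.exp(eta))  # inverse cloglog
    return np.mean((psi < 0.01) | (psi > 0.99))

wide = extreme_mass(int_sd=5.0, coef_sd=5.0)    # "uninformative" priors
tight = extreme_mass(int_sd=1.0, coef_sd=0.5)   # priors tightened toward domain knowledge
print(f"extreme-mass wide: {wide:.2f}, tight: {tight:.2f}")
```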

Note, however, that too-weak priors lead to divergences only in the context of too-weak data. Divergences depend on the shape of the posterior distribution, which ideally in data-rich settings will be dominated by the likelihood rather than the prior. So it’s quite possible that prior tweaking won’t help you. The fact that logit link works fine and cloglog blows up spectacularly to me suggests that cloglog is misspecified. This is an example of the “folk theorem of statistical computing”, i.e. that computational issues in a model are often the result of misspecification.

Edited to add:
If you want to double-check, I suggest you try simulating and then fitting data under the cloglog model. If you still get these divergences even when you know that the model is well specified, then I’m wrong (but I also wouldn’t have any great alternative suggestions about how to proceed).
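A minimal simulator along those lines might look like the following (hypothetical sizes, covariate, and parameter values); the actual check is then refitting the Stan model above to the simulated detections.

```python
# Simulate data under the cloglog occupancy model: occupancy from the
# inverse cloglog of a linear predictor (with the log-area offset), then
# binomial detections only where the latent state is occupied.
import numpy as np

rng = np.random.default_rng(42)
n_cells, n_visits = 500, 4
x = rng.normal(size=n_cells)           # one standardized covariate
eta = -1.0 + 0.5 * x + np.log(4)       # linear predictor with log(4) offset
psi = 1 - np.exp(-np.exp(eta))         # inverse cloglog occupancy probability
z = rng.binomial(1, psi)               # latent occupancy state
p_det = 0.4                            # known detection probability
y = rng.binomial(n_visits, p_det * z)  # detections; zero wherever z == 0

print(f"occupied fraction: {z.mean():.2f}, naive occupancy: {(y > 0).mean():.2f}")
```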


Appreciate your thoughts here, and I think you’ve probably solved this. I haven’t yet run the simulations you suggested, but some further tinkering suggests the cloglog is really choking on some of the more sparsely observed species: culling a bunch of species more or less gets rid of the problem, both for the MS-occ component above and for a quick approximation of something closer to the envisioned fuller model. It certainly seems like the inverse cloglog is just not a very good shape for at least some of the species in question, or is perhaps trickier to fit with sparse data.


My strong guess (but just a guess) is that the model has to really contort itself to avoid estimating a bunch of occupancy probabilities near 1 for the sparsely observed species, or else it has to contort itself to estimate vanishingly low detection probabilities for these same species. And this contortion is doing weird things to the posterior geometry. For example, you could be estimating random intercepts for detection that run off towards negative infinity for some species, or something like that.

Bear in mind that sparsely observed species aren’t “data-poor” per se, it’s just that the data strongly suggest that they are unlikely to be observed. Here I think that the problem is that these species are data-rich enough to strongly constrain that they must have either low occupancy or low detection, and the propensity for the cloglog to give them exceedingly high occupancy probabilities at some points is causing problems. Again, just a hunch.
