Auto-logistic occupancy model in stan: how to marginalize

Wang_MinJie · August 17, 2021, 4:50am

This blog post [So, you don't have enough data to fit a dynamic occupancy model? An introduction to auto-logistic occupancy models. – Mason Fidino – Quantitative Ecologist] gives the dynamic occupancy model, which is written in JAGS.

model{
  for(site in 1:nsite){
    # This is for the first year
    #
    # latent state model
    #
    logit(psi[site,1]) <- inprod(psi_beta, X[site,])
    z[site,1] ~ dbern(psi[site,1])
    #
    # data model
    #
    logit(rho[site,1]) <- inprod(rho_beta, X[site,])
    y[site,1] ~ dbin(rho[site,1] * z[site,1], J)
    #
    # For remaining years of sampling
    #
    for(year in 2:nyear){
      #
      # latent state model, has theta term now.
      #
      logit(psi[site,year]) <- inprod(psi_beta, X[site,]) +
        theta * z[site,year-1]
      z[site,year] ~ dbern(psi[site,year])
      #
      # data model
      #
      logit(rho[site,year]) <- inprod(rho_beta, X[site,])
      y[site,year] ~ dbin(rho[site,year] * z[site,year], J)
    }
  }
  #
  # Priors
  #
  # Intercept and slope terms
  for(covar in 1:ncovar){
    psi_beta[covar] ~ dlogis(0,1)
    rho_beta[covar] ~ dlogis(0,1)
  }
  # First-order autoregressive term
  theta ~ dlogis(0,1)
}

As a fan of Stan, I want to convert it into Stan code, but I got an empty result.

data from blog post

# Step 1. Simulate the data.

# General bookkeeping
nsite    <- 25
nyear    <- 8
ncovar   <- 3
nrepeats <- 5

# for covariates
X <- matrix(
  NA,
  ncol = ncovar,
  nrow = nsite
)

set.seed(333)
# Create covariates
X <- cbind(1, apply(X, 2, function(x) rnorm(nsite)))  # 2 indicates columns

# Occupancy coefficients, +1 for intercept
psi_betas <- rnorm(ncovar + 1)

# auto-logistic term
theta <- 0.75

# Detection coefficients, decreasing magnitude here
rho_betas <- rnorm(ncovar + 1, 0, 0.5)

# latent state, give same dimensions as X
z <- matrix(NA, ncol = nyear, nrow = nsite)

# Do first year occupancy
psi <- plogis(X %*% psi_betas)
z[,1] <- rbinom(nsite, 1, psi)

# And then the rest, which also uses the theta term
for (year in 2:nyear) {
  psi <- plogis(X %*% psi_betas + theta * z[, year - 1])
  z[,year] <- rbinom(nsite,1,psi)
}

# Add imperfect detection, make it a matrix with
#  the same dimensions as z. Then multiply by z.
rho <- matrix(
  plogis(X %*% rho_betas),
  ncol = nyear,
  nrow = nsite
) * z

# Create the observed data. Again, same dimensions as z.
y <- matrix(
  rbinom(
    length(rho),
    nrepeats,
    rho
  ),
  ncol = nyear,
  nrow = nsite
)

The Stan code I converted

data{
    int J;              
    int nsite;             
    int nyear; 
    int ncovar;
    matrix[nsite, ncovar] X;
    int y[nsite, nyear];    
}
parameters{
    real theta; 
    vector[ncovar] psi_beta;
    vector[ncovar] rho_beta;
}
model{
   matrix[nsite, nyear] psi;
   matrix[nsite, nyear] rho;
   int z[nsite, nyear];
  
  for(site in 1:nsite){

    psi[site, 1] = inv_logit(dot_product(psi_beta, X[site, ]));  
    z[site, 1] ~ bernoulli(psi[site, 1]);

    rho[site, 1] = inv_logit(dot_product(rho_beta, X[site, ]));
    y[site, 1] ~ binomial(J, rho[site, 1] * z[site, 1]);


    for(year in 2:nyear){

      psi[site, year] = inv_logit(dot_product(psi_beta, X[site, ]) + theta * z[site, year-1]);
      z[site, year] ~ bernoulli(psi[site, year]);

      rho[site, year] = inv_logit(dot_product(rho_beta, X[site, ]));
      y[site, year] ~ binomial(J, rho[site, year] * z[site, year]);
    }
  }


  psi_beta ~ logistic(0, 1);
  rho_beta ~ logistic(0, 1);
  theta ~ logistic(0, 1);
  
}

However, the results does not give samples

data_list <- list(
  J      = nrepeats,
  nsite  = nsite,
  nyear  = nyear,
  ncovar = ncovar + 1, 
  X      = X,
  y      = y
)

m <- stan(model_code = stan_program, data = data_list,
          chains = 4,
          iter = 20000,
          warmup = 10000
          )
m

# Stan model 'anon_model' does not contain samples.

I don’t know where the code is wrong. Can you help me?

jsocolar · August 17, 2021, 2:34pm

Stan does not allow discrete quantities to show up on the right-hand side of sampling statements. That is, integers must not be treated as parameters. You have integer z on the right-hand side, and this won’t work. In general, the solution is to marginalize over the discrete parameters. For static occupancy models, this is fairly straightforward. See here:
https://jsocolar.github.io/marginalizedOccupancy/

In dynamic occupancy models, and I think in this flavor of autoregressive model as well, the marginalization is way more complicated, because I think you need to marginalize over every possible sequence of states for z at every point. With a limited number of seasons/years per point, this should be feasible but will require some careful bookkeeping. But with longer time-series, it quickly gets very cumbersome, and eventually the factorial complexity will overwhelm you. The number of states that you need to marginalize for each point will be 2^n, where n is the number of season/years during which there were no observed detections at that point.

It’s possible that there is some kind of statistical trick or bookkeeping wizardry that makes this flavor of autologistic occupancy model feasible in Stan, but I don’t know what it is. What’s the maximum number of seasons/years that you have per point in your data? And what is the maximum number of seasons/years without a detection that you have per point in your data? If these numbers are sufficiently small, then I’m fairly confident that we can slog through the marginalization by brute force.

jsocolar · August 17, 2021, 3:01pm

By the way, I’ve edited the title of the post to hopefully get more of the right people’s eyes on it.

mbjoseph · August 17, 2021, 3:37pm

For dynamic occupancy models and (including the autologistic model), it is easiest to frame the problem as a hidden Markov model (computing the likelihood using the forward algorithm).

To do this for the autologistic model you’d want to write down:

the initial state probability vector
the 2x2 state transition probability matrix where the rows correspond to z_{t-1} and the columns correspond to z_t.
the 2x2 “emission” probability matrix where the rows correspond to z_t and the columns correspond to y_t

For an example check out example-models/Dynocc.stan at master · stan-dev/example-models · GitHub

Topic		Replies	Views
Functions inside occupancy model Modeling ecology , cmdstanr	2	466	August 16, 2022
Dynamic Logistic Model for Network Formation [Part1. Problems with data types] Modeling rstan , specification	3	510	May 15, 2020
[Help] Hierarchical logistic regression with time-varying coefficients Modeling	6	781	October 7, 2021
Time-series in Stan, I am new to Stan and need hints to develop the model. THANKS Modeling rstan , specification	43	2603	June 12, 2020
Efficient implementation of the for loop? Modeling rstan , fitting-issues	1	543	February 18, 2022

Auto-logistic occupancy model in stan: how to marginalize

Related Topics