Help with reparameterization

Daniel_Hocking · September 2, 2018, 2:54pm

I am trying to run an N-mixture type model with a different parameterization than I’ve used before. The model runs and the means are reasonable but the credible intervals are enormous. I’m not sure if it’s parameterization or putting things in the wrong blocks for sampling but I suspect something with the model formulation. Any help or suggestions would be appreciated. I’m still very new to stan.

Data generation in R

# Simulate N-mixture Data
n_sites <- 100 # Number of sites
n_visits <- 4 # Number of visits per site

x1 <- rnorm(n_sites, 0, 1) # covariate on normalized scale
beta0 <- log(10) # mean abundance on log scale
beta1 <- 1.2 # effect of covariate 1

lambda <- exp(beta0 + beta1 * x1)

p <- 0.3 # static detection probability

# True abundance
N <- rep(NA, times = length(lambda))
for(i in 1:n_sites) {
  N[i] <- rpois(1, lambda = lambda[i])
}

# Count process
counts <- matrix(NA, n_sites, n_visits)
for(i in 1:n_sites) {
  for(j in 1:n_visits) {
    counts[i, j] <- rbinom(1, N[i], p)
  }
}

Stan code

// Binomial mixture model with covariates
data {
  int<lower=0> R;       // Number of sites
  int<lower=0> T;       // Number of temporal replications
  int<lower=0> y[R, T]; // Counts
  vector[R] X;          // Covariate
}

parameters {
  real beta0; // intercept (log mean abundance)
  real beta1; // slope parameter
  real<lower=0, upper=1> p; // detection probability
}

transformed parameters {
  vector[R] log_lambda; // Log expected count
  vector[R] log_mu; // Log expected population size
  real log_p; // Log detection probability
  vector[R] lambda; // expected count intensity
  vector[R] mu; // Expected population size (rate parameter)

  log_mu = beta0 + beta1 * X;
  log_p = log(p);
  log_lambda = log_mu + log_p;
  mu = exp(log_mu);
  lambda = exp(log_lambda);
}

model {
  // Priors
  // beta0 ~ normal(0, 10);
  // beta1 ~ normal(0, 10);
  beta0 ~ uniform(-10, 10);
  beta1 ~ uniform(-10, 10);
  p ~ uniform(0, 1);

  // Likelihood
  for (i in 1:R) {
    for(j in 1:T) {
      y[i, j] ~ poisson(lambda[i]);
      // y[i, j] ~ poisson_log(log_lambda[i]);
      // target += poisson_log_lpmf(y[i, j] | log_lambda);
  }
  }
  
}

generated quantities {
  int N[R];
  int totalN;

  for (i in 1:R)
    // N[i] = poisson_rng(mu[i]);
    N[i] = poisson_log_rng(log_mu[i]);
  totalN = sum(N);
}

Code to run via rstan

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
set.seed(123)

# load data
source("Code/Stan_Models/nmix_constant_p_data.R")

## Parameters monitored
params <- c("beta0", "beta1", "p", "N")

## MCMC settings
ni <- 20000
nt <- 10
nb <- 10000
nc <- 3

data_list <- list(counts = counts,
  x1 = x1,
  n_sites = nrow(counts),
  n_visits = ncol(counts)
)

Nst <- apply(counts, 1, max) + 1   # Important to give good inits for latent N

## Initial values
inits <- lapply(1:nc, function(i)
  list(N = Nst, 
       beta0 = runif(1, -1, 1),
       beta1 = runif(1, -1, 1),
       p = runif(1, 0.2, 0.8)))

## Call Stan from R
# if(!dir.exists("Results/Stan")) dir.create("Results/Stan", recursive = TRUE)
out <- stan("Code/Stan_Models/nmix_alt.stan",
            data = list(y = counts, R = n_sites, T = n_visits, X = x1),
            init = inits,
            pars = params,
            chains = nc, iter = ni, warmup = nb, thin = nt,
            seed = 123,
            open_progress = FALSE,
            control=list(adapt_delta=0.9))

print(out, digits = 3)
plot(out, par = c("beta0", "beta1", "p"))
traceplot(out, par = c("beta0", "beta1", "p"))

bbbales2 · September 2, 2018, 5:36pm

You’re generating data with: counts[i, j] <- rbinom(1, N[i], p)

But fitting with y[i, j] ~ poisson(lambda[i]);

That seems like an issue.

mbjoseph · September 2, 2018, 6:06pm

N-mixture models are a bit tricky in Stan because latent abundance (N) is unbounded (e.g., see discussion here: https://groups.google.com/forum/#!searchin/stan-users/n-mixture|sort:date/stan-users/9mMsp1oB69g/78vW13OmE18J).

It is possible to code these up either by choosing some maximum abundance and enumerating over all possible N, or by using a multinomial Poisson parameterization. Some examples of both of these can be found here: https://github.com/stan-dev/example-models/tree/master/BPA/Ch.12

Daniel_Hocking · September 2, 2018, 8:07pm

Yeah sorry I should have explained better in my question. I knew the data generation wasn’t going to match the stan model but I wanted to see if

count \sim Poisson(\mu * p)

N = poisson(\mu)

would approximate the standard

count \sim Bin(N, p)

N \sim Poisson(mu)

I also wasn’t careful about my constant use of \mu and \lambda in the generating code vs sampling code.

Daniel_Hocking · September 2, 2018, 8:15pm

I have marginalized out N in the past by summing over values up to K (max remotely possible N) but have found it’s very slow as opposed to using a Gibbs sampler to sample N directly.

However, I was thinking that the alternative parameterization of multiplying \mu * p might work. Since people don’t do that I figured there was probably a reason but I couldn’t think of it offhand so I decided to simulate and see what happens. But then since I’m new to stan, I couldn’t tell if it was something about my code or just the parameterization that was the issue.

I’ve never looked into the multinomial Poisson, so I’ll check that option out. Thanks for the link.

mbjoseph · September 2, 2018, 8:43pm

It’s been a while since I’ve thought about N-mixture models, but IIRC in a typical binomial-Poisson hierarchy only the product \lambda p is identified, not \lambda and p separately. The tricky bit with N-mixture models is the repeat surveys that provide information that can inform p and N separately. If we have repeat surveys t=1, ..., T, then the likelihood for repeat counts y_1, ..., y_T (at one site) is

\prod_{t=1}^T \text{Binomial}(y_t \mid N, p) \text{Poisson}(N \mid \lambda])

and in that case I don’t know of a closed form marginalization over N.

my_rstanarm_username · October 18, 2019, 3:53pm

It looks like this thread is a little old, but in case someone else finds it useful, there are some closed-form methods for estimating N and p separately.

Use a multivariate poisson model (Dennis et al. 2015, DOI: 10.1111/biom.12246)
Reparameterize the likelihood of the binomial*poisson mixture following Haines 2016, DOI: 10.1111/biom.12521)
Use distance sampling, which uses additional data to estimate p, so that you can use the count ~ poisson(mu * p) in the likelihood.

Topic		Replies	Views
Help with Bernoulli sampling in model block Modeling rstan	4	132	July 5, 2024
Specifying a multinomial N-mixture model for capture-recapture encounter histories Modeling rstan , specification , capture-recapture	4	101	October 22, 2024
Estimating parameters of two normal distribution, mixed at proportions Modeling	5	1387	May 10, 2019
Convergence Troubles and Pathologies in an already Reparameterized Model Modeling fitting-issues , specification , performance , cmdstanr , hierarchical-model	5	549	July 27, 2023
Calculating the likelihood for loo in a binomial mixture model Modeling specification , loo	4	1911	July 18, 2018

Help with reparameterization

Related topics