Modifying a cjs_mnl program to account for uncertainty in the initial numbers marked

I want to modify a simple cormack-jolly-seber (csj) multinomial model using an m-array of data consisting of one initial marking site/time (site1) and two subsequent recapture sites/occasions (site2, site3) to account for uncertainty in the number of individuals marked at site1 that can be expected to be bound for (and hence in-principle detectable conditional on their survival) site2 and beyond. The main interest is in the survival rate of marked individuals from initial marking at site1 to site2.

Consider the site of initial capture and marking (site1) to be a downstream river location with sites 2 and 3 located some distance further upstream. Remote sensing mark detectors are located at sites2 and 3. However, tributary rivers without detectors enter the main river between sites 1 and 2 and some unknown number of individuals marked at site1 are likely to be bound for these tributaries and so are in-principle undetectable at sites2 and 3.

Further, consider that a subsample of the individuals marked at site1 have non-lethal tissue samples taken before being released for genetic analysis that results in assigning each individual with high probability to destinations above sites 2 and 3 and below site2 (but above site1). It should be possible to use this data to estimate a probability distribution of the total number of individuals marked at site 1 that are in-principle detectable at site2 and/or 3. This will result in a Bayesian analysis in which the data for individuals marked at site1 and the residual number of individuals never detected at sites 2 or 3 are random variables, in addition to the survival and detection parameters (s1 and s2, and p1 and pr, respectively) that are the typical parameters of interest in the basic cjs model.

The data example consists of the following m array (in R.dump format in Rstan):

[,1]	[,2]	[,3]

[1,] 275 25 700
[2,] 0 150 125

Thus, a total of 1000 individuals are marked at site1 of which 300 are detected at sites 2 or 3 (275 at site2 plus 25 at site 3 that were not detected at site2). Of the 275 detected at site2, 150 are subsequently detected at site3 (in addition to the 25 not detected at site2). Of the total marked at site 1, 700 are never re-sighted, and of the total of 275 detected at site2 125 are never detected thereafter.

Suppose that 500 tissue samples for DNA assignment analysis were obtained (non-lethally) at site1 and that the results of the assignment test is that 300 are bound for destinations above sites2 and 3 and 200 are bound for destinations above site1 but below site2. This data can be used to parameterize a Beta (300, 200) distribution for the probability of that an individual randomly marked at site1 is bound for site2 and above (and therefore is in-principle detectable at sites 2 and/or3 conditional on survival to those sites).

One should be able to modify the basic cjs multinomial stan model by drawing variables Z ~ Beta(300, 200) and either adjusting the number marked at site1 that are never resighted by multiplying that number by the product of Z and the ratio of the number never detected (cell[1,3} of the m array) divided by the total marked at site1 (the sum of cells [1,1], [1,2], and [1,3]), or by adjusting the probability of individuals marked at site1 never being resighted (at sites 2 or 3) by this same product. In the example this would be Z*(700/1000).

Adopting the former approach and for random Z = 0.6, this would involve replacing the initial total number marked that are never subsequently detected (located in m-array cell [1,3] (700)) by the total number marked that are in-principle-detectable: 10000.60.7 = 1000*0.42 = 420. And of course this operation must be repeated n-iterations number of times together with the sampling of the survival (s) and detection probability (p) parameters.

The problem that I encounter when attempting to modify the .stan and associated.R files to implement this is that I do not know how to add Z ~ beta(300,200) to the .stan file. It appears that Stan will only accept this in the model code, not in the data, transformed data, parameters, or transformed parameters. But when I do this I receive an error and the .stan file will not compile.

I have implemented the described approach readily in a Bayes Fortran program that implements Metropolis-within-Gibbs sampling, sampling Z as beta random variable and multiplying each random Z by the total number marked at site 1 (as a floating point number), transforming the result to an integer to accomplish rounding, and then transforming the integer back to a floating point for calculation of the combinatorics of the multinomial likelihood (after tranforming to log gammas), and for calculating the adjusted number marked at site1 that are never detected at sites 2 or 3.

It should be possible to do so in stan using an m array. But I clearly do no understand how to do this.

A sample data file, .stan, and .R file for running the simple cjs model in Rstan are copied below. The .stan file is copied from Trevor Branch’s folder that includes Ito’s translation of the Kery&Schub’s Chapter 7 WinbBUGS code for the csj_mnl model into stan code. I would appreciate any suggestions of modifications to the attached .stan and .R files that would accomplish the desired modification.

Here are the files and a summary of a run in Rstan:

//cjs_mnl_test.stan
// Kery&Schub Chpt 7-10.2 Ito translation to stan code

data {
  int<lower=0> n_occasions;     // Number of capture occasions
//  int<lower=0> marr[n_occasions - 1, n_occasions]; // m-array
  int<lower=0> marr[n_occasions - 1, n_occasions]; // m-array  
}

transformed data {
  // Compoud declaration was enabled in Stan 2.13
  int n_occasions_minus_1 = n_occasions - 1;
  //  int n_occasions_minus_1;
  int r[n_occasions - 1];

  //  n_occasions_minus_1 = n_occasions - 1;

  // Calculate the number of birds released each year
  for (t in 1:n_occasions_minus_1)
    r[t] = sum(marr[t]);
}

parameters {
  vector<lower=0,upper=1>[n_occasions_minus_1] phi; // Survival
  vector<lower=0,upper=1>[n_occasions_minus_1] p;   // Recapture
}

transformed parameters {
  vector<lower=0,upper=1>[n_occasions_minus_1] q;
  simplex[n_occasions] pr[n_occasions_minus_1];

  q = 1.0 - p;             // Probability of non-recapture

  // Define the cell probabilities of the m-array
  for (t in 1:n_occasions_minus_1) {
    // Main diagonal
    pr[t, t] = phi[t] * p[t];

    // Above main diagonal
    for (j in (t + 1):n_occasions_minus_1)
      pr[t, j] = prod(phi[t:j]) * prod(q[t:(j - 1)]) * p[j];

    // Below main diagonal
    pr[t, :(t - 1)] = rep_vector(0, t - 1);
  }

  // Last column: probability of non-recapture
  for (t in 1:n_occasions_minus_1)
    pr[t, n_occasions] = 1 - sum(pr[t, :n_occasions_minus_1]);
}

model {
  // Priors
  // Uniform priors are implicitly defined.
  //  phi ~ uniform(0, 1);
  //  p ~ uniform(0, 1);

  // Define the multinomial likelihood
  for (t in 1:n_occasions_minus_1)
    marr[t] ~ multinomial(pr[t]);
}

#cjs_mnl_test.R
#Example data
#NG June 1 2018

7. Estimation of survival probabilities using capture-recapture data

7.10. Fitting the CJS to data in the m-array format: the multinomial likelihood

7.10.2. Time-dependent models

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
set.seed(123)

Read data

stan_data<- read_rdump(“mnlexmple_data.R”)

Parameters monitored

params ← c(“phi”, “p”)

MCMC settings

ni ← 20000
nt ← 1
nb ← 10000
nc ← 4

Initial values

inits ← function() list(phi= runif(dim(stan_data$marr)[2] - 1, 0.2, 0.8),
p = runif(dim(stan_data$marr)[2] - 1, 0.2, 0.8))

Call Stan from R

cjs_mnl_test ← stan(“cjs_mnl_test.stan”,
data = stan_data,init = inits, pars = params,
chains = nc, iter = ni, warmup = nb, thin = nt,
seed = 1,
open_progress = FALSE)

Summarize posteriors

print(cjs_mnl_test, digits = 3)

mnlexample.R:
marr ←
structure(c(275, 0, 25, 150, 700, 125), .Dim = 2:3)
n_occasions<-3L

Results summary of the basic csj_mnl model run on the above:

print(cjs_mnl_test, digits = 3)
Inference for Stan model: cjs_mnl_test.
4 chains, each with iter=20000; warmup=10000; thin=1;
post-warmup draws per chain=10000, total post-warmup draws=40000.

       mean se_mean    sd     2.5%      25%      50%      75%    97.5% n_eff Rhat

phi[1] 0.323 0.000 0.016 0.291 0.312 0.322 0.333 0.355 24831 1
phi[2] 0.746 0.001 0.133 0.534 0.631 0.735 0.857 0.984 18089 1
p[1] 0.852 0.000 0.026 0.797 0.835 0.853 0.870 0.900 25561 1
p[2] 0.748 0.001 0.133 0.535 0.633 0.738 0.859 0.985 17317 1
lp__ -895.377 0.013 1.477 -899.112 -896.099 -895.044 -894.288 -893.531 13403 1