How to circumvent defining a integer array in transformed parameter block

Hello,

I am trying to fit a multi-state mark-recapture state-space model, similar to those in Chapter 9 of Marc Kery’s BPA (translated to stan code at https://github.com/stan-dev/example-models/tree/master/BPA).

I have written about this model previously in this stan help forum here: Where to declare individual survival parameter in multi-state state-space model (Again, many thanks to Daniel Lee for his quick reply.)

My model is working as expected when the response variable is completely observed.
However, I have the perhaps slightly unusual case that observations at one of the occasions are missing because a detection device was unavailable.
I have read chapter 10 (page 177+ in stan reference manual 2.15.0 ) and I have searched this forum and studied the following hits:

But still I have been unable to find a solution to my issue.

Here is my (simplified) model:

// -------------------------------------------------
// States (S):
// 1 juvenile
// 2 adult
// 3 dead
// Observations (O):
// 1 seen as juvenile
// 2 seen as adult
// 3 not seen
// -------------------------------------------------

// Data
data {

  // Variable declarations
  int<lower = 0> n_yrs;
  int<lower = 0> n_inds;
  int<lower = 0> n_occs;
  int<lower = 1, upper = 3> y[n_inds, n_occs];

}

// Transformed data
transformed data {
  
  // Varaible calculations
  int n_occs_less1 = n_occs - 1;

}

// Parameters
parameters {

  // Parameter declarations
  real<lower = 0, upper = 1> phi_juv;            // ad survival
  real<lower = 0, upper = 1> phi_ad;             // ad survival
  real<lower = 0, upper = 1> p;                  // probability of detection

}

// Transformed parameters
transformed parameters {

  // Parameter declarations
  simplex[3] ps[3, n_inds, n_occs_less1];
  simplex[3] po[3, n_inds, n_occs_less1];

  // Define state-transition and observation matrices
  for (i in 1:n_inds) {
    for (o in 1:n_occs_less1) {
    
      // Define probabilities of state S(y+1) given S(y)
      ps[1, i, o, 1] = 0.0;
      ps[1, i, o, 2] = phi_juv;
      ps[1, i, o, 3] = (1 - phi_juv);
      ps[2, i, o, 1] = 0.0;
      ps[2, i, o, 2] = phi_ad;
      ps[2, i, o, 3] = (1 - phi_ad);
      ps[3, i, o, 1] = 0.0;
      ps[3, i, o, 2] = 0.0;
      ps[3, i, o, 3] = 1.0;

      // Define probabilities of O(y) given S(y)
      po[1, i, o, 1] = 0.0;
      po[1, i, o, 2] = 0.0;
      po[1, i, o, 3] = 1.0;
      po[2, i, o, 1] = 0.0;
      po[2, i, o, 2] = p;
      po[2, i, o, 3] = (1 - p);
      po[3, i, o, 1] = 0.0;
      po[3, i, o, 2] = 0.0;
      po[3, i, o, 3] = 1.0;

    } // o
  } // i

}

// Model
model {

  // Variable declarations
  real acc[3];
  vector[3] gamma[n_occs];

  // Likelihood
  // Forward algorithm derived from Stan Modeling Language User's Guide and Reference Manual
  for (i in 1:n_inds) {
    for (k in 1:3) {
      gamma[1, k] = (k == y[i, 1]);
    }
    for (o in 2:n_occs) {
      for (k in 1:3) {
        for (j in 1:3) {
          acc[j] = gamma[o - 1, j] * ps[j, i, o - 1, k] * po[k, i, o - 1, y[i, o]];
        }
        gamma[o, k] = sum(acc);
      }
    }
    target += log(sum(gamma[n_occs]));
  }

} // end model

And here is my model with attempts to account for the missing data at occasion 3 in the first year:

  // -------------------------------------------------
  // States (S):
  // 1 juvenile
  // 2 adult
  // 3 dead
  // Observations (O):
  // 1 seen as juvenile
  // 2 seen as adult
  // 3 not seen
  // -------------------------------------------------

  // Data
  data {
    
    // Variable declarations
    int<lower = 0> n_yrs;
    int<lower = 0> n_inds;
    int<lower = 0> n_occs;
    int<lower = 0> n_obscol3;
    int<lower = 0> n_miscol3;
    int<lower = 1, upper = n_obscol3 + n_miscol3> i_obscol3[n_obscol3];
    int<lower = 1, upper = n_obscol3 + n_miscol3> i_miscol3[n_miscol3];
    vector<lower = 1, upper = 3>[n_obscol3] y_obscol3;
    int<lower = 1, upper = 3> y_obs[n_inds, 2];

  }

  // Transformed data
  transformed data {
  
    // Varaible calculations
    int n_occs_less1 = n_occs - 1;

  }

  // Parameters
  parameters {
    
    // Parameter declarations
    real<lower = 0, upper = 1> phi_juv;    // juv survival
    real<lower = 0, upper = 1> phi_ad;     // ad survival
    real<lower = 0, upper = 1> p;          // probability of detection
    vector<lower = 1, upper = 3>[n_miscol3] y_miscol3;

  }

  // Transformed parameters
  transformed parameters {

    // Parameter declarations
    simplex[3] ps[3, n_inds, n_occs_less1];
    simplex[3] po[3, n_inds, n_occs_less1];
    // matrix<lower = 1, upper = 3>[n_inds, n_occs] y;
    int<lower = 1, upper = 3> y[n_inds, n_occs];
    y[i_obscol3, 3] = y_obscol3;
    y[i_miscol3, 3] = y_miscol3;

    // Define state-transition and observation matrices
    for (i in 1:n_inds) {
      for (o in 1:n_occs_less1) {
        
        // Define probabilities of state S(y+1) given S(y)
        ps[1, i, o, 1] = 0.0;
        ps[1, i, o, 2] = phi_juv;
        ps[1, i, o, 3] = (1 - phi_juv);
        ps[2, i, o, 1] = 0.0;
        ps[2, i, o, 2] = phi_ad;
        ps[2, i, o, 3] = (1 - phi_ad);
        ps[3, i, o, 1] = 0.0;
        ps[3, i, o, 2] = 0.0;
        ps[3, i, o, 3] = 1.0;

        // Define probabilities of O(y) given S(y)
        po[1, i, o, 1] = 0.0;
        po[1, i, o, 2] = 0.0;
        po[1, i, o, 3] = 1.0;
        po[2, i, o, 1] = 0.0;
        po[2, i, o, 2] = p;
        po[2, i, o, 3] = (1 - p);
        po[3, i, o, 1] = 0.0;
        po[3, i, o, 2] = 0.0;
        po[3, i, o, 3] = 1.0;

      } // o
    } // i

  }

  // Model
  model {

    // Variable declarations
    real acc[3];
    vector[3] gamma[n_occs];

    // Likelihood
    // Forward algorithm derived from Stan Modeling Language User's Guide and Reference Manual
    for (i in 1:n_inds) {
      for (k in 1:3) {
        gamma[1, k] = (k == y[i, 1]);
      }
      for (o in 2:n_occs) {
        for (k in 1:3) {
          for (j in 1:3) {
            acc[j] = gamma[o - 1, j] * ps[j, i, o - 1, k] * po[k, i, o - 1, y[i, o]];
          }
          gamma[o, k] = sum(acc);
        }
      }
      target += log(sum(gamma[n_occs]));
    }

  } // end model

I am getting the following error during compilation:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

parameters or transformed parameters cannot be integer or integer array;  found declared type int, parameter name=y
Problem with declaration.
  error in 'model35d018f9625_eebc87a8ea17142b0af29d6ae65307d5' at line 55, column 48
  -------------------------------------------------
    53:     simplex[3] po[3, n_inds, n_occs_less1];
    54:     // matrix<lower = 1, upper = 3>[n_inds, n_occs] y;
    55:     int<lower = 1, upper = 3> y[n_inds, n_occs];
                                                       ^
    56:     y[i_obscol3, 3] = y_obscol3;
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'eebc87a8ea17142b0af29d6ae65307d5' due to the above error.

To me, this suggests that the following definition is not allowed in the transformed parameters or parameters block:

int<lower = 1, upper = 3> y[n_inds, n_occs];

When I replace it with:

matrix<lower = 1, upper = 3>[n_inds, n_occs] y;

in the transformed parameter block, the compilation proceeds into the model block but then I get the error that y should be an integer:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

ERROR:  Container index must be integer; found type=real
  error in 'model35d06ed8176b_42b1f38c990210b6cf618b2cb4d895a2' at line 112, column 77
  -------------------------------------------------
   110:         for (k in 1:3) {
   111:           for (j in 1:3) {
   112:             acc[j] = gamma[o - 1, j] * ps[j, i, o - 1, k] * po[k, i, o - 1, y[i, o]];
                                                                                    ^
   113:           }
  -------------------------------------------------

PARSER EXPECTED: <one or more container indexes followed by ']'>
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '42b1f38c990210b6cf618b2cb4d895a2' due to the above error.

which makes sense.

Here is some R code to simulate some data:

## define function to simulate multistate capture-recapture data
simul.ms <- function(PSI.STATE, PSI.OBS, marked, nocc){
  nyr <- length(marked)
  CH <- CH.TRUE <- array(NA, dim = list(max(marked), nocc, nyr))
  for(y in 1:nyr){
    CH[1:marked[y], 1, y] <- 1 # always seen at release
    CH.TRUE[1:marked[y], 1, y] <- 1  # always seen at release
  }
  for(y in 1:nyr){
    for(i in 1:marked[y]){
      for(o in 2:nocc){
        state.p <- rmultinom(1, 1, PSI.STATE[[y]][CH.TRUE[i, o-1, y],,i,o-1])
        state <- which(state.p == 1)
        CH.TRUE[i, o, y] <- state
        event.p <- rmultinom(1, 1, PSI.OBS[[y]][CH.TRUE[i, o, y],,i,o-1])
        event <- which(event.p == 1)
        CH[i, o, y] <- event
      } #o
    } #i
  } #y
  return(list(CH = CH, CH.TRUE = CH.TRUE))
}

## define number of states, occasions, released individuals & years
n.states <- 3 # juv, ad, dead
n.occs <- 3
n.years <- 3
marked <- sample(x = 100:300, size = n.years, replace = TRUE)
years <- factor(rep(1:n.years, marked), labels = (2017 - (n.years - 1)):2017)
n.inds <- sum(marked)

## define parameters
phi.juv <- 0.08
phi.ad <- 0.8
p <- 0.75

## define state and observation matrices
PSI.STATE <- vector('list', n.years)
for(y in 1:n.years){
  PSI.STATE[[y]] <- array(NA, dim = c(n.states, n.states, marked[y], n.occs - 1))
}
for(y in 1:n.years){
  for(i in 1:marked[y]){
    for(o in 1:(n.occs - 1)){
      PSI.STATE[[y]][,,i,o] <- matrix(c(
        
        0,          phi.juv,            1 - phi.juv,
        0,          phi.ad,             1 - phi.ad,
        0,          0,                  1
        
      ), nrow = n.states, byrow = TRUE)
    } # o
  } # i
} # y
PSI.OBS <- vector('list', n.years)
for(y in 1:n.years){
  PSI.OBS[[y]] <- array(NA, dim = c(n.states, n.states, marked[y], n.occs - 1))
}
for(y in 1:n.years){
  for(i in 1:marked[y]){
    for(o in 1:(n.occs - 1)){
      PSI.OBS[[y]][,,i,o] <- matrix(c(
        
        0, 0, 1,
        0, p, 1-p,
        0, 0, 1
        
      ), nrow = n.states, byrow = TRUE)
    } # o
  } # i
} # y

## execute simulation function
sim <- simul.ms(PSI.STATE, PSI.OBS, marked, n.occs)
ch <- sim$CH

## make into a single matrix
ch.mat <- matrix(NA, ncol = (n.years + (n.occs - 1)), nrow = sum(marked))
ridx <- 1
cidx <- 1
for(y in 1:n.years){
  ch.mat[ridx:(ridx + (marked[y] - 1)), cidx:(cidx + (n.occs - 1))] <- ch[1:marked[y], , cidx]
  ridx <- ridx + marked[y]
  cidx <- cidx + 1
}

## recode NAs as state 3
ch.mat[is.na(ch.mat)] <- 3

## add NAs at occasion 3 in year 1
if(addNAs){
  ch[, n.occs, 1] <- NA
}

## make an easy matrix
ch.emat <- numeric()
for(i in 1:dim(ch)[3]){
  ch.emat <- rbind(ch.emat, ch[1:marked[i], , i])
}

## dice it up into constituent parts
y.obs <- ch.emat[, 1:2]
i.obscol3 <- which(!is.na(ch.emat[, 3]))
i.miscol3 <- which(!is.na(ch.emat[, 3]))
n.obscol3 <- length(i.obscol3)
n.miscol3 <- length(i.miscol3)
y.obscol3 <- as.integer(na.omit(ch.emat[, 3]))

Could anyone help / guide me towards a possible solution to this problem?

Many thanks in advance for your help,
Stephen

p.s., sorry for the obscenely long post!

Update: I just found this post on another forum.

Ben’s answer confirms my fear that this might be difficult in stan?

Many thanks,
Stephen

As the error messag says, “parameters or transformed parameters cannot be integer or integer array;” So you can’t code it up directly, like is possible in JAGS/BUGS.

Fiddly, but perhaps not intractable. And the advantage is that if you marginalize, it should be much better behaved for sampling.

If the missing data looks like this:

int<lower = 1, upper = 3> y[n_inds, n_occs];

does it factor by index, so that the likelihood looks as follows?

PRODUCT_{n_ind, n_occ} p(y[n_ind, n_occ] | theta)

If so, it’s a bit fiddly to program, but should work. What you need to do is marginalize out the discrete missing data. There’s yet another chapter in the manual on latent discrete parameters—the missing discrete data needs to be treated as parameters.

Many thanks, Bob. I will read the chapter and respond asap (away for a couple of weeks).

For anyone following this Discussion, I think the chapter Bob is referring to is Chapter 15. Latent Discrete Parameters on page 210 of the stan-reference-2.17.0.pdf manual at http://mc-stan.org/users/documentation/

Again, many thanks!
Stephen