Occupancy model error with K value (number of visits) – SOLVED

Afternoon all,

I am slowing working my way through different ecology models using Stan. I stumbled across this occupancy model in the case studies.

I ran the example as is on:
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

That runs just fine. When I roll my data into it (attached here as a csv) I get the following error:
Error in new_CppObject_xp(fields$.module, fields$.pointer, …) :
Exception: model346a164a8a02_f46f730e64151416119f700ecfb76ff2_namespace::model346a164a8a02_f46f730e64151416119f700ecfb76ff2: x[k0__][k1__] is 5, but must be less than or equal to 3 (in ‘model346a164a8a02_f46f730e64151416119f700ecfb76ff2’ at line 34)
failed to create the sampler; sampling not done

We visit our sites three times a year. It’s possible for us to have hundreds of a single species in a trap.
Our n = 28, J = 20, K = 3, S = 1000. I set the super community high since at one site we 889 ants of a single species caught. I’ve also set S as high as 10000 but got the same error.

Based on the error I am guessing the model code has a part where it’s expecting a certain number of visits based on the counts. However I haven’t figured out where that is in the model.

Model from the case study

model functions {
matrix cov_matrix_2d(vector sigma, real rho) {
matrix[2,2] Sigma;
Sigma[1,1] = square(sigma[1]);
Sigma[2,2] = square(sigma[2]);
Sigma[1,2] = sigma[1] * sigma[2] * rho;
Sigma[2,1] = Sigma[1,2];
return Sigma;

real lp_observed(int x, int K, real logit_psi, real logit_theta) {
return log_inv_logit(logit_psi)
+ binomial_logit_lpmf(x | K, logit_theta);

real lp_unobserved(int K, real logit_psi, real logit_theta) {
return log_sum_exp(lp_observed(0, K, logit_psi, logit_theta),

real lp_never_observed(int J, int K, real logit_psi, real logit_theta,
real Omega) {
real lp_unavailable = bernoulli_lpmf(0 | Omega);
real lp_available = bernoulli_lpmf(1 | Omega)
+ J * lp_unobserved(K, logit_psi, logit_theta);
return log_sum_exp(lp_unavailable, lp_available);
data {
int<lower=1> J;  // sites within region
int<lower=1> K;  // visits to sites
int<lower=1> n;  // observed species
int<lower=0, upper=K> x[n,J];  // visits when species i was detected at site j
int<lower=n> S;  // superpopulation size
parameters {
real alpha;  //  site-level occupancy
real beta;   //  site-level detection
real<lower=0, upper=1> Omega;  // availability of species

real<lower=-1,upper=1> rho_uv;  // correlation of (occupancy, detection)
vector<lower=0>[2] sigma_uv;    // sd of (occupancy, detection)
vector[2] uv[S];                // species-level (occupancy, detection)
transformed parameters {
vector[S] logit_psi;    // log odds  of occurrence
vector[S] logit_theta;  // log odds of detection
for (i in 1:S)
logit_psi[i] = uv[i,1] + alpha;
for (i in 1:S)
logit_theta[i] = uv[i,2] + beta;
model {
// priors
alpha ~ cauchy(0, 2.5);
beta ~ cauchy(0, 2.5);
sigma_uv ~ cauchy(0, 2.5);
(rho_uv + 1) / 2 ~ beta(2, 2);
uv ~ multi_normal(rep_vector(0, 2), cov_matrix_2d(sigma_uv, rho_uv));
Omega ~ beta(2,2);

// likelihood
for (i in 1:n) {
1 ~ bernoulli(Omega); // observed, so available
for (j in 1:J) {
if (x[i,j] > 0)
target += lp_observed(x[i,j], K, logit_psi[i], logit_theta[i]);
target += lp_unobserved(K, logit_psi[i], logit_theta[i]);
for (i in (n + 1):S)
target += lp_never_observed(J, K, logit_psi[i], logit_theta[i], Omega);
generated quantities {
real<lower=0,upper=S> E_N = S * Omega; // model-based expectation species
int<lower=0,upper=S> E_N_2;  // posterior simulated species
vector[2] sim_uv;
real logit_psi_sim;
real logit_theta_sim;

E_N_2 = n;
for (i in (n+1):S) {
real lp_unavailable = bernoulli_lpmf(0 | Omega);
real lp_available = bernoulli_lpmf(1 | Omega)
+ J * lp_unobserved(K, logit_psi[i], logit_theta[i]);
real Pr_available = exp(lp_available
- log_sum_exp(lp_unavailable, lp_available));
E_N_2 = E_N_2 + bernoulli_rng(Pr_available);

sim_uv = multi_normal_rng(rep_vector(0,2),
cov_matrix_2d(sigma_uv, rho_uv));
logit_psi_sim = alpha + sim_uv[1];
logit_theta_sim = beta + sim_uv[2];

ant_species_by_site_2016.csv (1.7 KB)

Hi @Ara_Winter - great to see more occupancy models on this forum! The values in the observation matrix x should be between 0 and K, where K is the number of visits to each site. That’s why the error is raised - it’s telling you that a particular element in the matrix x is 5, but must be less than or equal to K=3.

Looking at the attached data, the values in x are pretty large, in the hundreds, which along with this:

makes me suspect that maybe the values in x are counts of individual ants, rather than integers that represent the number of surveys that each species was observed at each site, which would be between 0 and K. In the case study when x is described, it’s introduced as:

  • x: n \times J matrix where x_{i, j} \in 1:K is the number of visits in which species i was observed at site j

Now that I read that, I see there’s a typo (it should be x_{i, j} \in 0:K), but the error you’re getting is because you have values x_{i, j} that are greater than K. In other words, the way the data are encoded now imply that species 14 at site 2 was detected on 889 out of 3 surveys, which is impossible. This species could have been detected at this site at most 3 times out of 3 surveys.

To fix this, you’ll probably want to go back to the original data and construct the matrix x based on the number of surveys that each species was seen at each site, so that the values are all between 0 and K.

@mbjoseph Funny I was reading the original paper and it dawned on me that I had the matrix setup wrong :(

Thanks for clarifying what I was doing wrong. That solves the problem.

Hopefully once I get my head wrapped around this model a bit better I’ll have some fancified version to contribute.

Just for completeness the model is up and running.

With the attached data file here – in case anyone at home wants to play with ant data.
ant_species_by_site_2016.csv (1.5 KB)

1 Like