Functions inside occupancy model

Big picture: I am updating a three-level occupancy model. My end goal is a more complicated model and I plan on using reduce_sum() (so the model does not with RStan, only cmdstanr). the attached code is part of my stepping stone to that goal. My session info is at the end of this message.

Problem: When I place some code in functions, the model stops working. I know that I do not completely understand functions in Stan, but do not see where I am making a mistake. The model compiles, but the results become non-sensical.

My current two problems are

  1. I need a target_temp in sample_nondetection_loop_lpmf() but not sample_detection_loop_lpmf(). Any idea why?

For example, compare

real sample_nondetection_loop_lpmf(
  array[] int y,
  array[] int k_samples,
  vector log_theta,
  vector logit_p,
  vector log1m_theta,
  int n_samples) {
    
    real target_temp = 0;
    for(sample in 1:n_samples){
	    target_temp +=
	      sample_nondetection_lpmf(
	        y[sample] |
	        k_samples[sample],
	        log_theta[sample],
	        logit_p[sample],
	        log1m_theta[sample]);
	     }
    return target_temp;
  }

to

real sample_detection_loop_lpmf(
  array[] int y,
  array[] int k_samples,
  vector log_theta,
  vector logit_p,
  vector log1m_theta,
  int n_samples,
  array[] int sample_seen) {

    for(sample in 1:n_samples){
      if(sample_seen[sample] > 0){ // Sample has known  detection
      return sample_detection_lpmf(
        y[sample] |
        k_samples[sample],
        log_theta[sample],
        logit_p[sample]);

        } else { // Sample does not have a known detection
	    return sample_nondetection_lpmf(
	      y[sample] |
	      k_samples[sample],
	      log_theta[sample],
	      logit_p[sample],
	      log1m_theta[sample]);
	      }
    }
    return 0;
  }
  1. When I try to loop over units, this function does not work:
real unit_prob_lpmf(
  array[] int y,
  array[] int k_samples,
  int n_unit,
  array[] int n_samples,
  array[] int any_seen,
  array[] int sample_seen,
  vector logit_p,
  vector log_theta,
  vector log1m_theta,
  vector log_psi,
  vector log1m_psi,
  array[] int start_idx,
  array[] int end_idx
){

real target_temp = 0;

  for (site in 1:n_unit) { 
    if (n_samples[site] > 0) { // Make sure site was samples
      if (any_seen[site] > 0 ) { // Site has known detections
      
      target_temp += 
      site_detection_lpmf(
        y[start_idx[site]:end_idx[site]] |
        k_samples[start_idx[site]:end_idx[site]],
        log_theta[start_idx[site]:end_idx[site]],
        logit_p[start_idx[site]:end_idx[site]],
        log1m_theta[start_idx[site]:end_idx[site]],
        n_samples[site],
        sample_seen[start_idx[site]:end_idx[site]], 
        log_psi[site]);
      } else {
      // site may or may not be occupied 

      target_temp +=
        log_sum_exp(log_psi[site] +
          sample_nondetection_loop_lpmf(
            y[start_idx[site]:end_idx[site]] |
            k_samples[start_idx[site]:end_idx[site]],
            log_theta[start_idx[site]:end_idx[site]],
            logit_p[start_idx[site]:end_idx[site]],
            log1m_theta[start_idx[site]:end_idx[site]],
            n_samples[site]),
            log1m_psi[site]);
      }
      return 0;
    }
    return 0;
  }
  return target_temp;
}

but, this plain code works

	target +=
	  site_detection_lpmf(
	    y[start_idx[site]:end_idx[site]] |
      k_samples[start_idx[site]:end_idx[site]],
      log_theta[start_idx[site]:end_idx[site]],
      logit_p[start_idx[site]:end_idx[site]],
      log1m_theta[start_idx[site]:end_idx[site]],
      n_samples[site],
      sample_seen[start_idx[site]:end_idx[site]],
      log_psi[site]);


      } else {
      // site may or may not be occupied

      target +=
      log_sum_exp(log_psi[site] +
        sample_nondetection_loop_lpmf(
          y[start_idx[site]:end_idx[site]] |
          k_samples[start_idx[site]:end_idx[site]],
          log_theta[start_idx[site]:end_idx[site]],
          logit_p[start_idx[site]:end_idx[site]],
          log1m_theta[start_idx[site]:end_idx[site]],
          n_samples[site]),
          log1m_psi[site]);

      }
    }
  }

Reproducible examples:
detect_3.stan (3.3 KB)
occupancy_3.stan (3.1 KB)
dev-occ_3_stan.R (2.6 KB)

Session info:

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cmdstanr_0.5.3  forcats_0.5.1   stringr_1.4.0   dplyr_1.0.9     purrr_0.3.4     readr_2.1.2     tidyr_1.2.0     tibble_3.1.8   
 [9] ggplot2_3.3.6   tidyverse_1.3.2 occStan_1.0.0  

loaded via a namespace (and not attached):
 [1] httr_1.4.3           jsonlite_1.8.0       modelr_0.1.8         RcppParallel_5.1.5   StanHeaders_2.26.13  assertthat_0.2.1    
 [7] posterior_1.2.2      distributional_0.3.0 stats4_4.2.1         tensorA_0.36.2       googlesheets4_1.0.0  cellranger_1.1.0    
[13] pillar_1.8.0         backports_1.4.1      glue_1.6.2           checkmate_2.1.0      rvest_1.0.2          colorspace_2.0-3    
[19] pkgconfig_2.0.3      rstan_2.26.13        broom_1.0.0          haven_2.5.0          scales_1.2.0         processx_3.7.0      
[25] tzdb_0.3.0           googledrive_2.0.0    generics_0.1.3       farver_2.1.1         ellipsis_0.3.2       withr_2.5.0         
[31] cli_3.3.0            magrittr_2.0.3       crayon_1.5.1         readxl_1.4.0         ps_1.7.1             fs_1.5.2            
[37] fansi_1.0.3          xml2_1.3.3           pkgbuild_1.3.1       data.table_1.14.2    tools_4.2.1          loo_2.5.1           
[43] prettyunits_1.1.1    hms_1.1.1            gargle_1.2.0         lifecycle_1.0.1      matrixStats_0.62.0   V8_4.2.1            
[49] munsell_0.5.0        reprex_2.0.1         callr_3.7.1          compiler_4.2.1       rlang_1.0.4          grid_4.2.1          
[55] rstudioapi_0.13      gtable_0.3.0         codetools_0.2-18     inline_0.3.19        abind_1.4-5          DBI_1.1.3           
[61] curl_4.3.2           R6_2.5.1             gridExtra_2.3        lubridate_1.8.0      knitr_1.39           utf8_1.2.2          
[67] stringi_1.7.8        parallel_4.2.1       Rcpp_1.0.9           vctrs_0.4.1          xfun_0.31            dbplyr_2.2.1        
[73] tidyselect_1.1.2  
1 Like

It would be helpful if you could provide two complete Stan programs, one of which works, and one of which attempts to move code into a function and fails. Standard indentation would also be helpful.

My guess from what you’ve shared is that the return 0 statements are going to return prematurely. But I can’t tell without the working version to compare.

1 Like

@Bob_Carpenter Thank you for your suggestions.

Here’s what I did to solve the problem:

  1. Cleaning up the code using emacs’s stan mode (rather than RStudio) helped.
  2. I tried to create a reproducible example following your guidance.
  3. I removed all of the return 0’s.

My code now works!