cjs_data.csv (6.3 KB)

Hello! I’ve been working in Stan for a few years now (mainly building mark-recapture models), and while I love this program there are many things I do not fully understand, particularly about speed and optimization. As I’ve started to build larger and more complex models, model run times are getting excessive (e.g., weeks) and I’m wondering if there’s somewhere in my code where I am unknowingly being inefficient.

Specifically, since lots of my code involves matrix math, I wanted to know the most efficient way to do this. There seems to be a lot of discussion on how it is more efficient to use matrices and vectorization, however, I find matrix multiplication to be much slower in Stan than just running loops (see simple example below). Is there something I am doing inefficiently? The dataset I am actually fitting is much larger than this toy example (my actual code has 25 states - 42 occasions - and additional state uncertainty -I always run it using multiplication loops as the matrix math is too slow), so any speed-up tips would be really useful. Also - would using sparse matrices be helpful for this type of model?

The simple model code I have provided is a Cormack-Jolly-Seber (CJS) model to estimate survival and capture probability for animals. Animals in the study area are vulnerable to capture during multiple, discrete capture occasions. When an animal is captured, it is given an individually-identifiable mark so that it can be monitored through time. In Stan, we use a marginalized likelihood framework described in Yackulic et al. (2020) - and my example code follows this framework. I am using rstan v2.21.2

Thank you so much! I’ve attached the simulated data I used for this example as a .csv file (first 10 columns = sumCH, 11th column = sumFR, 12th column = sumf), NsumCH=dim(sumCH)[1], Nocc=10.

Maria

Yackulic, Charles B., et al. “A need for speed in Bayesian population models: a practical guide to marginalizing and recovering discrete latent states.” *Ecological Applications* 30.5 (2020): e02112.

```
/////////////////////////////////////////////////////////////////////////////////////////////////////
//example 1- matrix multiplication using loops
data{
int NsumCH; //number of unique capture histories
int Nocc; //number of occasions
int sumCH[NsumCH,Nocc]; //capture history matrix (1= captured, 2= unobserved)
int sumFR[NsumCH]; //number of individuals with particular capture history
int sumf[NsumCH]; //first capture occasion for each capture history
}
parameters{
real mu_ls; //mean survival (logit-scale)
real<lower=0, upper =5> sd_ls; //sd survival (logit)
vector[Nocc-1] z_ls; // temporal RE on survival (logit)
real mu_lp; //mean capture probability
real<lower=0, upper =5> sd_lp; //sd capture probability
vector[Nocc-1] z_lp;//temporal RE on capture probability
}
transformed parameters{
vector[Nocc-1] s; // survival (real scale)
vector[Nocc-1] p; //capture probability (real scale)
vector[2] phi[2,Nocc-1]; //transition matrix
vector[2] pmat[2,Nocc-1]; //capture probability matrix
vector[2] temp;
vector[2] pz[Nocc]; //marginalized likelihood
vector[NsumCH] llik; //log likelihood for each capture history at the last time period
for(t in 1:(Nocc-1)){
s[t] = inv_logit(mu_ls + sd_ls*z_ls[t]);
p[t] = inv_logit(mu_lp + sd_lp*z_lp[t]);
phi[1,t,1]=s[t];
phi[1,t,2]=1-s[t];
phi[2,t,1]=0;
phi[2,t,2]=1;
pmat[1,t,1]=p[t];
pmat[1,t,2]=1-p[t];
pmat[2,t,1]=0;
pmat[2,t,2]=1;
}
for (i in 1:NsumCH){
pz[sumf[i],1] = 1;
pz[sumf[i],2] = 0;
for(t in (sumf[i]+1):Nocc){
for(j in 1:2){
for(k in 1:2){
temp[k] = pz[t-1,k]*phi[k,t-1,j]*pmat[j,t-1,sumCH[i,t]];}
pz[t,j]= sum(temp);}}
llik[i] = sumFR[i]*log(sum(pz[Nocc,1:2]));}
}
model{
mu_ls~normal(0,2);
mu_lp~normal(0,2);
z_ls~normal(0,1);
z_lp~normal(0,1);
target+=(sum(llik));
}
```

looped version took ~90 seconds

```
/////////////////////////////////////////////////////////////////////////////////////////////////////
//example 2- matrix multiplication
//note: I re-arranged some of the indices here so that I could use vectors/matrices for //multiplication
parameters{
real mu_ls;
real<lower=0, upper =5> sd_ls;
vector[Nocc-1] z_ls;
real mu_lp;
real<lower=0, upper =5> sd_lp;
vector[Nocc-1] z_lp;}
transformed parameters{
vector[Nocc-1] s;
vector[Nocc-1] p;
matrix[2,2] phi[Nocc-1];
vector[2] pmat[2,Nocc-1];
vector[NsumCH] llik;
vector[2] pz[Nocc];
for(t in 1:(Nocc-1)){
s[t] = inv_logit(mu_ls + sd_ls*z_ls[t]);
p[t] = inv_logit(mu_lp + sd_lp*z_lp[t]);
phi[t,1,1]=s[t];
phi[t,2,1]=1-s[t];
phi[t,1,2]=0;
phi[t,2,2]=1;
pmat[1,t,1]=p[t];
pmat[2,t,1]=1-p[t];
pmat[1,t,2]=0;
pmat[2,t,2]=1;
}
for (i in 1:NsumCH){
pz[sumf[i],1] = 1;
pz[sumf[i],2] = 0;
for(t in (sumf[i]+1):Nocc){
pz[t] = (phi[t-1]*pz[t-1]).*pmat[sumCH[i,t],t-1,];}
llik[i] = sumFR[i]*log(sum(pz[Nocc,]));}
}
model{
mu_ls~normal(0,2);
mu_lp~normal(0,2);
z_ls~normal(0,1);
z_lp~normal(0,1);
target+= sum(llik);}
```

Matrix multiplication version took ~200 seconds - quite a bit longer

EDIT by Aki: added code block ticks and removed the posting instructions from the end