Hello all,
I provide a description of my model and Stan code at the end of this message. However, I believe that my question is potentially of general interest irrespective of the details of the model, and so I first ask the question in general form.
I am preparing to fit a complex model to a large dataset. I expect a runtime on the order of weeks (at least), and I expect the effective number of posterior samples to be substantially lower than the nominal number of posterior samples, even with reasonably well-tuned HMC sampling. In addition to ensuring that I parameterize the likelihood efficiently and write efficient Stan code, I am concerned with selecting a reasonable number of warmup iterations to use, balancing the duration of the warmup period with the efficiency of posterior exploration later on.
Here are some initial ideas that I have for how to choose a plausible warmup length, and I would really appreciate feedback or further thoughts from this community.
- Use campfire for adaptive warmup?
- rstan's default is to use iter/2 iterations for warmup and the other half for sampling. I could make a reasonable guess about how many sampling iterations I need to achieve adequate effective sample sizes and set the warmup length equal to that number?
- Study the behavior on smaller datasets (either with fewer data points per level of the main random-effects terms or with fewer levels in those terms), and use some intuition (you'll need to clue me in as to what this intuition might be!) about how the appropriate warmup period scales with model size to guess at an appropriate warmup period for the full data? (See the pilot-run sketch just after this list.)
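To make the last two ideas concrete, here is roughly the kind of pilot run I have in mind. This is only a sketch, not a recipe: the warmup/iter split is arbitrary, and stan.model and stan.data refer to the objects defined in the full script at the end of this post (in practice I would use one of the smaller simulated datasets).

pilot <- stan(model_code = stan.model, data = stan.data,
              warmup = 500, iter = 600, chains = 4, cores = 4)

# Adapted step size and inverse metric per chain; large differences across chains
# would suggest that warmup has not settled.
get_adaptation_info(pilot)

# Step-size trajectory during warmup: did it stabilize well before iteration 500?
sp <- get_sampler_params(pilot, inc_warmup = TRUE)
sapply(sp, function(x) tail(x[1:500, "stepsize__"], 50))

# Wall time spent in warmup vs. sampling, per chain.
get_elapsed_time(pilot)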
Gory model details
I am fitting a multi-species site occupancy model similar to this example from Dorazio et al., translated to Stan by Bob Carpenter. My model differs in four key respects. First, I am not using data augmentation to model never-observed species. Second, I have site- and visit-specific covariates on detection that require a more complicated parameterization of the likelihood (I need visit-specific Bernoulli sampling rather than site-specific binomial sampling). Third, and relatedly, I ultimately want to fit a model that incorporates fairly rich covariate information, with covariates involving site characteristics, visit characteristics, species characteristics, and interactions between site and species characteristics. Fourth, my data are MUCH larger: 850 sites, 4 visits, and (the kicker) 1500 species. Note that "species" in this model is a grouping factor for random effects (intercepts and slopes) on both occupancy and detection.
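For concreteness, the per-point, per-species likelihood that the toy Stan model below computes after marginalizing over the latent occupancy state is (i indexes points, j visits, k species; y_{ijk} is the detection history, \psi_{ik} the occupancy probability, \theta_{ijk} the detection probability, and Q_{ik} indicates at least one detection):

$$
L_{ik} =
\begin{cases}
\psi_{ik} \prod_{j=1}^{4} \theta_{ijk}^{\,y_{ijk}} (1 - \theta_{ijk})^{1 - y_{ijk}}, & Q_{ik} = 1,\\[4pt]
\psi_{ik} \prod_{j=1}^{4} (1 - \theta_{ijk}) \;+\; (1 - \psi_{ik}), & Q_{ik} = 0.
\end{cases}
$$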
So far, I have generated simulated datasets with 900 sites, ranging in size from 30 to 240 species, and analyzed them with a toy model similar to (but simpler than) what I ultimately want to fit. For the time being, I have fit these models using rstan with 2000 iterations (i.e. 1000 warmup and 1000 sampling). These models run without divergences, and R-hat diagnostics indicate convergence across four chains. The runtime seems to scale roughly linearly with the number of species (with substantial variance across different simulated datasets and smaller variance between chains within a fit), but the number of effective samples also decreases substantially as the data size grows (from ~800 for the worst-case parameter in the 30-species case to ~200 in the 240-species case). Ultimately, I expect to have access to high-performance computing resources, and I hope to take advantage of map_rect when I fit the final model.
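For reference, this is roughly how I have been comparing the pilot fits; effective samples per hour for the worst-case parameter seems like the most relevant number for planning the big run. The named list of fits (fit_030, fit_060, ...) is just my own bookkeeping and is not part of the script below.

fits <- list(`30` = fit_030, `60` = fit_060, `120` = fit_120, `240` = fit_240)  # hypothetical fit objects

scaling <- t(sapply(fits, function(fit) {
  s        <- summary(fit)$summary
  hours    <- max(rowSums(get_elapsed_time(fit))) / 3600   # slowest chain, warmup + sampling
  min_neff <- min(s[, "n_eff"], na.rm = TRUE)               # worst-case parameter
  c(min_neff = min_neff,
    max_rhat = max(s[, "Rhat"], na.rm = TRUE),
    hours = hours,
    neff_per_hour = min_neff / hours)
}))
scaling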
R and Stan code follows. Because I write the Stan code in a string directly in R, I provide the full R script first and then paste just the Stan model again below with Stan code formatting.
The following code is neither pretty nor fully optimized, but in case anybody is still reading, here it is. It will simulate data and analyze it under a toy model.
- I expect that I will be back with more questions related to this model in the coming weeks/months!
- I am aware of some minor optimizations still available (vectorizing priors over the model coefficients). While not the focus of my current post, if anybody spots a major potential efficiency gain, I would of course be very interested!
- The current toy model doesn't perfectly match the data simulation: if you look closely at the linear predictors in the transformed parameters block, you'll see that I am currently ignoring the effects of some of the simulated coefficients, so parameter recovery is only approximate (a rough recovery check for the hyperparameters that are included is sketched just after this list).
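A minimal sketch of that recovery check, assuming the fit object stan.samples produced by the script below and the simulated hyperparameter values stored in occ.hyper and det.hyper:

pars  <- c("mu_b0", "sigma_b0", "mu_d0", "sigma_d0")
truth <- c(occ.hyper$b0, det.hyper$d0)   # values used to simulate the data
post  <- summary(stan.samples, pars = pars)$summary
cbind(round(post[pars, c("mean", "2.5%", "97.5%")], 2), truth = truth)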
Full R Script
library(rstan)
rstan_options(auto_write = TRUE)
# i indexes points, j indexes visits, k indexes species.
# Define data size
n_cluster <- 300 # points are grouped into clusters, and cluster-by-species will ultimately be a random effect on occupancy
ppc <- 3 # three points per cluster
n_point <- n_cluster*ppc
n_species <- 1500
n_visit <- rep(4, n_point)
# Define covariates (covariates associated with species, visit, and point; point covariates include continuous covariates that
# do not vary within clusters as well as continuous covariates that do vary within clusters)
clusterID <- rep(c(1:n_cluster), ppc)
sp_cov1 <- runif(n_species) - .5
sp_cov2 <- runif(n_species) - .5
pt_cov1 <- runif(n_point) - .5
cl.cov.1 <- runif(n_cluster) - .5
pt_cov2 <- rep(NA, n_point)
for(i in 1:n_point){
pt_cov2[i] <- cl.cov.1[clusterID[i]]
}
vis_cov1 <- matrix(data = NA, nrow = n_point, ncol = max(n_visit))
for(i in 1:n_point){
vis_cov1[i, ] <- runif(n_visit[i]) - .5
}
exists_visit <- matrix(data = 0, nrow = n_point, ncol = max(n_visit))
for(i in 1:n_point){
for(j in 1:n_visit[i]){
exists_visit[i,j] <- 1
}
}
# Define hyperparameters
occ.hyper <- list(b0 = c(0, .5), b1 = c(0, 1), b2 = c(1, .5), b3 = c(-1, 1), b4 = c(0, .2),
b5 = c(1,2), b6 = c(2, 2))
b0 <- rnorm(n_species, occ.hyper$b0[1], occ.hyper$b0[2])
b1 <- matrix(data = rnorm(n_cluster*n_species, occ.hyper$b1[1], occ.hyper$b1[2]), nrow = n_cluster)
b2 <- rnorm(n_species, occ.hyper$b2[1], occ.hyper$b2[2])
b3 <- rnorm(n_species, occ.hyper$b3[1], occ.hyper$b3[2])
b4 <- rnorm(n_species, occ.hyper$b4[1], occ.hyper$b4[2])
b5 <- rnorm(n_species, occ.hyper$b5[1], occ.hyper$b5[2])
b6 <- rnorm(n_species, occ.hyper$b6[1], occ.hyper$b6[2])
det.hyper <- list(d0 = c(-2, .5), d1 = c(0, 1), d2 = c(0, .5), d3 = c(1, 1), d4 = c(0, .2),
d5 = c(2, .5))
d0 <- rnorm(n_species, det.hyper$d0[1], det.hyper$d0[2])
d1 <- matrix(data = rnorm(n_point*n_species, det.hyper$d1[1], det.hyper$d1[2]), nrow = n_point)
d2 <- rnorm(n_species, det.hyper$d2[1], det.hyper$d2[2])
d3 <- rnorm(n_species, det.hyper$d3[1], det.hyper$d3[2])
d4 <- rnorm(n_species, det.hyper$d4[1], det.hyper$d4[2])
d5 <- rnorm(n_species, det.hyper$d5[1], det.hyper$d5[2])
# The below simulation is super inefficient, but the version with for-loops helps me keep straight
# everything that is going on.
# Simulate parameters from hyperparameters
logit.occ <- psi <- matrix(NA, nrow = n_point, ncol = n_species)
for(i in 1:n_point){
for(k in 1:n_species){
logit.occ[i, k] <- b0[k] + b1[clusterID[i],k] + b2[k]*sp_cov1[k] + b3[k]*sp_cov2[k] +
b4[k]*pt_cov1[i] + b5[k]*pt_cov2[i] + b6[k]*sp_cov1[k]*pt_cov2[i]
psi[i, k] <- boot::inv.logit(logit.occ[i, k])
}
}
logit.det <- theta <- array(NA, dim = c(n_point, max(n_visit), n_species))
for(i in 1:n_point){
for(j in 1:n_visit[i]){
for(k in 1:n_species){
logit.det[i, j, k] <- d0[k] + d1[i,k] + d2[k]*sp_cov1[k] + d3[k]*sp_cov2[k] +
d4[k]*pt_cov1[i] + d5[k]*vis_cov1[i,j]
theta[i, j, k] <- boot::inv.logit(logit.det[i, j, k])
}
}
}
# simulate data from parameters
Z <- matrix(NA, nrow = n_point, ncol = n_species)
for(i in 1:n_point){
for(k in 1:n_species){
Z[i, k] <- rbinom(1, 1, psi[i, k])
}
}
det_data <- array(NA, dim = c(n_point, max(n_visit), n_species))
for(i in 1:n_point){
for(j in 1:n_visit[i]){
for(k in 1:n_species){
det_data[i,j,k] <- Z[i, k] * rbinom(1, 1, theta[i,j,k])
}
}
}
Q <- apply(det_data, c(1,3), function(x){return(as.numeric(sum(x) > 0))})
stan.model <- '
data {
int<lower=1> n_point; //number of sites
int<lower=1> n_visit; //fixed number of visits
int<lower=1> n_species; //number of species
int<lower=1> n_cluster; //number of clusters (points are grouped into clusters)
int<lower=0, upper=1> det_data[n_point, n_visit, n_species]; //detection history
int<lower=0, upper=1> Q[n_point, n_species]; //at least one detection
int<lower=0, upper=n_cluster> clusterID[n_point]; //cluster identifier (for random effects)
vector[n_species] sp_cov1; //species covariate 1
vector[n_species] sp_cov2; //species covariate 2
vector[n_point] pt_cov1; //point covariate 1
vector[n_point] pt_cov2; //point covariate 2
matrix[n_point, n_visit] vis_cov1; //visit covariate 1
}
parameters {
real mu_b0;
real<lower=0> sigma_b0;
vector[n_species] b0_raw;
real<lower=0> sigma_b1;
matrix[n_species, n_cluster] b1_raw;
real b2;
real b3;
real mu_b4;
real<lower=0> sigma_b4;
vector[n_species] b4_raw;
real mu_b5;
real<lower=0> sigma_b5;
vector[n_species] b5_raw;
real mu_b6;
real<lower=0> sigma_b6;
vector[n_species] b6_raw;
real mu_d0;
real<lower=0> sigma_d0;
vector[n_species] d0_raw;
real<lower=0> sigma_d1;
matrix[n_species, n_point] d1_raw;
real d2;
real d3;
real mu_d4;
real<lower=0> sigma_d4;
vector[n_species] d4_raw;
real mu_d5;
real<lower=0> sigma_d5;
vector[n_species] d5_raw;
}
transformed parameters{
vector[n_species] b0 = mu_b0 + b0_raw * sigma_b0;
matrix[n_species, n_cluster] b1 = b1_raw * sigma_b1;
vector[n_species] b4 = mu_b4 + b4_raw * sigma_b4;
vector[n_species] b5 = mu_b5 + b5_raw * sigma_b5;
vector[n_species] b6 = mu_b6 + b6_raw * sigma_b6;
vector[n_species] d0 = mu_d0 + d0_raw * sigma_d0;
matrix[n_species, n_point] d1 = d1_raw * sigma_d1;
vector[n_species] d4 = mu_d4 + d4_raw * sigma_d4;
vector[n_species] d5 = mu_d5 + d5_raw * sigma_d5;
real logit_psi[n_point, n_species];
real logit_theta[n_point, n_visit, n_species];
matrix[n_point, n_species] log_prob_increment;
for(i in 1:n_point){
for(k in 1:n_species){
logit_psi[i,k] = b0[k] + b1[k, clusterID[i]] + //b2*sp_cov1[k] + b3*sp_cov2[k] +
b4[k]*pt_cov1[i]; // + b5[k]*pt_cov2[i] + b6[k]*sp_cov1[k]*pt_cov2[i]);
}
}
for(i in 1:n_point){
for(j in 1:n_visit){
for(k in 1:n_species){
logit_theta[i,j,k] = d0[k] + d2*sp_cov1[k] + // d3*sp_cov2[k] +
// d4[k]*pt_cov1[i] +
d5[k]*vis_cov1[i,j];
}
}
}
for(i in 1:n_point){
for(k in 1:n_species){
if(Q[i,k] == 1)
log_prob_increment[i,k] = log_inv_logit(logit_psi[i,k]) +
bernoulli_logit_lpmf(det_data[i,1,k] | logit_theta[i,1,k]) +
bernoulli_logit_lpmf(det_data[i,2,k] | logit_theta[i,2,k]) +
bernoulli_logit_lpmf(det_data[i,3,k] | logit_theta[i,3,k]) +
bernoulli_logit_lpmf(det_data[i,4,k] | logit_theta[i,4,k]);
else
log_prob_increment[i,k] = log_sum_exp(log_inv_logit(logit_psi[i,k]) + log1m_inv_logit(logit_theta[i,1,k]) +
log1m_inv_logit(logit_theta[i,2,k]) + log1m_inv_logit(logit_theta[i,3,k]) +
log1m_inv_logit(logit_theta[i,4,k]),
log1m_inv_logit(logit_psi[i,k]));
}
}
}
model {
//Hyper-priors:
mu_b0 ~ normal(0,10);
b2 ~ normal(0,10);
b3 ~ normal(0,10);
mu_b4 ~ normal(0,10);
mu_b5 ~ normal(0,10);
mu_b6 ~ normal(0,10);
mu_d0 ~ normal(0,10);
d2 ~ normal(0,10);
d3 ~ normal(0,10);
mu_d4 ~ normal(0,10);
mu_d5 ~ normal(0,10);
sigma_b0 ~ normal(0,10);
sigma_b1 ~ normal(0,10);
sigma_b4 ~ normal(0,10);
sigma_b5 ~ normal(0,10);
sigma_b6 ~ normal(0,10);
sigma_d0 ~ normal(0,10);
sigma_d1 ~ normal(0,10);
sigma_d4 ~ normal(0,10);
sigma_d5 ~ normal(0,10);
//Random Effects
b0_raw ~ normal(0, 1);
to_vector(b1_raw) ~ normal(0, 1);
b4_raw ~ normal(0, 1);
b5_raw ~ normal(0, 1);
b6_raw ~ normal(0, 1);
d0_raw ~ normal(0, 1);
to_vector(d1_raw) ~ normal(0, 1);
d4_raw ~ normal(0, 1);
d5_raw ~ normal(0, 1);
//Likelihood (data level)
target += sum(log_prob_increment);
}'
stan.data <- list(n_point = n_point, n_species = n_species,
n_cluster = n_cluster, n_visit = 4,
det_data = det_data,
Q = Q,
clusterID = clusterID,
sp_cov1 = sp_cov1, sp_cov2 = sp_cov2,
pt_cov1 = pt_cov1, pt_cov2 = pt_cov2,
vis_cov1 = vis_cov1)
nc <- 4
start.time <- Sys.time()
stan.samples <- stan(model_code = stan.model, data = stan.data, iter = 2000, chains = nc, cores = nc,
pars = c('logit_psi', 'logit_theta', 'log_prob_increment', 'd1_raw', 'd1', 'b1_raw', 'b1'),
include = FALSE, refresh = 10)
elapsed <- Sys.time() - start.time
summary.samples <- summary(stan.samples)
# Columns 9 and 10 of the rstan summary are n_eff and Rhat, respectively.
which(summary.samples$summary[,9] == min(summary.samples$summary[,9])) # worst-case parameter by n_eff
max(summary.samples$summary[,10]) # worst-case R-hat
object.size(stan.samples)
######
# 30 species, all params saved, small model:
# min neff = 862
# max R-hat = 1.004
# grad. eval. range: .046 - .072
# elapsed range: 7427 - 7490
# object.size: 15178522976 bytes
# 30 species, few params saved, small model:
# min neff = 809
# max R-hat = 1.012
# grad. eval. range: .049 - .077
# elapsed range: 5649 - 6839
# 60 species, few params saved, 2000 iterations, small model:
# min neff = 303.35
# max R-hat = 1.022
# grad. eval. range: .088 - .152
# elapsed range: 13573 - 16148
# object.size: 104002808 bytes
# 120 species, few params saved, 2000 iterations, small model:
# min neff = 453.734
# max R-hat = 1.014
# grad. eval. range: .180 - .312
# elapsed range: 25435 - 33187
# 240 species, few params saved, 2000 iterations, small model:
# min neff = 229
# max R-hat = 1.017
# grad. eval. range: .363 - .531
# elapsed range: 89908 - 97478
# 1500 species, few params saved, 2000 iterations, small model, CXX flags updated:
# Started near 1410h on 22/4/20
# min neff =
# max R-hat =
# grad. eval. range: 2.90 - 3.63
# elapsed range:
Just the Stan code
data {
int<lower=1> n_point; //number of sites
int<lower=1> n_visit; //fixed number of visits
int<lower=1> n_species; //number of species
int<lower=1> n_cluster; //number of clusters (points are grouped into clusters)
int<lower=0, upper=1> det_data[n_point, n_visit, n_species]; //detection history
int<lower=0, upper=1> Q[n_point, n_species]; //at least one detection
int<lower=0, upper=n_cluster> clusterID[n_point]; //cluster identifier (for random effects)
vector[n_species] sp_cov1; //species covariate 1
vector[n_species] sp_cov2; //species covariate 2
vector[n_point] pt_cov1; //point covariate 1
vector[n_point] pt_cov2; //point covariate 2
matrix[n_point, n_visit] vis_cov1; //visit covariate 1
}
parameters {
real mu_b0;
real<lower=0> sigma_b0;
vector[n_species] b0_raw;
real<lower=0> sigma_b1;
matrix[n_species, n_cluster] b1_raw;
real b2;
real b3;
real mu_b4;
real<lower=0> sigma_b4;
vector[n_species] b4_raw;
real mu_b5;
real<lower=0> sigma_b5;
vector[n_species] b5_raw;
real mu_b6;
real<lower=0> sigma_b6;
vector[n_species] b6_raw;
real mu_d0;
real<lower=0> sigma_d0;
vector[n_species] d0_raw;
real<lower=0> sigma_d1;
matrix[n_species, n_point] d1_raw;
real d2;
real d3;
real mu_d4;
real<lower=0> sigma_d4;
vector[n_species] d4_raw;
real mu_d5;
real<lower=0> sigma_d5;
vector[n_species] d5_raw;
}
transformed parameters{
vector[n_species] b0 = mu_b0 + b0_raw * sigma_b0;
matrix[n_species, n_cluster] b1 = b1_raw * sigma_b1;
vector[n_species] b4 = mu_b4 + b4_raw * sigma_b4;
vector[n_species] b5 = mu_b5 + b5_raw * sigma_b5;
vector[n_species] b6 = mu_b6 + b6_raw * sigma_b6;
vector[n_species] d0 = mu_d0 + d0_raw * sigma_d0;
matrix[n_species, n_point] d1 = d1_raw * sigma_d1;
vector[n_species] d4 = mu_d4 + d4_raw * sigma_d4;
vector[n_species] d5 = mu_d5 + d5_raw * sigma_d5;
real logit_psi[n_point, n_species];
real logit_theta[n_point, n_visit, n_species];
matrix[n_point, n_species] log_prob_increment;
for(i in 1:n_point){
for(k in 1:n_species){
logit_psi[i,k] = b0[k] + b1[k, clusterID[i]] + //b2*sp_cov1[k] + b3*sp_cov2[k] +
b4[k]*pt_cov1[i]; // + b5[k]*pt_cov2[i] + b6[k]*sp_cov1[k]*pt_cov2[i]);
}
}
for(i in 1:n_point){
for(j in 1:n_visit){
for(k in 1:n_species){
logit_theta[i,j,k] = d0[k] + d2*sp_cov1[k] + // d3*sp_cov2[k] +
// d4[k]*pt_cov1[i] +
d5[k]*vis_cov1[i,j];
}
}
}
for(i in 1:n_point){
for(k in 1:n_species){
if(Q[i,k] == 1)
log_prob_increment[i,k] = log_inv_logit(logit_psi[i,k]) +
bernoulli_logit_lpmf(det_data[i,1,k] | logit_theta[i,1,k]) +
bernoulli_logit_lpmf(det_data[i,2,k] | logit_theta[i,2,k]) +
bernoulli_logit_lpmf(det_data[i,3,k] | logit_theta[i,3,k]) +
bernoulli_logit_lpmf(det_data[i,4,k] | logit_theta[i,4,k]);
else
log_prob_increment[i,k] = log_sum_exp(log_inv_logit(logit_psi[i,k]) + log1m_inv_logit(logit_theta[i,1,k]) +
log1m_inv_logit(logit_theta[i,2,k]) + log1m_inv_logit(logit_theta[i,3,k]) +
log1m_inv_logit(logit_theta[i,4,k]),
log1m_inv_logit(logit_psi[i,k]));
}
}
}
model {
//Hyper-priors:
mu_b0 ~ normal(0,10);
b2 ~ normal(0,10);
b3 ~ normal(0,10);
mu_b4 ~ normal(0,10);
mu_b5 ~ normal(0,10);
mu_b6 ~ normal(0,10);
mu_d0 ~ normal(0,10);
d2 ~ normal(0,10);
d3 ~ normal(0,10);
mu_d4 ~ normal(0,10);
mu_d5 ~ normal(0,10);
sigma_b0 ~ normal(0,10);
sigma_b1 ~ normal(0,10);
sigma_b4 ~ normal(0,10);
sigma_b5 ~ normal(0,10);
sigma_b6 ~ normal(0,10);
sigma_d0 ~ normal(0,10);
sigma_d1 ~ normal(0,10);
sigma_d4 ~ normal(0,10);
sigma_d5 ~ normal(0,10);
//Random Effects
b0_raw ~ normal(0, 1);
to_vector(b1_raw) ~ normal(0, 1);
b4_raw ~ normal(0, 1);
b5_raw ~ normal(0, 1);
b6_raw ~ normal(0, 1);
d0_raw ~ normal(0, 1);
to_vector(d1_raw) ~ normal(0, 1);
d4_raw ~ normal(0, 1);
d5_raw ~ normal(0, 1);
//Likelihood (data level)
target += sum(log_prob_increment);
}