Compiler Optimization Hints?

I’m currently troubleshooting a relatively large multi-state mark-recapture model. As the model has grown, and especially since updating to R 4.0.0, I’ve been getting longer and longer model compilation times. This is now a problem because I’m trying to debug the likelihood calculation using print statements to identify where I’m accidentally asking Stan to calculate log(0). There are multiple points where this might happen, but it is way too cluttered to put print statements all over the place at once. So I’m attempting to do this iteratively, inserting print statements to check certain parts of the likelihood piecemeal. However, my compilation times are now absurdly long, which makes this essentially impossible. We’re talking on the order of twenty minutes.

I understand that setting CXX14FLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function is supposed to result in better Stan model runtimes, but I suspect that this comes at the cost of longer compilation times.

Can someone smarter than me please give me a primer on optimizing compilation time versus optimizing model runtimes using CXXFLAGS or the like? I thought this might contain hints: https://cran.r-project.org/web/packages/rstan/vignettes/rstan.html#model-compilation, but no such luck.

Ideally I’d have one set of optimization options for troubleshooting and model building and a second set for fitting clean models. But I don’t know how to do that currently and I haven’t found a good explanation anywhere in the twenty minutes of searching I did while waiting for this model to compile.
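One possible way to keep two sets of flags is sketched below. It relies on R's R_MAKEVARS_USER environment variable, which points R at a specific personal Makevars file in place of the default one. The profile directory, the file names, and the use_profile() helper are hypothetical, the flag lines are taken from this thread, and I have not confirmed that every rstan version picks the setting up, so treat it as a starting point rather than a recipe.

```r
# Sketch: write two Makevars profiles and switch between them per session.
# profile_dir, the file names and use_profile() are hypothetical names.
# tempdir() is used only so the example is self-contained; a persistent
# directory would be needed in practice.
profile_dir <- file.path(tempdir(), "stan_makevars")
dir.create(profile_dir, showWarnings = FALSE, recursive = TRUE)

profiles <- list(
  # No optimization: intended for quick compile/debug cycles.
  debug = "CXX14FLAGS=-O0 -Wno-unused-variable -Wno-unused-function",
  # Full optimization: intended for fitting the cleaned-up model.
  fit   = "CXX14FLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function"
)

for (name in names(profiles)) {
  writeLines(profiles[[name]],
             file.path(profile_dir, paste0("Makevars.", name)))
}

# Point R at one of the profiles for the rest of the session.
use_profile <- function(name) {
  path <- file.path(profile_dir, paste0("Makevars.", name))
  stopifnot(file.exists(path))
  Sys.setenv(R_MAKEVARS_USER = path)
  invisible(path)
}

use_profile("debug")  # switch to use_profile("fit") before final runs
```

Because the file named by R_MAKEVARS_USER is read instead of the usual personal Makevars file, anything else you rely on from that file would need to be copied into each profile.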

For what it’s worth, I’m on Windows 10 with R 4.0.0, Rtools 4.0, rstan 2.19.3 and StanHeaders 2.21.0-3.

Do

CXX14FLAGS=-O0 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function

for fastest compile times (and terrible runtimes).

I actually tried that. Weirdly it took even longer to compile. I quit after waiting an hour for it to compile and wound up taking the strategy of commenting out a ton of my code to make it compile faster.

Working with Stan sometimes feels like equal parts joy and misery. Thanks for your help.

If you just do -O0, you are telling the compiler not to do any optimizations. I think -mtune and -march will still turn on some optimizations. When you are compiling the model, are you sure it’s respecting the CXX14FLAGS values and not pulling them in from somewhere else?
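A quick way to see which value R itself reports for CXX14FLAGS is to ask R CMD config from inside the session. This is only a sketch: it assumes an R version that knows the CXX14FLAGS variable (R 4.0.0 does), and it shows what R's build configuration resolves to, which is not necessarily everything that ends up on the final compiler command line.

```r
# Ask R's build configuration which C++14 flags it would use.
r_bin <- file.path(R.home("bin"), "R")
flags <- system2(r_bin, c("CMD", "config", "CXX14FLAGS"), stdout = TRUE)
print(flags)
```

If the printed flags are not the ones you set, the value is probably coming from a different Makevars file than the one you edited. Compiling with verbose = TRUE in stan_model() should also print the compiler output, which lets you look for the -O level actually used.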

A 20 minute compile is a bit out of the norm for Stan (even with flags like -O3), so I would suspect there is a larger issue than compilation flags. Is this for all Stan models or just the specific model you’re trying to debug?

For example, what kind of a compile time do you get from the rstan example model:

example(stan_model, run.dontrun = TRUE, verbose=TRUE)

That takes 7.76 seconds with -O3:

ptm <- proc.time()
example(stan_model, run.dontrun = TRUE, verbose=TRUE)
proc.time() - ptm

user  system elapsed 
1.11    1.70    7.76 

The long compilation times are for this particular model. They have increased as the model has grown. There is a lot of matrix manipulation and multiplication. It’s an extension of this model: https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.13171

Model code is in the supplementary materials of the linked paper.

Have you tried using CmdStanR? http://mc-stan.org/cmdstanr/
It uses Stan 2.23, while RStan is at 2.19(?).
Most users have reported much faster compile times.

See: http://mc-stan.org/cmdstanr/reference/model-method-compile.html
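For comparison, a minimal sketch of timing a CmdStanR compile is below. It assumes the cmdstanr package and a CmdStan installation are available and skips the compile step otherwise. The tiny Bernoulli program is only a placeholder, not the model from this thread.

```r
# Write a placeholder Stan program to a temporary .stan file.
stan_file <- tempfile(fileext = ".stan")
writeLines(c(
  "data { int<lower=0> N; int<lower=0,upper=1> y[N]; }",
  "parameters { real<lower=0,upper=1> theta; }",
  "model { y ~ bernoulli(theta); }"
), stan_file)

# Only attempt the compile if cmdstanr and CmdStan can both be found.
have_cmdstan <- requireNamespace("cmdstanr", quietly = TRUE) &&
  !is.null(tryCatch(cmdstanr::cmdstan_path(), error = function(e) NULL))

if (have_cmdstan) {
  # cmdstan_model() compiles the program by default; time that step.
  timing <- system.time(mod <- cmdstanr::cmdstan_model(stan_file))
  print(timing)
}
```

Swapping stan_file for the path to your own model would give a directly comparable compile time against the rstan numbers above.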

I haven’t. Mostly got locked into rstan at first adoption. I will check it out.