Compilation speed and internet connection (cmdstanr)

Hi,

I have noticed that model compilation in cmdstan takes ~10 times longer when I am connected to my work wifi network. I am a bit confused, because I assumed there should be no link between being connected to the internet and the model compilation process. However, the measured difference is ridiculous. Does anyone know why this might be happening?

After cmdstan is installed and built for the first time there should be no internet access required, so this is certainly odd!

Are you comparing to some other wifi network? If so, have you measured when not connected to the internet at all?

Yes, I am currently simply disconnecting from the internet whenever I need to compile a model.

To answer your question, I just tested it by connecting to my mobile data, and it’s as fast as offline, so it seems only my work wifi slows it down considerably… I imagine it has something to do with security, but I am very puzzled as to how this is possible and whether there is any way around it, apart from disconnecting each time I compile.

This is puzzling. The only thing the cmdstan build process shells out to the internet for is to download the stanc3 compiler in development builds. In a released CmdStan, it shouldn’t connect at all.

You can try disabling this ability entirely by passing STANC3=some/path/that/exists/ to make. This will tell it to try to build stanc from sources found in that folder – which you don’t have or need, but it completely disables the download ability, so it should give us good information either way.
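
For cmdstanr users, one possible way to hand this variable to make is through CmdStan's make/local file, which cmdstanr can append to with cmdstan_make_local(). The sketch below assumes cmdstanr and CmdStan are already installed. The names stanc3_dir and set_stanc3 are made up for illustration, and the home directory is used only because it is certain to exist.

```r
# Sketch: record STANC3 in CmdStan's make/local so that make sees it on
# every model build. `stanc3_dir` is a placeholder; any existing directory
# should do (a path without spaces is the safer choice for make).
stanc3_dir <- normalizePath(path.expand("~"), winslash = "/")

set_stanc3 <- function(dir = stanc3_dir) {
  stopifnot(dir.exists(dir), requireNamespace("cmdstanr", quietly = TRUE))
  # Appends the line STANC3=<dir> to <cmdstan>/make/local and returns the
  # file's contents so the new entry can be checked by eye.
  cmdstanr::cmdstan_make_local(cpp_options = list(STANC3 = dir), append = TRUE)
}

# set_stanc3()  # uncomment to apply; remove the line from make/local afterwards
```

Because the entry lives in make/local, it applies to every later build until it is deleted, so it is best treated as a temporary diagnostic setting.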

The other thing I’ve thought about is that certain antivirus software is very suspicious of compilations. Touching a lot of files and writing out new ones looks a lot like some kinds of ransomware. Is it possible that connecting to your work wifi is enabling some extra protections?

Yes, this seems very likely. Since it’s a governmental institution, the security is pretty ridiculous. I’ll drop a query to the IT team about this, cause I assume I can’t really do much about this on my own. Thanks!

I’m encountering this issue, and the accepted solution doesn’t appear to resolve it.

  • Operating System: Windows 11

  • IDE: RStudio

  • Interface: CmdStanR

  • CmdStan: 2.36 installed using cmdstanr::install_cmdstan

I’m running the following code to test model compilation:

Sys.setenv(STANC3 = "c:\\ignore_me")

library(cmdstanr)
file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
system.time({
  mod <- cmdstan_model(file, force_recompile = TRUE)
})

I work remotely and my company uses Zscaler for network security and VPN. If I disconnect from the internet, the performance improves significantly.

Internet Enabled   STANC3 Set   Compile Time
Yes                Yes          103 sec
Yes                No           104 sec
No                 Yes          36 sec
No                 No           40 sec

As you can see, setting the STANC3 environment variable appears to have minimal impact on compilation time.
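
Since each figure above comes from a single run, repeating the measurement a few times per condition would help separate the network effect from run-to-run noise. A minimal sketch in base R follows. The helper names time_runs and compile_bernoulli are hypothetical, and the compile step reuses the same cmdstanr calls as the test code above.

```r
# Time an action several times and summarise the elapsed wall-clock seconds.
time_runs <- function(action, n = 3) {
  elapsed <- vapply(
    seq_len(n),
    function(i) system.time(action())[["elapsed"]],
    numeric(1)
  )
  c(min = min(elapsed), median = stats::median(elapsed), max = max(elapsed))
}

# The action to time: a forced recompile of the bernoulli example.
# Requires cmdstanr and an installed CmdStan.
compile_bernoulli <- function() {
  file <- file.path(cmdstanr::cmdstan_path(),
                    "examples", "bernoulli", "bernoulli.stan")
  cmdstanr::cmdstan_model(file, force_recompile = TRUE)
}

# time_runs(compile_bernoulli)  # run once per network condition
```

Comparing the medians for each combination of network state and STANC3 setting gives a steadier picture than a single timing.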

If there’s an alternative way to pass this variable to the make utility—particularly when using the cmdstan_model() function—I’d appreciate any guidance. I wasn’t able to find a documented method for doing so.

On my home computer, this compiles in about 15 seconds. So, there may also be some anti-virus software issues, which I will address with the security team separately.

While monitoring the output during compilation, the slowdown seems to occur both before the C++ compiler starts and slightly afterward.

Could you elaborate on what processes are happening behind the scenes during these phases? Understanding this would help our security team determine how best to allow the necessary connections without introducing security risks.

After installation, building a model should invoke only 3 processes:

  1. stanc, the Stan to C++ compiler
  2. A C++ compiler to create an object file
  3. A C++ compiler to link that object file with some files built during installation, to create the executable
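
To see which of these steps is slow, one rough approach from cmdstanr is to time a stanc-only run against a full build. The sketch below assumes that the model object's check_syntax() method runs only stanc and that compile() runs all three steps; time_build_phases is a hypothetical helper name, and the difference between the two timings is only an approximation of the C++ compile and link time.

```r
# Sketch: compare the time spent in stanc with the time for a full build.
time_build_phases <- function(stan_file) {
  stopifnot(requireNamespace("cmdstanr", quietly = TRUE))
  # Create the model object without compiling it yet.
  mod <- cmdstanr::cmdstan_model(stan_file, compile = FALSE)
  # Step 1 only: run stanc on the Stan program.
  stanc_only <- system.time(mod$check_syntax(quiet = TRUE))[["elapsed"]]
  # Steps 1 to 3: stanc, C++ compile, and link.
  full_build <- system.time(mod$compile(force_recompile = TRUE))[["elapsed"]]
  c(stanc_only = stanc_only,
    full_build = full_build,
    cpp_approx = full_build - stanc_only)
}

# Example call (requires an installed CmdStan):
# time_build_phases(file.path(cmdstanr::cmdstan_path(),
#                             "examples", "bernoulli", "bernoulli.stan"))
```

Running this once with the network connected and once without should show whether the extra time lands in stanc or in the C++ steps.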