Problem compiling models with cmdstan on arm64

I am trying to set up cmdstanr with CmdStan on my Android phone, in a proot environment in Termux; the OS is Ubuntu 20.04.

When I run install_cmdstan(cores = 8, timeout = Inf, cpp_options = list("CXX" = "clang++")), everything looks fine until the final step:

--- Translating Stan model to C++ code ---
bin/stanc  --o=examples/bernoulli/bernoulli.hpp examples/bernoulli/bernoulli.stan
make: *** [make/program:53: examples/bernoulli/bernoulli.hpp] Bus error

Any idea what could be wrong?

This might be the cause of the error:

The CmdStan toolchain is setup properly!
* Latest CmdStan release is v2.25.0
* Installing CmdStan v2.25.0 in /root/.cmdstanr/cmdstan-2.25.0
* Downloading cmdstan-2.25.0.tar.gz from GitHub...
* Removing the existing installation of CmdStan...
* Download complete
* Unpacking archive...
* Building CmdStan binaries...
cp bin/linux-stanc bin/stanc

The stanc binary is probably not arm64/aarch64. :)

EDIT: stanc: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped
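
For anyone who wants to repeat this check without the file utility, here is a minimal sketch. It reads the 2-byte e_machine field at offset 18 of the ELF header. The function name elf_machine is made up for illustration, and the sketch assumes a little-endian ELF file and only knows the codes for x86-64 (0x3E) and AArch64 (0xB7).

```shell
#!/bin/sh
# elf_machine FILE: print the target architecture recorded in an ELF header.
# Hypothetical helper; assumes a little-endian ELF file.
elf_machine() {
    # e_machine is a 2-byte field at byte offset 18 of the ELF header.
    code=$(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n')
    case "$code" in
        3e00) echo x86_64 ;;            # EM_X86_64 (62)
        b700) echo aarch64 ;;           # EM_AARCH64 (183)
        *)    echo "unknown($code)" ;;  # anything else is not decoded here
    esac
}

# Usage (path taken from the install log above):
#   elf_machine /root/.cmdstanr/cmdstan-2.25.0/bin/stanc
#   uname -m    # what the host itself reports
```

If the two answers differ, the binary cannot run natively on that machine.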

One possible hack would be to run it through qemu. Any other suggestions?

That may be the cause, yeah. I will see if we can build a binary for that; it should be doable.

I made a shell script, which I compiled into a binary with shc:

#!/bin/sh

qemu-x86_64 /root/.cmdstan/cmdstan-2.25.0/bin/stanc_x86_64 "$@"

The original stanc binary is renamed to stanc_x86_64, and the compiled script takes its place as stanc. EDIT: Path edited.

Trying to compile once again, and I get this:

> cmdstanr_example()
Compiling Stan program...
make: *** [make/program:53: /tmp/RtmpE57fTx/model-59645f9ebd62.hpp] Error 1
Error: An error occured during compilation! See the message above for more information.

I should also add that compilation with RStan works fine.

The problem here is stanc3. I am guessing that if you put STANC2=true in make/local of the CmdStan installation, things will also work, because in that case everything is compiled locally.
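
As a rough sketch of that suggestion, the line can be appended to make/local from the shell. The helper name set_make_flag is hypothetical; it only adds the line when it is not already present, so running it twice is harmless.

```shell
#!/bin/sh
# set_make_flag CMDSTAN_DIR LINE: append LINE to CMDSTAN_DIR/make/local
# unless an identical line is already there. Hypothetical helper.
set_make_flag() {
    local_file="$1/make/local"
    mkdir -p "$1/make"      # no-op in a real CmdStan tree, where make/ exists
    touch "$local_file"     # make/local may not exist yet
    grep -qxF "$2" "$local_file" || printf '%s\n' "$2" >> "$local_file"
}

# Usage (install path taken from the log earlier in the thread):
#   set_make_flag /root/.cmdstanr/cmdstan-2.25.0 "STANC2=true"
```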

I messed up the path to stanc_x86_64; in my case it should have been

/root/.cmdstanr/cmdstan-2.25.0/bin/stanc_x86_64

and now it works!

> cmdstanr::cmdstanr_example()
Compiling Stan program...
clang: warning: argument unused during compilation: '-arch aarch64' [-Wunused-command-line-argument]
 variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
  lp__    -65.93 -65.59 1.45 1.20 -68.79 -64.29 1.00     2114     2395
  alpha     0.38   0.37 0.22 0.22   0.02   0.73 1.00     4242     2855
  beta[1]  -0.66  -0.65 0.24 0.25  -1.07  -0.28 1.00     4498     3247
  beta[2]  -0.28  -0.28 0.22 0.22  -0.64   0.08 1.00     4373     3293
  beta[3]   0.68   0.67 0.27 0.27   0.25   1.14 1.00     3890     3044

I should remove the flags though. :)

No need to use stanc2, @rok_cesnovar!

EDIT: Of course it would be smarter to use the linux-stanc binary instead of renaming the copy. Just don’t forget to set the permissions to allow it to run (chmod +x linux-stanc).
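
Putting the pieces together, a sketch of the whole workaround could look like the following. It assumes qemu-x86_64 (QEMU user-mode emulation) is on the PATH and skips the shc step by using a plain shell script as the wrapper. The function name install_stanc_wrapper is made up for illustration.

```shell
#!/bin/sh
# install_stanc_wrapper BIN_DIR: replace BIN_DIR/stanc with a small wrapper
# that runs the x86-64 linux-stanc binary through qemu-x86_64.
# Hypothetical helper; assumes qemu-x86_64 is installed and on the PATH.
install_stanc_wrapper() {
    bin_dir=$1
    if [ ! -f "$bin_dir/linux-stanc" ]; then
        echo "no linux-stanc in $bin_dir" >&2
        return 1
    fi
    chmod +x "$bin_dir/linux-stanc"
    # Unquoted heredoc: $bin_dir is expanded now, \$@ is kept for run time.
    cat > "$bin_dir/stanc" <<EOF
#!/bin/sh
exec qemu-x86_64 "$bin_dir/linux-stanc" "\$@"
EOF
    chmod +x "$bin_dir/stanc"
}

# Usage (install path taken from the log earlier in the thread):
#   install_stanc_wrapper /root/.cmdstanr/cmdstan-2.25.0/bin
```

Writing the absolute path into the wrapper avoids the wrong-path mistake described above, at the cost of having to rerun the helper if the installation moves.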

Oh, so you just need to emulate. Nice!

Thanks for this insight. We should think about adding this to the cmdstan makefiles somehow.

And it’s fast!

Smartphone specs (OnePlus 8T):
Chipset: Qualcomm SM8250 Snapdragon 865 (7 nm+)
CPU: Octa-core (1x2.84 GHz Kryo 585 & 3x2.42 GHz Kryo 585 & 4x1.8 GHz Kryo 585)

> tic();brm(data = lme4::sleepstudy, formula = Reaction ~ Days + (1 + Days|Subject), cores = 4, backend = "cmdstanr") -> model;toc()

Compiling Stan program...
clang: warning: argument unused during compilation: '-arch aarch64' [-Wunused-command-line-argument]
Start sampling
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)         
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 1 finished in 2.2 seconds.
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4 finished in 2.2 seconds.
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3 finished in 2.5 seconds.
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2 finished in 2.6 seconds.

All 4 chains finished successfully.
Mean chain execution time: 2.4 seconds.
Total execution time: 2.7 seconds.
25.813 sec elapsed

this is so cool!!!

what are you using Stan on your smartphone for?

Nothing in particular really, not yet anyway. More or less just for fun, but since the phone is quite powerful it might end up being useful.

I am currently trying to get RStudio Server to work (it is compiled and installed, but some permission issues at the Android level stop it from running), so that I can use the phone as a portable computing machine. I have successfully compiled the latest R version and got it up and running as well.

I made an issue at the GitHub repo regarding this.
