Compiling CmdStan to https://webassembly.org/ - how to make "one large" C++ file for a model, a C++ file that contains "everything needed", including Stan and Math routines

Jozsef_Hegedus · November 4, 2019, 12:57pm

also, i am wondering why the data itself has to be baked into the generated executable ?

well, anyway… i am sure there are reasons for this … mostly software engineering reasons i believe … C++ is not the easiest language to implement automatic differentiation in … where the “stuff” lives in a “monad” and can be changed at runtime … ok …

nhuurre · November 4, 2019, 1:04pm

It isn’t. E.g. my wasm-stan discussed in the other thread works the same way CloudStan does; data is passed through file upload API.

Jozsef_Hegedus · November 4, 2019, 1:17pm

OK, so, cmdstan produces C++ code that does not depend on the data itself ?

Is that correct ?

So, no matter what the data is, the generated C++ file is always the same ?
Or is it baked into the generated C++ file?

Is it possible to create a C++ file from Stan code (a model), where the data is read from the file system by the binary which is compiled from the generated C++ code ?

I was under the impression that data is baked into the generated C++ files, and that’s the end of the story, so apparently that is not the case ?
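To make the question concrete, here is a minimal sketch of what "data not baked in" can look like in plain C++. This is not Stan's generated code or API: `ToyModel`, `log_prob` and `model_from_text` are hypothetical names, and the model (normal observations with unit variance) is just an assumption for illustration. The point is only that the source compiles once and the observations arrive at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiled model: y[i] ~ normal(mu, 1).
// The class is compiled once; no data values appear in the source.
class ToyModel {
 public:
  // Data arrives at run time from any stream (file, stdin, upload buffer).
  explicit ToyModel(std::istream& data_in) {
    double value;
    while (data_in >> value) y_.push_back(value);
    assert(!y_.empty() && "need at least one observation");
  }

  // Log density up to an additive constant, evaluated at parameter mu.
  double log_prob(double mu) const {
    double lp = 0.0;
    for (double yi : y_) lp += -0.5 * (yi - mu) * (yi - mu);
    return lp;
  }

  std::size_t num_observations() const { return y_.size(); }

 private:
  std::vector<double> y_;
};

// Convenience wrapper: build the model from text that is already in memory,
// e.g. the contents of a data file the user selected.
ToyModel model_from_text(const std::string& text) {
  std::istringstream in(text);
  return ToyModel(in);
}
```

Calling `model_from_text("1 2 3")` and `model_from_text("10 10")` uses the same compiled code with different data, which is the behaviour being asked about.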

Jozsef_Hegedus · November 4, 2019, 6:06pm

Hm…, so basically, by web you mean server+client ? Or client only ?

I mean, if the .js file generation takes place on the server, then the data generation can work on the client / browser. This way the server would not be overloaded.

So, the real question is, is it possible to supply data to an already generated .js (describing a model) to sample data from - without the need to contact the server and hard-code the data into the generated .js file.

I think this is your current solution on github, afaik.

So, yes, indeed. Precompiled Stan models in .js, where ppl can put their own data and see what comes out (without contacting any server), stuff runs entirely in the web browser.

I don’t really know what kind of sandboxing issue you are talking about ?

If I had a .js file with a model, could I add extra data to it ? Say, for example using Node.js ?

I mean, I don’t think it is difficult to compile a C++ file to .js using em++ that gets data pushed into it and prints it to the screen.
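A minimal sketch of such a program, assuming the data is a list of whitespace-separated numbers. The function names (`read_data`, `print_data`, `format_data`) are made up for this example and have nothing to do with Stan.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Read whitespace-separated numbers from any input stream.
std::vector<double> read_data(std::istream& in) {
  std::vector<double> y;
  double value;
  while (in >> value) y.push_back(value);
  return y;
}

// Print each value, then the count and the mean.
void print_data(const std::vector<double>& y, std::ostream& out) {
  assert(!y.empty() && "no data was pushed in");
  double sum = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    out << "y[" << i << "] = " << y[i] << "\n";
    sum += y[i];
  }
  out << "n = " << y.size() << ", mean = " << sum / y.size() << "\n";
}

// Same thing for text already in memory (e.g. a string handed over from JS).
std::string format_data(const std::string& text) {
  std::istringstream in(text);
  std::ostringstream out;
  print_data(read_data(in), out);
  return out.str();
}
```

Adding a `main` that calls `print_data(read_data(std::cin), std::cout)` and building it with something like `em++ echo.cpp -o echo.js` (file names are placeholders) should give a script that can be fed data under Node.js, e.g. `node echo.js < data.txt`. By default Emscripten writes a `.wasm` file next to the `.js` loader.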

How is Stan different from this ?

Do you want to upload data into a javascript object ?

From the harddisk ?

Or what is the “sandbox” here ?

Jozsef_Hegedus · November 5, 2019, 12:45am

ok, it seems that Stan can be fully ported to the browser :

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/emscripten-discuss/wJGJuJhXkdo/veJbozBJDwAJ

https://binji.github.io/wasm-clang/

I just don’t really see the WASM here :

everything seems to be .js …

Jozsef_Hegedus · November 6, 2019, 7:06am

This is a full blown c++ compiler :
https://tbfleming.github.io/cib/

Jozsef_Hegedus · November 6, 2019, 8:14am

Ok, interesting to know ! Thanks for the info!

nhuurre · November 6, 2019, 8:44am

huh, so it is possible? Although (1) it has not been updated since May 2018, (2) does not handle exceptions, and (3) all output I see is “Runtime error: Index out of bounds”. But hey, maybe it works? I still think a Stan interpreter is the easier way to go but it’s not like I know anything about these things.
By the way, did you see this:

Jozsef_Hegedus · November 6, 2019, 9:51am

Well, I just wrote a scala.js react component today where i need to upload a picture to a server via an HTTP request encoded as base64, so if I can do that … then I can read in the data by selecting actively what it is ?

Or what kind of interface do you want to transfer the data over ? OS to Browser ? or Browser to Compiled stan.js / stan.wasm ? In the second case, the compilation takes place in the browser, and the browser can remember the selected data file and its contents, which can be placed somewhere … such that the stanc.js / stanc.wasm has access to it while it is generating the C++ code from the Stan code (which - in my app - of course - will be generated by Scala code … LOL - just to use the most sophisticated language on the planet, which generates maybe one percent of the world’s GDP ( A very optimistic thought - OTOH - python won’t cut it in the cloud… even Dropbox is going bankrupt because of python… imperative languages are not compositional and that is a huge problem… if you have a choice… where you can suddenly reimplement Stan in pure scala in two years by a small FP community … https://github.com/stripe/rainier in a way that it won’t be difficult to massively parallelize it since the computations are also composable, first class citizens that can travel along the streams - like in this blog https://darrenjw.wordpress.com/2017/04/01/mcmc-as-a-stream/ ))

I cannot tell if I am delusional or not, but if there is a 10 times better technology for certain tasks then that will overtake the current ones … and once it comes to async computations, atm, Scala is the only choice… hence Spark and its friends are now written in Scala … and if you want to cut costs on composing programs, then you better have a sophisticated type system, so, again, python, R can be thrown out of the window.

The Stan community is much more active than the Rainier but technology wise (https://darrenjw.wordpress.com/2018/06/01/monadic-probabilistic-programming-in-scala-with-rainier/) , sooner or later, FP (and all the good things that come with it) will win, for many reasons, that maybe you already all know.

So, food for thought… maybe it is time to start to think about ditching C++ and replacing it with Scala (that is the most widely adopted FP language). There is also Haskell and ML but… and Idris… but it seems that the ML/AI world, especially big data / cloud world likes to use Spark a lot, which is written in Scala, and they are just starting to scratch the surface of the story here…

Just have a look at Rainier and I would not be surprised if you would start to seriously think about switching to Scala, just look at the code that does the symbolic differentiation : https://github.com/stripe/rainier/blob/develop/rainier-core/src/main/scala/com/stripe/rainier/compute/Gradient.scala

I am not a super good FP coder, but the ones who are can re-implement Stan using FP from scratch in a few months, I mean, what do you need ? A symbolic differentiation and RX ( http://reactivex.io/ ).
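For readers who have not seen it, symbolic differentiation over an immutable expression tree fits in a few dozen lines. The sketch below is in C++ rather than Scala and is not Rainier's code; the names (`Expr`, `diff`, `eval`) are invented here, and it only handles one variable, constants, sums and products.

```cpp
#include <cassert>
#include <memory>
#include <utility>

enum class Kind { Const, Var, Add, Mul };

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;  // nodes are never mutated

struct Expr {
  Kind kind;
  double value;  // only meaningful for Kind::Const
  ExprPtr lhs;   // operands, only set for Add and Mul
  ExprPtr rhs;
};

ExprPtr constant(double v) {
  return std::make_shared<Expr>(Expr{Kind::Const, v, nullptr, nullptr});
}
ExprPtr variable() {
  return std::make_shared<Expr>(Expr{Kind::Var, 0.0, nullptr, nullptr});
}
ExprPtr add(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Kind::Add, 0.0, std::move(a), std::move(b)});
}
ExprPtr mul(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Kind::Mul, 0.0, std::move(a), std::move(b)});
}

// Evaluate the expression at x.
double eval(const ExprPtr& e, double x) {
  assert(e && "null expression");
  switch (e->kind) {
    case Kind::Const: return e->value;
    case Kind::Var:   return x;
    case Kind::Add:   return eval(e->lhs, x) + eval(e->rhs, x);
    case Kind::Mul:   return eval(e->lhs, x) * eval(e->rhs, x);
  }
  return 0.0;  // unreachable for valid trees
}

// Build a new tree for d/dx using the sum and product rules.
ExprPtr diff(const ExprPtr& e) {
  assert(e && "null expression");
  switch (e->kind) {
    case Kind::Const: return constant(0.0);
    case Kind::Var:   return constant(1.0);
    case Kind::Add:   return add(diff(e->lhs), diff(e->rhs));
    case Kind::Mul:   return add(mul(diff(e->lhs), e->rhs),
                                 mul(e->lhs, diff(e->rhs)));
  }
  return constant(0.0);  // unreachable for valid trees
}
```

For example, with `x = variable()` and `f = add(mul(x, x), mul(constant(3.0), x))`, `diff(f)` is a new tree representing 2x + 3. No simplification is done, so the derivative tree grows quickly; a real implementation would need more than this.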

Because the problem with C++ is that it is not functional and not compositional, and it does not take you 2 minutes to write a Monad out of the blue that solves some particular problem, because 1) the language does not support it, 2) the libraries do not support it, 3) the books from which you learn how to solve problems do not support it.

The problem is that you want the Bayesian “science” and “research” to continue to grow and evolve, but I am not sure that in the age of distributed computing a language that was designed for Von Neumann architectures will be competitive - and it’s better to realize it now, than ten years from now… all these things I am talking about are already generating/saving lot of Gs… I think it might be worth for science to catch up too.

To the best of my understanding, ScalaStan was originally created because compositional (and “DRY” and statically typed) Bayesian modelling was very useful (or say, even a must - in my subjective understanding) for the author. I can totally resonate with that.

C++ was the only choice 15 years ago, but it will not be an option 10 years from now, at the latest. Maybe it’s time to use a programming language that is 10 times more efficient for these kinds of problems already today, why ? Because they do exist and are used to make a lot of money (distributed computing / compositional modelling).

What I am saying here goes well beyond Stan, I think it will affect the whole AI story… deep learning and way beyond … data and algorithm streams are all over the place … Von Neumann architecture is not cutting it anymore, it was never meant to cut it. No wonder the brain is the opposite of the Von Neumann architecture, if you want to simulate the brain on a Von Neumann machine then you better build a nuclear power plant first for it, not just plant a few apple trees.

Ok, bottom line, reactive streams, monads, interpreters, abstract syntax trees are on their way and have already been heavily used in industry for years…

I think, that I am not saying anything new to anyone here, because I am not such great expert in ML, but for me, this is a new idea. The idea, that the change has already started and most importantly it won’t stop, IMHO. But after thinking about this, it is actually trivially true, but before… it was not. Just like with the transistor… a simple thing, but somebody had to invent it :).

So, my answer, to how to make Stan “go faster” is to throw out the C++ implementation and reimplement the know-how in Scala using a totally different, future proof architecture, based on FP principles, and on “all the good stuff”, you know all of it… I am pretty sure, this is nothing new, I am no big expert here, I know that, but I was a bit shocked by the revelation that this thing is happening much much faster than I envisioned.

Let’s take the brilliant ideas from Stan and transplant them into future technologies - otherwise somebody else has to reinvent the wheel and put the wheel into future technologies, and those wheels will be really crappy wheels, behind shiny new technologies, and that will be a sad story to cry about to your psychiatrist robot 20 years from now.

LOL :)

J.

OVER AND OUT for a few days.

Jozsef_Hegedus · November 6, 2019, 11:53am

what is stan3 ? what language is stan3 ? ocaml? or what ?

Jozsef_Hegedus · November 6, 2019, 3:39pm

I gave this Rainier a quick try too. Works out of box:

what is interesting in the above example is that the value of the predictor expression in the monadic chain (for comprehension) is not used at all…

so looking at this code, and how it works, it needs a bit of thinking, even after 2 years of full time scala coding … pretty tricky stuff … full on mind bender, but it seems to work, and it seems that the people who wrote it have some tricks up their sleeves…

here is the result :

I think the future of HMC is going to be something like this … from a purely software engineering point of view…

writing good and complex FP code is 1000 times easier than doing the same in C++ … or python … or you name it…

But, I think, you guys are pros, so I am not saying anything new here, just that the heavy competition is coming on surprisingly strong… with the best tech stack to back it up… it is just the beginning… and it is an impressive piece of code…

no need for documentation

the source code is enough…

ok, i might be able to compile directly rainier to javascript … maybe i look into that at some point …

i am really surprised that people have pulled this off…

clever piece of code

solid FP architecture… fast…

the scala wrapper to Stan can only go so far…

if i want to extend or modify the source… provide my own special custom sampler etc… no scala wrapper will do that …

i think this is pretty good news for Stan people… they don’t have to start to re-write Stan from scratch… they can get some inspiration from Rainier …

it looks like a solid piece of code, at first glance… pretty solid

OVER AND OUT, watch out for the FP, it’s coming.

Cheers,

Jozsef

tpapp · November 16, 2019, 9:59am

FWIW, I have a functional-style implementation of NUTS written in Julia,

I find it much easier to unit test than code with mutable state. (Julia is not purely functional, but it allows a functional programming style.)
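To illustrate the testing point: the leapfrog step used inside HMC/NUTS can be written as a pure function from one phase-space point to the next. The sketch below is in C++, not Julia, and is not taken from the implementation mentioned above; it assumes a one-dimensional standard normal target, and all names (`PhasePoint`, `leapfrog`, `flip_momentum`) are made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Position q and momentum p; passed and returned by value, never mutated.
struct PhasePoint {
  double q;
  double p;
};

// Assumed target: standard normal, log density -q^2/2, so the gradient is -q.
double grad_log_density(double q) { return -q; }

// Total energy (negative log density plus kinetic energy, unit mass).
double energy(PhasePoint z) { return 0.5 * z.q * z.q + 0.5 * z.p * z.p; }

// One leapfrog step as a pure function: same input, same output, no state.
PhasePoint leapfrog(PhasePoint z, double eps) {
  assert(std::isfinite(eps) && eps > 0.0);
  const double p_half = z.p + 0.5 * eps * grad_log_density(z.q);
  const double q_new = z.q + eps * p_half;
  const double p_new = p_half + 0.5 * eps * grad_log_density(q_new);
  return PhasePoint{q_new, p_new};
}

// Reversing the momentum lets a test check time reversibility.
PhasePoint flip_momentum(PhasePoint z) { return PhasePoint{z.q, -z.p}; }
```

Because nothing is shared or mutated, a unit test needs no setup: step forward, flip the momentum, step again, flip back, and compare with the starting point. The same pattern works for checking that `energy` is approximately conserved for small `eps`.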

Jozsef_Hegedus · November 16, 2019, 5:17pm

Well, yeah, I am no Julia expert but as far as I know it is a Lisp dialect.

I never really used non-statically-typed functional languages too much, but yeah, shooting oneself in the foot using non-functional languages nowadays is a luxury.

Functional is already a good step forward, I also like static typing, the compiler does a lot of the work for me. Haskell style becomes possible soon in Scala https://typelevel.org/cats/ . This is a new thing, but in practice, this brings Scala closer to being a “Haskell”.

I can read / write Scala and I have seen the Rainier code and it’s impressive what they can do
from scratch in Scala. I mean, full symbolic differentiation, composable everything, basically
ScalaStan - written purely in Scala even the Stan part… no compiler, no nothing required. Runs in JVM, the compiler is written in Scala itself. They wrote the compiler itself too, in Scala, it compiles to Scala, right away, the power of functional programming, I think it is called free monad, the simplest version, you can write a compiler yourself in a few lines of code for some simple language you define, I think they must be using something similar, not sure though. Just a few buzzwords :) .

Functional is good, static typing is good, what else needs to be said ? Monix, probably. Just a guess :). The cloud is coming and I don’t think there is anything better for the cloud than Monix… ATM … when it comes to functional programming - in the cloud - it’s actually a sort of “functional reactive programming” :) - a bit overloaded term - still. I would not be surprised if it would not be too difficult to make a cloud friendly version out of Rainier.

Again, these are purely technical, software engineering related questions. Cloud is coming, big data is coming, async is coming, FP is coming. Times are changing. Technologies are evolving. C++ can only go so far… mutable state can only go so far… half of ML-in-cloud is running on Scala, nowadays, if not more than half. Not without a reason. It’s cheap to implement software in these new technologies. Spark and their friends. A year ago I was like - maybe … Scala will get more popular in the ML world … today… OK, what else will have a chance to survive ? Time is money, companies who invest into ML want results fast and cheap and they use the best tool to get the job done… even if the learning curve is high, apparently it still pays off to use Scala.

I am surprised myself. I learned Scala for fun, not for profit, and now it seems that industry finds it the most economical choice to “do” ML in.

There is no competitor. No other language, I see no reason why this trend would not continue. Something to watch out for.

A purely technological point of view - when considering the future of Stan - I don’t think it will survive in C++ - at least not for big data - in the cloud. Which would be a shame - there is a lot of intellectual value now in Stan that could and should reincarnate in a new body - a body that can evolve - and will have a chance to survive.

this reminds me of a quote :

"… We shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender, … "
