Feedback on response for Edward vs Stan

dustin · March 5, 2017, 11:18pm

Hi all, I often get asked the difference between Edward and Stan—and more importantly when to use one over the other. Below is a response for laymen, with helpful suggestions by Aki. It is meant to be concise (e.g., <4 sentences). And I prefer to leave out nuances so long as it correctly depicts the general idea.

Is this accurate? Revisions are welcome.

What is the difference between Edward and Stan?

Edward is about fast experimentation and research. It is a platform that lets you quickly iterate over new models, new algorithms, and develop and test them all within Edward. The audience is machine learning researchers, industry interested in scalable algorithms, and deep learning. This means Edward tends to be bleeding edge (it uses newer but possibly less reliable techniques).

Stan is about automation and reliability. Given that you specify a model, Stan does all the heavy lifting for you, including comprehensive diagnostics on the accuracy of computation. The audience is applied researchers, consultants, data analysts, and statistical modeling. This means Stan tends to be mature (it uses robust and general-purpose solutions).

andrewgelman · March 5, 2017, 11:37pm

Hi, I think others can help more, but let me say a couple of things:

Stan is bleeding edge too, as I don’t think there’s any faster and more general MCMC algorithm than NUTS as programmed in Stan, and we’ll have Riemannian too.
Both Edward and Stan are designed to allow users to quickly iterate over new models.
In general, Stan makes it easier to try out new models but it’s harder in Stan to play around with the fitting algorithms; Edward makes it easier to try out new algorithms but it’s harder in Edward to play around with models.
A

dustin · March 6, 2017, 2:34am

I wouldn’t say Stan is bleeding edge. (I don’t know what it means to be both bleeding edge and robust.)
I modified the second line; see below.
Edward’s modeling language built on Stan’s and was made to be easier to play around with. For example, it has random variable objects, can be compiled independently of data, can represent only subgraphs (e.g. for data subsampling), can combine inference/criticism (e.g. for Bayesian updating), has a distinction for model parameters (e.g., for GMO), and can be sampled from.

What is the difference between Edward and Stan?

Edward is about fast experimentation and research. It is a platform that lets you quickly iterate over new models, new algorithms, and develop and test them all within Edward. The audience is machine learning researchers, industry interested in scalable algorithms, and deep learning. This means Edward tends to be bleeding edge (it uses newer but possibly less reliable techniques).

Stan is about automation and reliability. It lets you easily specify a model and then does all the heavy lifting for you, including comprehensive diagnostics on the accuracy of computation. The audience is applied researchers, consultants, data analysts, and statistical modeling. This means Stan tends to be mature (it uses robust and general-purpose solutions).

Dustin

andrewgelman · March 6, 2017, 2:50am

Hi, Dustin.
1. I looked up “bleeding edge” and found this definition: “the very forefront of technological development.” I think this describes Stan.
2. You write that Edward allows you to quickly iterate over new models, new algorithms …
Stan does not allow you to quickly iterate over new algorithms but it definitely allows you to quickly iterate over new models. I say this as a Stan user. A key aspect of Stan is how flexible its models can be. Quickly iterating over new models is what Stan is all about.
A

avehtari · March 6, 2017, 9:18pm

I also first mixed bleeding edge and leading edge. Wikipedia has
“Bleeding edge technology is a category of technologies so new that they could have a high risk of being unreliable and lead adopters to incur greater expense in order to make use of them.”
I guess you would not like to describe Stan with this sentence?

Aki

syclik · March 6, 2017, 10:12pm

I’m with Andrew: Stan is bleeding edge. Stan is also (fairly) robust. Bleeding edge and robust are two orthogonal dimensions. Trying to say they have to be mutually exclusive is just wrong. My bet is that most of the time, most things are neither bleeding edge nor robust.

The way I see it, the Stan Language was designed for model iteration. The underlying C++ was designed for algorithm development, but most of our community never touches this layer. We’ve had a few people build algorithms around this layer: ADVI with Alp and @dustin, some other VI approach by @rgiordan. The abstraction at this level is a single, joint log probability distribution function up to an additive constant and this provides the function value and gradients. Then there’s the C++ math library which is just for computing lots of different functions and autodiff.

In the Stan Language and at the C++ layer, there’s no graphical model. It’s clear when you see the C++: there’s just a single function. There’s no dynamic graph. There’s no structure that’s encoded. The abstraction is the function where data and parameters are passed in, the function value is evaluated with the option to compute the gradients.

I don’t know what Edward is good for, but if you’re trying to pinpoint what Stan is good for, I think it’s 1) iterating over models for the end user and 2) for an algorithm developer, having access to a log joint probability distribution function (up to an additive constant) where the function value and the gradients are available.
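To make the abstraction above concrete, here is a minimal sketch. It is not Stan's generated code: the struct name `ToyNormalModel` and the method name `log_prob_grad` are hypothetical, chosen only for illustration. It shows a model as a single function that takes parameters (with data held fixed) and returns the log joint density up to an additive constant, together with its gradient.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model (not Stan's API): y_i ~ Normal(mu, 1) with a
// flat prior on mu. Additive constants of the log density are dropped.
struct ToyNormalModel {
  std::vector<double> y;  // observed data, fixed once the model is built

  // Returns log p(y, mu) up to an additive constant and writes the
  // derivative with respect to mu into grad[0].
  double log_prob_grad(const std::vector<double>& params,
                       std::vector<double>& grad) const {
    assert(!params.empty());  // exactly one parameter (mu) is expected
    const double mu = params[0];
    double lp = 0.0;
    double dmu = 0.0;
    for (double yi : y) {
      const double r = yi - mu;
      lp += -0.5 * r * r;  // normal log density without the constant term
      dmu += r;            // derivative of -0.5 * (yi - mu)^2 w.r.t. mu
    }
    grad.assign(1, dmu);
    return lp;
  }
};
```

There is no graph and no encoded structure here, only a function value and a gradient, which is the point being made about the C++ layer.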

andrewgelman · March 6, 2017, 10:49pm

+1 to Daniel’s comment.

avehtari · March 7, 2017, 8:24am

Based on the Wiki and discussions elsewhere, there seem to be many people who strongly disagree that they are orthogonal. I don’t think it’s possible to change that opinion, so we just have to use some term other than “bleeding edge” if we want to say that Stan is fairly robust.

In addition to saying “have a high risk of being unreliable and lead adopters to incur greater expense in order to make use of them”, Wikipedia lists the following criteria:

Lack of consensus – competing ways of doing new things exist and there is little to no indication in which direction the market will go.
Lack of testing – The technology may be unreliable, or simply untested.
Industry resistance to change – trade journals and industry leaders have spoken against a new technology or product but some organizations are trying to implement it anyway because they are convinced it is technically superior

I don’t think Stan would like to be associated with any of these criteria.

Would you be happy with “state of the art” or “cutting edge” (Wikipedia’s suggestions)?

For a layman’s explanation, your point 2 is formulated in very technical terms. How would you explain it in a non-technical way?

Aki

dustin · March 7, 2017, 9:07am

Bleeding edge seems to mean different things to different people. I’ll just drop the term.

@andrewgelman

You write that Edward allows you to quickly iterate over new models, new algorithms …
Stan does not allow you to quickly iterate over new algorithms but it definitely allows you to quickly iterate over new models. I say this as a Stan user. A key aspect of Stan is how flexible its models can be. Quickly iterating over new models is what Stan is all about.

The nuance added to the latest revision is only due to my writing abilities. Revisions welcome.

@syclik:

and 2) for an algorithm developer, having access to a log joint probability distribution function (up to an additive constant) where the function value and the gradients are available.

I don’t think Stan is good for algorithm developers. E.g., I/O interfacing has been a problem, and there’s too much infrastructure in place to quickly prototype and iterate over algorithms when you don’t already know how to spec it out. Fast experimentation requires that one can start from an idea and prototype and iterate on it from scratch (in the same way one might iterate over new models). With ADVI we had a good idea of how to design it via older work on BBVI; I don’t know about @rgiordan but I suppose he had experience with LRVB from previous experiments.

Of course, this is fine too. As you and Bob argued 4 years ago, Stan should favor “slower development with an emphasis on stability and a stricter process”. Andrew argued “being a methods development tool is one of the important roles of Stan”, but all choices make tradeoffs.

syclik · March 7, 2017, 2:40pm

It’s changed recently. Now it’s really easy. Here’s ADVI:

github.com/stan-dev/stan/blob/develop/src/stan/services/experimental/advi/fullrank.hpp#L50


      
          * @param[in] adapt_iterations number of iterations for eta adaptation
          * @param[in] eval_elbo evaluate ELBO every Nth iteration
          * @param[in] output_samples number of posterior samples to draw and
          *   save
          * @param[in,out] interrupt callback to be called every iteration
          * @param[in,out] logger Logger for messages
          * @param[in,out] init_writer Writer callback for unconstrained inits
          * @param[in,out] parameter_writer output for parameter values
          * @param[in,out] diagnostic_writer output for diagnostic values
          * @return error_codes::OK if successful
          */
          template <class Model>
          int fullrank(Model& model, const stan::io::var_context& init,
                      unsigned int random_seed, unsigned int chain, double init_radius,
                      int grad_samples, int elbo_samples, int max_iterations,
                      double tol_rel_obj, double eta, bool adapt_engaged,
                      int adapt_iterations, int eval_elbo, int output_samples,
                      callbacks::interrupt& interrupt, callbacks::logger& logger,
                      callbacks::writer& init_writer,
                      callbacks::writer& parameter_writer,
                      callbacks::writer& diagnostic_writer) {

Focus on the abstraction. The model and an initialization are provided. All an algorithm developer has to do is utilize that. And if you’re unclear on how to drive it, this test shows you how:

github.com/stan-dev/stan/blob/develop/src/test/unit/services/experimental/advi/fullrank_test.cpp#L33


      
          int grad_samples = 1;
          int elbo_samples = 100;
          int max_iterations = 10000;
          double tol_rel_obj = 0.01;
          double eta = 1.0;
          bool adapt_engaged = true;
          int adapt_iterations = 50;
          int eval_elbo = 100;
          int output_samples = 1000;
          
          stan::services::experimental::advi::fullrank(
              model, context, seed, chain, init_radius, grad_samples, elbo_samples,
              max_iterations, tol_rel_obj, eta, adapt_engaged, adapt_iterations,
              eval_elbo, output_samples, interrupt, logger, init, parameter,
              diagnostic);
          
          EXPECT_GT(logger.call_count(), 0);
          EXPECT_EQ(logger.call_count(), logger.call_count_info())
              << "all messages go to info";
          
          EXPECT_EQ(1, logger.find_info("EXPERIMENTAL ALGORITHM"))

In the past, it was a bit harder, but all the pieces were still there. It’s now just easier to follow along. In the past, all the algorithm developers we knew about just reached out to us and it was easy for us to show how to do it, but you’re right… we never went out of our way to document it so it was super-easy.
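The "model plus initialization" abstraction described above can be sketched from the algorithm developer's side. This is only an illustration under stated assumptions: `ToyModel`, `log_prob_grad`, and `gradient_ascent` are hypothetical names, and the algorithm is plain gradient ascent rather than ADVI. The point is that the algorithm is templated on the model and touches it only through the log density and its gradient.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compiled model: the log density of
// theta ~ Normal(3, 1), up to an additive constant, plus its gradient.
struct ToyModel {
  double log_prob_grad(const std::vector<double>& theta,
                       std::vector<double>& grad) const {
    const double r = theta[0] - 3.0;
    grad.assign(1, -r);
    return -0.5 * r * r;
  }
};

// Illustrative algorithm (plain gradient ascent, not ADVI). It sees the
// model only through log_prob_grad and starts from a caller-supplied init.
template <class Model>
std::vector<double> gradient_ascent(const Model& model,
                                    std::vector<double> theta,
                                    double step_size, int max_iterations) {
  assert(step_size > 0.0 && max_iterations >= 0);
  std::vector<double> grad;
  for (int iter = 0; iter < max_iterations; ++iter) {
    model.log_prob_grad(theta, grad);
    for (std::size_t i = 0; i < theta.size(); ++i) {
      theta[i] += step_size * grad[i];  // move uphill on the log density
    }
  }
  return theta;
}
```

Calling `gradient_ascent(ToyModel{}, {0.0}, 0.1, 500)` moves the single parameter from the initial value toward the mode of the toy density at 3. Swapping in a different model type requires no change to the algorithm, as long as it exposes the same assumed method.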

That’s out of context. For Stan, I completely agree. For a feature to get into the code base, it should be stable. There’s a lot of maintenance to contend with – even if the code is as perfect as it can be.

For experimenting, all gloves are off. Go forth and develop algorithms in sandboxes and prove to yourself that it works. Be fast about it. Be as robust as possible. But at the point someone wants it into the code base for distribution, it’s a commitment.

syclik · March 7, 2017, 2:54pm

You missed a key part of what wikipedia said: “they could have a high risk of being unreliable…”

Think back: BFGS was bleeding edge at some point. L-BFGS was bleeding edge, maybe still is? (I haven’t checked the optimization literature in a while.) Stan’s still progressing faster than a lot of other projects out there. We’re still pushing on ODEs, on math, on algorithms, on speed, on organizational structure, on UX, on higher level abstractions… so as far as I can tell, we’re still bleeding edge.

Stan isn’t completely robust. There are problems that will make Stan fail. There is definitely an industry resistance to change. We definitely haven’t tested every aspect of everything. But Stan is robust for a class of problems, which seems to be pretty large.

avehtari · March 7, 2017, 4:09pm

I have agreed completely with this all along…

and we are just disagreeing about what people think the term “bleeding edge” means.

Aki

syclik · March 7, 2017, 4:24pm

+1. I really don’t care… the only thing I do care about is being represented fairly. I’m happy if the terms bleeding edge, state-of-the-art, and whatever other marketing terms are taken away.

avehtari · March 7, 2017, 8:29pm

Heh :) Everyone in this discussion cares about representing Stan fairly and that’s why this discussion was started. Representing Stan fairly turned out to be more difficult than I thought because people have such strong but different opinions on what certain words mean.

Daniel: Instead of us trying to guess which adjectives you think are marketing terms, could you write your own fair version for Stan, with about 4 sentences and fewer than 60 words, please?

Aki
