Tutorial: How We Productized Bayesian Revenue Estimation with Stan


We use Stan in production to estimate revenue distributions in online marketing. We wrote a tutorial-style blog post about our modelling and our experiences with Stan:


What is the reason for using a log-normal for revenue? I’ve typically seen gamma or Pareto Type II used here. Gamma also lets you avoid the Fenton-Wilkinson approximation.
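For readers who haven’t run into it, the Fenton-Wilkinson approximation mentioned above fits a single log-normal to a sum of independent log-normals by matching the first two moments. A minimal sketch in Python (the function name is my own, not from the post):

```python
import math

def fenton_wilkinson(params):
    """Approximate the sum of independent lognormal(mu_i, sigma_i) variables
    with a single lognormal, by matching the mean and variance of the sum."""
    # Mean and variance of the sum (terms are independent)
    mean = sum(math.exp(mu + 0.5 * s**2) for mu, s in params)
    var = sum((math.exp(s**2) - 1.0) * math.exp(2.0 * mu + s**2)
              for mu, s in params)
    # Invert the lognormal moment formulas to get the fitted parameters
    sigma2 = math.log(1.0 + var / mean**2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)

# Approximate the sum of lognormal(0, 0.5) and lognormal(1, 0.3)
mu, sigma = fenton_wilkinson([(0.0, 0.5), (1.0, 0.3)])
```

By construction the fitted log-normal reproduces the sum’s mean and variance exactly; the tails are where the approximation degrades, which is part of why the gamma alternative is worth considering.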

Aside from the 0 lower bound on SD parameters, you shouldn’t need to bound the parameters. If you are getting crazy values, it probably means your priors aren’t strong enough.

I think a gamma is preferred over an inverse gamma as a prior on variance, but this is debatable.

Set your CFLAGS to -O0 and drop -g while debugging; it compiles much faster and shortens the iteration cycle for finding simple bugs :)


Thanks, good comments! Using a gamma instead of a log-normal is a good point. The log-normal with the Fenton-Wilkinson approximation worked well enough, so we didn’t really evaluate other options.

Regarding the bounded variables, stronger priors could indeed help. We mainly added the bounds as hard constraints to enforce the practical min/max limits we have observed in real data, so that the automated production runs would not produce any weird results. But with strong priors it should not be a problem either.

And thanks for the CFLAGS tip! It should be the default in interactive mode. :)


Thanks for the nicely done writeup. I am new to the Stan world and Bayesian analysis but thought I would link to a talk by Andrey Munoz Medina for setting the reserve on ad auctions at Google. No idea if this is helpful but thought I would share. Sorry for the Facebook link but it was an @Scale event hosted by Facebook.



Thanks much for sharing. We love to see this kind of thing—especially the fact that you’re putting Stan into production.

I’m not sure if PyStan/Python supports this, but with RStan you can create an R package (like our own RStanArm) that precompiles the Stan models and distributes them as part of the package. We’re seeing more people interested in PyStan, so that interface should be getting more attention going forward. Let us know if there’s something you particularly want to see. We’re partly driven by what our users want and partly by what our developers want to build.

We’re happy to help if you (or anyone else) wants feedback on programming Stan models. Some quick thoughts on the model that you posted:

  • rather than log(1 + exp(x)), we have a built-in log1p_exp(x) function that should be a bit faster and more stable for negative x
  • we don’t recommend interval priors because they can seriously distort posteriors if any of the mass would otherwise be close to or past the boundaries; so while they will prevent values from crossing a boundary, they may cause pileups at the boundaries that lead the true posterior uncertainty to be underestimated
  • the gamma/inverse gamma priors can be problematic with pooling because they exclude zero and hence prevent very strong pooling; Stan doesn’t use conjugacy of any kind, so you can use any kind of prior.
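To illustrate the first bullet, here is a rough Python analogue of the stable computation (my own sketch; Stan’s internal implementation may differ), showing how the naive form loses all precision for very negative x:

```python
import math

def log1p_exp(x):
    """Numerically stable log(1 + exp(x)), analogous to Stan's log1p_exp."""
    if x > 0.0:
        # Factor exp(x) out of the log to avoid overflow for large positive x
        return x + math.log1p(math.exp(-x))
    # log1p keeps precision when exp(x) is far below machine epsilon
    return math.log1p(math.exp(x))

def naive(x):
    return math.log(1.0 + math.exp(x))

print(naive(-40.0))      # 0.0 -- exp(-40) is swallowed by the 1.0
print(log1p_exp(-40.0))  # ~4.25e-18, close to the true value exp(-40)
```

The naive form also overflows for large positive x (exp(x) exceeds the double range), while the stable version simply returns approximately x there.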


Hi, just to respond on the log-normal thing: sure, maybe a gamma or whatever is better. But in my experience with regression models, the most important thing is including relevant predictors and interactions, and the next most important thing is regularization. Regularization goes hand in hand with including relevant predictors: with regularization we are freer to add additional predictors, in the same way that a trapeze artist can do cooler tricks in the presence of a net. In my experience, the distribution of the residual errors is not the most important thing. So, sure, think about different options, but keep your eye on the ball.