Baking a cake - how to find the perfect recipe?

Let’s say I’m baking a cake. I want to find the “perfect” recipe. I specify a model using the weight percent of each ingredient and cooking conditions as model inputs. Tastiness is the outcome.

library(readr)
library(brms)

cake_data <- read_csv("test_cakes.csv")

# Main effects plus all two-way interactions among the ingredients,
# plus the cooking conditions
fit <- brm(tastiness ~ (flour + water + butter + sugar + eggs)^2 +
             cook_temp + cook_time,
           data = cake_data)

Let’s assume this is a properly specified model and that it fits well in Stan. How can I identify the recipe that maximizes tastiness at a given cook_temp and cook_time?

The only thing I’ve been able to come up with is to create a grid of possible recipes and pass it to posterior_predict, something like the sketch below. What am I missing? And what would I do in a case with many more ingredients, where creating that grid is no longer feasible?
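For concreteness, here is roughly what I mean (all the grid values are made up):

# A coarse grid of candidate recipes at fixed cooking conditions
recipe_grid <- expand.grid(
  flour = seq(20, 40, by = 5),
  water = seq(10, 30, by = 5),
  butter = seq(5, 20, by = 5),
  sugar = seq(10, 30, by = 5),
  eggs  = seq(5, 15, by = 5),
  cook_temp = 175,
  cook_time = 45
)

# draws x recipes matrix of predicted tastiness; average over draws
# and keep the recipe with the highest posterior mean
pred <- posterior_predict(fit, newdata = recipe_grid)
recipe_grid[which.max(colMeans(pred)), ]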

As I’m sure is obvious, I’m new to modeling with Stan. I’m using {brms} and {rstanarm}, if that changes how you might answer the question.


In general, if you have a quadratic objective function, you can calculate the conditions for the maximum.

Let’s say y = \textrm{tastiness}, x_1 = \textrm{flour}, x_2 = \textrm{water}, and

y = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2

Setting the first derivatives to zero,

\begin{aligned} \frac{\partial y}{\partial x_1} &= \beta_1 + \beta_{12} x_2 + 2 \beta_{11} x_1 = 0 \\ \frac{\partial y}{\partial x_2} &= \beta_2 + \beta_{12} x_1 + 2 \beta_{22} x_2 = 0 \end{aligned}

Some rewriting then gives you the optimal levels of x_1 and x_2 as functions of the \beta's. You can see from the system of equations that the optimal amounts of x_1 and x_2 depend on each other, which is consistent with recipes typically using ingredients in fixed proportions.
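Written out, the system in matrix form and its solution (assuming the determinant 4\beta_{11}\beta_{22} - \beta_{12}^2 is nonzero) are

\begin{pmatrix} 2\beta_{11} & \beta_{12} \\ \beta_{12} & 2\beta_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -\beta_1 \\ -\beta_2 \end{pmatrix} \qquad\Rightarrow\qquad x_1^* = \frac{\beta_{12}\beta_2 - 2\beta_{22}\beta_1}{4\beta_{11}\beta_{22} - \beta_{12}^2}, \quad x_2^* = \frac{\beta_{12}\beta_1 - 2\beta_{11}\beta_2}{4\beta_{11}\beta_{22} - \beta_{12}^2}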

I don’t use brms, so I am not sure how you can use posterior_predict to generate the optimal level, but it is possible in generated quantities within Stan. If you can add generated quantities to brms (I believe its stanvars argument allows this), you should be able to do this.
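Alternatively, the optimum can be computed per posterior draw in R from the fitted coefficients. A sketch for the two-ingredient example (I don’t use brms, so the b_* coefficient names below are assumptions — check them against variables(fit2)):

library(brms)

# Two-ingredient toy model; the I() terms add the pure quadratic
# effects, which the (flour + ... + eggs)^2 formula above lacks
fit2 <- brm(tastiness ~ 0 + flour * water + I(flour^2) + I(water^2),
            data = cake_data)

post <- as.data.frame(fit2)  # one row per posterior draw
b1  <- post[["b_flour"]]
b2  <- post[["b_water"]]
b12 <- post[["b_flour:water"]]
b11 <- post[["b_IflourE2"]]
b22 <- post[["b_IwaterE2"]]

# Closed-form solution of the first-order conditions, per draw,
# giving a posterior distribution over the optimal amounts
det <- 4 * b11 * b22 - b12^2
flour_opt <- (b12 * b2 - 2 * b22 * b1) / det
water_opt <- (b12 * b1 - 2 * b11 * b2) / det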

The maximum of a quadratic form in n variables is studied intensively in linear algebra. It leads to a system of n linear equations, which is relatively easy to solve. Let me know if that is the direction you want to go.

Thanks for the reply. This helps.

I understand for this example, but what would you do in a case where the model form is too complex to write out the equations describing the relationships among the inputs?

this is a really beautiful question, the answer to which is perhaps the Bayesian workflow and some kind of multi-level model. it’s on you to formulate the tastiness-generating process, but for some insight into the number of variables, I suggest this example: I Tried 4 Famous Pound Cake Recipes - Here's the Best | Kitchn

like a winning basketball team, a tasty cake is far more than the sum of its ingredients. as there’s a lot of great sports analytics out there, I would suggest this: Hierarchical Modelling in Stan: Predicting the Premier League - YouTube


I think in general you could do something with optim() in R, where you have it maximize the posterior mean of tastiness by changing the covariates. It would be a bit slow, but should be doable. Of course, if you want to be more conservative / averse to un-tasty cakes, you could use a posterior quantile like 25%, or first take e.g. the log of tastiness and then the mean.

So something like

optim(c(1, 1, 1, 1, 1, 150, 20), function(covs) {
    d <- data.frame(flour = covs[1], water = covs[2], butter = covs[3],
                    sugar = covs[4], eggs = covs[5],
                    cook_temp = covs[6], cook_time = covs[7])
    # optim() minimizes, so negate the posterior mean of tastiness
    -mean(posterior_predict(model, newdata = d))
}, method = "L-BFGS-B",  # box constraints require L-BFGS-B
lower = <bounds here>, upper = <bounds here>,
control = list(factr = 1e12))  # loose tolerance; the objective is a noisy MC estimate

This is not going to be the most rigorous answer, but it might be a good prototype.

This approach could be a good fit:

  • if you can make the case that the quadratic form is a decent enough approximation of the “true” objective function (for the recipe, and for my social science problems, this would be fine);
  • if you are comfortable with some linear algebra; and
  • if you can move to pure Stan.

The objective function can then be written as

y_i = x_i^\top b - \tfrac{1}{2}\, x_i^\top A x_i

with y_i the tastiness of observation i, x_i the vector of ingredient amounts for that observation, and b and A a vector and a symmetric matrix of parameters to estimate.

The optimal amounts of the ingredients, x^*, follow from setting the gradient b - Ax to zero:

Ax^* = b \quad \textrm{or} \quad x^* = A^{-1}b

(if A is invertible, which you can enforce in Stan by declaring A as a covariance matrix; positive definiteness also ensures the stationary point is a maximum).

The linear algebra capabilities in Stan make it quite easy to specify this type of model.

  • Use the objective function to calculate the mean tastiness per observation.
  • Add normal measurement error.
  • Use the equation to calculate the optimal amounts in the generated quantities (or post-hoc in R, as in the sketch below).
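For instance, after fitting such a model, the per-draw solve can be done post-hoc in R (a sketch, assuming a stanfit object fit with a parameter vector b and matrix A as above):

library(rstan)

draws <- rstan::extract(fit, pars = c("b", "A"))

# Solve A x* = b for every posterior draw; each row of x_opt is one
# draw of the vector of optimal ingredient amounts
x_opt <- t(sapply(seq_len(nrow(draws$b)), function(s) {
  solve(draws$A[s, , ], draws$b[s, ])
}))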

I understand that a lot of simplifications and assumptions go into this approach. I do think it is probably the simplest approach to get a reasonable first attempt at a challenging problem.

This is a classic problem in experimental design, and there’s lots of older literature on the topic (I like Box, Hunter, and Hunter’s “Statistics for Experimenters”) although not much that translates to more modern tooling.

The main benefit of discretizing the recipe “configuration” variables is that the resulting regression can capture non-linear dependencies of the output on those variables. On the other hand, this discretization abandons any sense of “locality” between the variables’ values, which means that one can no longer use local trends to identify a maximizing configuration efficiently, as one can with a continuous quadratic model. Without making assumptions about these trends, all one can do is scan over all of the possible configurations and see which yields the highest output.

Bayesian analyses, however, also introduce the additional complication that each reconstructed configuration behavior is captured by a distribution. Moreover, comparing distributions is complicated by the fact that the distributions are correlated whenever two configurations share any variable levels (for example the same flour level). Comparing any two configurations is straightforward – we just look at the marginal posterior for mu_tastiness[config_one] - mu_tastiness[config_two] and see where it concentrates relative to zero. Comparing more than two configurations at the same time is less fun. By the way, when reconstructing mu_tastiness you don’t want to use posterior_predict, which also convolves in the measurement variability; in brms, posterior_epred returns the expected value without it.
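In brms, for example, the pairwise comparison might look like this (a sketch; configs is a hypothetical data frame with one row per candidate configuration):

# draws x configurations matrix of expected tastiness (no observation noise)
mu <- posterior_epred(fit, newdata = configs)

# Posterior probability that configuration 1 beats configuration 2
mean(mu[, 1] > mu[, 2])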

A common approach to this problem is to rank the configurations based on their performance within each posterior sample and then find some way to choose a winner based on the distribution of these ranks. Ideally one configuration will always be the winner, but more realistically the winning configuration will vary from draw to draw and we have to come up with some criterion like “the configuration most likely to be ranked best” or “the configuration with the most posterior probability of being ranked in the top three”.
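Continuing the sketch above, those rank-based criteria could be computed as:

# Rank the configurations within each posterior draw (rank 1 = tastiest)
ranks <- t(apply(-mu, 1, rank))

p_best <- colMeans(ranks == 1)  # Pr(ranked best)
p_top3 <- colMeans(ranks <= 3)  # Pr(ranked in the top three)

configs[which.max(p_best), ]  # "configuration most likely to be ranked best"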


We might be crazy, but our answer is: write the original model in a modular way; then write a second version (reusing the bits) that takes all of the parameters of the first model as data and increments target with your tastiness; then run Stan again as an optimizer (L-BFGS or variational) over those design/control variables!
