Recently I fit a log-normal multi-level regression with brms. Now I’d like to use the coefficients obtained from this model to perform some optimization, essentially trying to figure out how to get the maximum response for a finite number of counts on each of the variables in the model.
I’ve played around with using scipy.optimize.minimize() and another package in Python called Gekko (which is pretty fast in fairness), but for my own development I thought it would also be useful to be able to do this in Stan.
I saw this blog post with a very simple example, which I’ve expanded on. I think defining the bounds on the variables is relatively straightforward in Stan, but how should I go about incorporating my equality/inequality constraints? I’ve had a go in my example below, adding to target only in cases where the total activities to an individual are under 10, but this feels like a hack.
I also wanted to ask how I should go about extracting the optimum solution from the fitted object, since taking the mean of the posterior (as the author of the blog post did) would not produce the right result when the fit finds many local maxima. Would taking the 5% HPD be a good way to do this?
If anyone has any good materials, blogs, books on using stan for these kinds of purposes, I’d be really grateful if you could share them. Thanks.
reg_code = """
data {
  int<lower=0> N;    // number of ids
  vector[N] beta1;   // coefficients for each id
  vector[N] beta2;
  vector[N] beta3;
}
parameters {
  vector<lower=0, upper=5>[N] x1;  // putting bounds on a variable seems straightforward
  vector<lower=0, upper=5>[N] x2;
  vector<lower=0, upper=5>[N] x3;
}
model {
  for (i in 1:N) {
    // inequality constraint: an id's total can't be more than 10
    if ((x1[i] + x2[i] + x3[i]) < 10) {
      target += exp(3 + x1[i] * beta1[i] + x2[i] * beta2[i] + x3[i] * beta3[i]);
    }
  }
}
"""
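For reference, this is roughly how I expressed the same bounds and inequality constraint in scipy — a minimal sketch for a single id with made-up coefficients (the real betas come from the brms fit):

```python
import numpy as np
from scipy.optimize import minimize

beta = np.array([0.5, -0.2, 0.3])  # made-up coefficients for a single id

def neg_response(x):
    # negate because minimize() minimizes and we want the maximum response
    return -np.exp(3.0 + x @ beta)

res = minimize(
    neg_response,
    x0=np.ones(3),
    bounds=[(0.0, 5.0)] * 3,  # 0 <= x_i <= 5
    # "ineq" constraints require fun(x) >= 0, i.e. x1 + x2 + x3 <= 10
    constraints=[{"type": "ineq", "fun": lambda x: 10.0 - x.sum()}],
    method="SLSQP",
)
```

With these coefficients the exponent is increasing in x1 and x3 and decreasing in x2, so the optimizer pushes x1 and x3 to their upper bounds and x2 to zero.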
Sorry for not getting to you earlier. I have no experience with using Stan for optimization, so I can’t comment on that part of the question. I am however a bit puzzled about why one would sample to find the optimum instead of using the built-in optimize functionality. Also note that the target in Stan is assumed to be the logarithm of the value to be optimized (for pure optimization of positive values this doesn’t change the location of the optimum, but it is IMHO good to bear in mind).
I can nevertheless take a stab at how to express the constraint.
Generally, adding to the target only if the constraint is met is IMHO unlikely to be sensible. Not adding anything means that for a given i the contribution to the log target is 0, so you might actually be expressing a preference for values that break the constraint (depending on what the values of beta are).
I think that enforcing general inequalities in Stan is hard (see e.g. Multiple equality constraints on parameters). The problem is that it is hard to do without making the model non-smooth (discontinuous derivatives). However, especially for pure optimization, it might be the case that some non-smoothness would be tolerated. If I am not mistaken, the constraints you have could be reformulated as:
functions {
  vector pmin(real a, vector b) {
    vector[num_elements(b)] res;
    for (i in 1:num_elements(b)) {
      res[i] = min({a, b[i]});
    }
    return res;
  }
}
data {
  int<lower=0> N;    // number of ids
  vector[N] beta1;   // coefficients for each id
  vector[N] beta2;
  vector[N] beta3;
}
parameters {
  vector<lower=0, upper=5>[N] x1;
  vector<lower=0, upper=5>[N] x2;
  vector<lower=0, upper=pmin(5, 10 - x1 - x2)>[N] x3;
}
model {
  for (i in 1:N) {
    target += exp(3 + x1[i] * beta1[i] + x2[i] * beta2[i] + x3[i] * beta3[i]);
  }
}
This requires a fairly recent Stan that supports vectors of upper bounds (so you should either use cmdstanr or install rstan via Repository for distributing (some) stan-dev R packages | r-packages). I would guess this will work as long as the optimum isn’t particularly close to the sharp corners introduced by the parallel minimum (pmin). But please check this for yourself.
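In case it helps intuition, here is a small sketch in Python (my own illustration, not Stan) of what the varying upper bound does for a single id: x1 and x2 get fixed bounds, while x3’s upper bound shrinks to 10 - x1 - x2 when needed, so every draw satisfies both the box bounds and the sum constraint. The logistic mapping is just a stand-in for Stan’s actual constraining transform:

```python
import numpy as np

rng = np.random.default_rng(0)

def constrain(raw):
    """Map three unconstrained values to (x1, x2, x3) obeying the bounds."""
    u = 1.0 / (1.0 + np.exp(-np.asarray(raw)))  # logistic: R -> (0, 1)
    x1 = 5.0 * u[0]                    # lower = 0, upper = 5
    x2 = 5.0 * u[1]                    # lower = 0, upper = 5
    upper3 = min(5.0, 10.0 - x1 - x2)  # same idea as pmin(5, 10 - x1 - x2)
    x3 = upper3 * u[2]                 # lower = 0, varying upper
    return x1, x2, x3

# spot-check feasibility on random unconstrained draws
for _ in range(1000):
    x1, x2, x3 = constrain(rng.normal(size=3))
    assert 0.0 <= x3 <= 5.0 and x1 + x2 + x3 <= 10.0 + 1e-9
```

Note that upper3 is always positive here, because x1 and x2 are each strictly below 5, so their sum stays below 10.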
I don’t really have much experience with optimize. It looks like it gets the posterior mode. But what if you want the distribution of the parameters? Gotta sample still. Is there a vignette for it, or just the documentation?
I used Stan in some cases where quadratic programming problems can be converted into regression problems. It works so long as any inequality constraints are relatively simple.
My understanding from the blog post the OP linked is that the goal was just to use Stan to find a maximum of a function, where the function does not even have to be a probability distribution, so it is unclear to me what “distribution of parameters” would even mean in this context. But maybe I am missing some context that is implied by the application domain?
Though he doesn’t provide the actual Stan code (so I can’t say for sure whether this matches what he actually does), the second example in the blog post discusses a situation where there is some variability in the actual production process. Conceivably, a production manager might be interested in the distribution of their production output.
My initial thought, though, was about how the first example in the blog post tries to maximize 2x+4y, where 2 and 4 are the prices of the products. Is it not reasonable to relax the assumption that these are fixed prices? Perhaps it can be recast as ax+by, where a~normal(2, s_a), b~normal(4, s_b) (for some standard deviations s_a, s_b). In this framework, it might be useful to know what the distributions of x and y are. For instance, if s_b is larger than s_a, then y might have a larger standard error than x.
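As a quick illustration of that idea (with entirely made-up spreads and a deliberately simple feasible region, not the blog post’s actual problem): draw (a, b) around (2, 4), solve the small LP for each draw by comparing the corners of the region, and look at the spread of the optimal (x, y) across draws.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up feasible region (not from the blog post): 0 <= x, y <= 8, x + y <= 10.
# A linear objective a*x + b*y attains its maximum at a corner of the region,
# so it is enough to compare the corners for each draw of (a, b).
corners = np.array([[0, 0], [8, 0], [8, 2], [2, 8], [0, 8]], dtype=float)

s_a, s_b = 0.5, 2.0                  # assumed price spreads; s_b > s_a
a = rng.normal(2.0, s_a, size=5000)
b = rng.normal(4.0, s_b, size=5000)

# objective at every corner for every draw, then pick the best corner per draw
values = np.outer(a, corners[:, 0]) + np.outer(b, corners[:, 1])
best = corners[values.argmax(axis=1)]

x_sd, y_sd = best[:, 0].std(), best[:, 1].std()
```

With these numbers the optimum sits at one corner for most draws and jumps to another when a draw makes a outweigh b; comparing x_sd and y_sd then gives a crude sense of how price uncertainty propagates into the solution.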