Finding the minimum RSS with a custom function

Dear all,
Maybe this is a silly question, but I would like to hear your thoughts on a problem I am facing.
I have a series of measurements and a function (composed of 4 sub-functions, with a total of 9 parameters) that models the phenomenon using 2 covariates. The final formula of the model is complicated.
I want to code the function in Stan and use HMC to explore the space of the 9 parameters to find the combination of values that minimizes the residual sum of squares (RSS).
How can I write down a formulation of the log-likelihood in terms of the RSS?
Is there an example or tutorial to start with, or should I prefer a black-box optimizer instead?

Thanks in advance.

I think the crux of the Bayesian approach is modelling the joint distribution of your data (treated as random variables) and all your parameters. The probabilistic form allows the joint distribution to be factored, a rule emerges for updating parameters based on data, and MCMC/MAP can exploit this rule to update parameters and quantify their uncertainty given your priors.
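On the log-likelihood question specifically: minimizing the RSS is equivalent to maximizing a Gaussian log-likelihood with a fixed noise scale, since the log-likelihood is a decreasing affine function of the RSS. A minimal numpy sketch with a one-parameter toy model (all numbers hypothetical, not the poster's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = 3.0 * x + rng.normal(0.0, 0.1, 40)   # toy data, true slope 3.0

slopes = np.linspace(2.0, 4.0, 401)      # grid over the single parameter
rss = np.array([np.sum((y - s * x) ** 2) for s in slopes])

# Gaussian log-likelihood with a fixed noise scale sigma:
# log p(y | s) = -n/2 * log(2*pi*sigma^2) - RSS(s) / (2*sigma^2)
sigma = 0.1
loglik = -0.5 * len(y) * np.log(2 * np.pi * sigma ** 2) - rss / (2 * sigma ** 2)

s_min_rss = slopes[np.argmin(rss)]       # RSS minimizer...
s_max_ll = slopes[np.argmax(loglik)]     # ...is the same value as the MLE
```

So coding `y ~ normal(mu, sigma)` with a fixed `sigma` in Stan and maximizing the target is one legitimate reading of "minimize the RSS", though as noted above that doesn't by itself buy you any of the Bayesian machinery.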

In your case, if your model cannot be factored into a set of PDFs, then you're likely better off not trying to coerce HMC into treating it as one. Identifiability will be a problem and your results are likely to be poor.

Since your stated goal is global optimisation rather than a probability model, you may be better served by one of the global optimisation algorithms in scipy, for example.
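For instance, `scipy.optimize.differential_evolution` searches box-bounded parameter space globally and needs nothing beyond an RSS function. A sketch with a made-up 3-parameter model standing in for the poster's 9-parameter one (all names and numbers hypothetical):

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy stand-in for the poster's model: 3 parameters instead of 9,
# one covariate instead of two (hypothetical throughout).
def model(x, a, b, c):
    return a * np.exp(-b * x) + c

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, 50)
y = model(x, 2.0, 0.5, 1.0) + rng.normal(0.0, 0.05, 50)  # true params (2, 0.5, 1)

def rss(theta):
    a, b, c = theta
    return np.sum((y - model(x, a, b, c)) ** 2)

# Global stochastic search within box bounds on each parameter.
result = differential_evolution(rss, bounds=[(0, 5), (0, 2), (0, 3)], seed=1)
```

The objective is just the RSS itself; no likelihood formulation is needed.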

A potential Bayesian option is "Bayesian optimisation", which models the black-box relationship between your parameters and the objective as something like a Gaussian process. You may be able to code something like this in Stan, but there are already packages available for it (e.g. BoTorch).
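To make the idea concrete, here is a deliberately minimal numpy-only sketch of the surrogate-plus-acquisition loop on a toy 1-D objective; it is an illustration of the concept, not BoTorch's API, and every function and constant in it is a made-up toy choice:

```python
import numpy as np

# Toy 1-D objective standing in for an expensive black-box RSS surface.
def objective(x):
    return (x - 0.7) ** 2 * np.sin(5 * x)

def rbf(a, b, ls=0.15):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 4)          # initial design points
y = objective(X)
grid = np.linspace(0.0, 1.0, 201)     # candidate locations

for _ in range(10):
    K = rbf(X, X) + 1e-6 * np.eye(len(X))     # jittered Gram matrix
    Ks = rbf(grid, X)
    mu = Ks @ np.linalg.solve(K, y)           # GP posterior mean
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * v.T, axis=1)      # posterior variance (prior var 1)
    sd = np.sqrt(np.clip(var, 1e-12, None))
    lcb = mu - 2.0 * sd                       # lower-confidence-bound acquisition
    x_next = grid[np.argmin(lcb)]             # evaluate where the GP is optimistic
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

x_best, y_best = X[np.argmin(y)], y.min()
```

In practice you'd reach for BoTorch or similar rather than hand-rolling the GP, but the loop above is the whole idea: fit a surrogate to the evaluations so far, then spend the next expensive evaluation where the surrogate says improvement is plausible.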


Thanks @emiruz!


You’re not forced to use sampling approaches with a Stan model; you can also optimize, which is what you want.


Well, you can do MAP (as mentioned) with Stan, which requires a probabilistic model. Further, Stan uses (L-)BFGS under the hood of its optimize function. BFGS and L-BFGS are quasi-Newton local optimisation methods and, to my knowledge, offer no guarantees regarding global optima.
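The local-vs-global distinction is easy to demonstrate on a multimodal toy objective (function and starting point chosen purely for illustration): BFGS started in the wrong basin settles into a local minimum, while a global method finds the better one.

```python
import numpy as np
from scipy.optimize import minimize, differential_evolution

# x^2 + 10*sin(x) has a local minimum near x ≈ 3.84 (f ≈ 8.3)
# and its global minimum near x ≈ -1.31 (f ≈ -7.9).
def f(x):
    return float(x[0] ** 2 + 10 * np.sin(x[0]))

local = minimize(f, x0=[4.0], method="BFGS")              # quasi-Newton, local
best = differential_evolution(f, bounds=[(-5, 5)], seed=0)  # global search
```

Here `local.fun` stays stuck around 8.3 while `best.fun` reaches about -7.9, which is the guarantee gap being described.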

It doesn’t require probabilities, just a target to maximize. Within rstan at least it’s not too hard to use other optimizers. I have a package on GitHub, stanoptimis, that uses various combinations of differential evolution, BFGS, stochastic gradient descent, and importance sampling. It needs further work and documentation, but yeah, it’s there…

As in, just update the target yourself and use Stan to set parameter constraints, then run L-BFGS? (Unless you’re going to code your own extensions to add other optimisers.)

I personally don’t understand what value that offers over running the optimisation directly on top of whatever the best execution engine is for your problem. Stan’s probability toolbox and purpose-built DSL are what make it interesting, and this use case doesn’t use either of them, to my mind.

Would you mind adding some colour re: the motivation?

Yeah, if your problem can already be easily specified and optimized through some other existing approach, I agree there’s little if anything to gain. When that is not the case, Stan offers a fairly accessible way to specify complex models and then efficiently compute the target and gradient via compiled C++.