Selecting model parametrization depending on input data

Hi all,

I am working on a research project that consists of analyzing mass spectrometry proteomics data with Bayesian statistical models.
I started working with Stan about 6 weeks ago, and I am slowly but surely climbing the learning curve :-) By the way, I am amazed by how much knowledge and technology is packed into it! :-D

My high-level question is the following: is there a way to reuse the same Stan model script for slightly different parametrizations of a model without hurting performance?

Let me clarify a bit. My problem boils down to fitting a large number of linear regressions jointly, each linear sub-model describing the abundance of one single protein across a number of conditions (depending on the experimental design). Typically this has to be done for more than a thousand proteins ($k = 1, \ldots, K$, with $K > 1000$). I am now testing different modelling set-ups, e.g. modelling some of the $\beta_{ij}$ parameters hierarchically, modelling the $\sigma^2$ parameters hierarchically, or using robust regression (a Student-t likelihood instead of a normal likelihood). These specification choices can also be combined, e.g. robust regression together with a hierarchical structure on some of the $\beta$'s, on the $\sigma^2$'s, and even on the degrees of freedom of the t-likelihood.
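For concreteness, here is a plain-Python sketch (not Stan code) of what a likelihood flag would switch for a single protein $k$: a normal versus a Student-t (robust) log-likelihood for the same linear regression. The function names and the `use_student` argument are hypothetical; the densities are the standard location-scale forms.

```python
import math


def normal_logpdf(y, mu, sigma):
    # log N(y | mu, sigma), with sigma the standard deviation
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def student_t_logpdf(y, nu, mu, sigma):
    # log Student-t(y | nu, mu, sigma), location-scale parametrization
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def protein_loglik(y, X, beta, sigma, use_student=False, nu=None):
    """Log-likelihood of one protein's regression y ~ X @ beta.

    y: observed abundances, one per run
    X: design-matrix rows, one per run
    use_student: hypothetical flag selecting the robust (t) likelihood
    """
    total = 0.0
    for y_n, x_n in zip(y, X):
        mu = sum(x * b for x, b in zip(x_n, beta))  # linear predictor
        if use_student:
            total += student_t_logpdf(y_n, nu, mu, sigma)
        else:
            total += normal_logpdf(y_n, mu, sigma)
    return total
```

An outlying observation is penalized far less under the t likelihood (e.g. `student_t_logpdf(10.0, 3.0, 0.0, 1.0)` is much larger than `normal_logpdf(10.0, 0.0, 1.0)`), which is what makes the regression robust.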

My ultimate goal is to build a method in R to analyze these mass spectrometry data, with Stan embedded as the inference engine. The parametrization would be selected through flags set by the user. But I am puzzled about how to handle this kind of flexibility in the model section of the Stan script. Do I necessarily need a separate .stan file for each combination of parametrization choices? That would quickly become very cumbersome, because the number of combinations multiplies with every new choice.
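A quick way to see how fast this grows: with $n$ on/off choices there are up to $2^n$ variants. The plain-Python sketch below enumerates them for four hypothetical flag names (made up for illustration) and drops combinations that make no sense, such as a hierarchy on the t degrees of freedom without a t likelihood.

```python
from itertools import product

# Hypothetical modelling choices; each one is an on/off switch.
FLAGS = ["hierarchical_beta", "hierarchical_sigma",
         "student_likelihood", "hierarchical_nu"]


def all_configurations(flags):
    """Yield every 0/1 combination of the given flags as a dict."""
    for values in product([0, 1], repeat=len(flags)):
        yield dict(zip(flags, values))


def is_valid(cfg):
    # A hierarchy on the t degrees of freedom only makes sense with a t likelihood.
    return cfg["student_likelihood"] or not cfg["hierarchical_nu"]


configs = list(all_configurations(FLAGS))
valid = [c for c in configs if is_valid(c)]
print(len(configs))  # 16 = 2 ** 4
print(len(valid))    # 12 separate .stan files if each variant had its own
```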

I have already tried to build a more generic script using parametrization flags in the data block, but as far as I can tell it is not possible to declare a parameter (in the parameters block) conditionally on a flag from the data block. As a result, I had to create parameters that are useless for some parametrization choices (they have no impact on the likelihood) and give them an arbitrary prior distribution, which is obviously bad for sampling performance, since I am just adding useless dimensions to the problem.
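To put a rough number on those useless dimensions, assume a hypothetical layout in which the t likelihood adds one shared degrees-of-freedom parameter, per-protein degrees of freedom add $K$ more, and a hierarchy on the $\sigma^2$'s adds two hyperparameters. Always declaring everything then saddles the plain normal model with over a thousand dimensions that only ever sample their priors.

```python
# Hypothetical optional parameters: name -> (flag that needs it, size as a function of K)
OPTIONAL_PARAMS = {
    "nu": ("student_likelihood", lambda K: 1),               # shared t degrees of freedom
    "nu_k": ("hierarchical_nu", lambda K: K),                # per-protein degrees of freedom
    "log_sigma_hyper": ("hierarchical_sigma", lambda K: 2),  # mean and sd of log sigma
}


def optional_dims(config, K, always_declare):
    """Sampled dimensions contributed by the optional parameters."""
    total = 0
    for flag, size in OPTIONAL_PARAMS.values():
        if always_declare or config[flag]:
            total += size(K)
    return total


cfg = {"student_likelihood": 0, "hierarchical_nu": 0, "hierarchical_sigma": 0}
print(optional_dims(cfg, K=1000, always_declare=True))   # 1003, none of them used
print(optional_dims(cfg, K=1000, always_declare=False))  # 0
```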

Has anyone already faced this kind of problem ?

Thanks a lot!

Philippe

Edited by @Max_Mantei: 2 minor LaTeX edits

Hi Philippe! Good to hear that you are enjoying Stan :)

Did you already try the zero-length-array trick (not the official name) for conditional parameter declaration? It's a bit older, but I think Martin's post is still a good resource on it: https://www.martinmodrak.cz/2018/04/24/optional-parameters/data-in-stan/
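The gist, as I understand it: pass each flag as a 0/1 integer in the data block and use it as an array size, e.g. `array[use_student] real<lower=1> nu;` (older Stan releases wrote `real<lower=1> nu[use_student];`). When the flag is 0 the array has length zero, so it adds no dimension to the sampler, and in the model block you only touch `nu[1]` inside `if (use_student)`. The R (or Python) wrapper then just has to fill in the flags. Below is a small Python sketch of such a wrapper; all names are hypothetical, and `optional_param_sizes` merely mirrors the Stan declarations in its comments, it does not call Stan. The same logic applies to the named list you would pass from R.

```python
def make_stan_data(y, X, use_student=False, hierarchical_sigma=False):
    """Assemble the data for one generic Stan program (hypothetical names).

    y: list of K lists, y[k][n] = abundance of protein k in run n
    X: list of N design-matrix rows, each of length P
    """
    return {
        "K": len(y),
        "N": len(X),
        "P": len(X[0]),
        "y": y,
        "X": X,
        # Flags go in as 0/1 integers so the Stan program can use them as array sizes.
        "use_student": int(use_student),
        "use_hier_sigma": int(hierarchical_sigma),
    }


def optional_param_sizes(data):
    """Mirror the lengths Stan would give the optional parameter arrays."""
    return {
        # array[use_student] real<lower=1> nu;
        "nu": data["use_student"],
        # array[use_hier_sigma] real mu_log_sigma;
        "mu_log_sigma": data["use_hier_sigma"],
        # array[use_hier_sigma] real<lower=0> tau_log_sigma;
        "tau_log_sigma": data["use_hier_sigma"],
    }


data = make_stan_data([[1.0, 2.0], [0.5, 0.7]], [[1, 0], [1, 1]], use_student=True)
print(optional_param_sizes(data))  # {'nu': 1, 'mu_log_sigma': 0, 'tau_log_sigma': 0}
```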

Hope that this’ll work for you.

Cheers,
Max


Hi Max,
Thanks a lot for your reply!
I’ve just had a look at the post on Martin Modrak’s blog that you pointed to.
It seems to be exactly what I need, thanks! :-)
