How to decide whether to use a multivariate Gaussian mixture model?


I am new to Stan and Bayesian analysis, and I am looking at modelling some spectroscopic (UV-Vis) data. This kind of 'signal' data has discrete counts over a range of energies and a number of peaks, some of which can be indistinguishable. Do you think a multivariate Gaussian mixture model is appropriate for it?

In any case, are there any resources dedicated to getting started with MV-GMMs in Stan?

Thanks for any assistance and resources.

What is the outcome(s)? What are the predictors?

I want to estimate the parameters and, if possible, identify the number of peaks in the data (which is often hard to tell), so that the absorbance can be identified. The parameters are the FWHM, amplitude and position of each peak. Eventually the model will also include a 'background' estimate, for example.

Let me rephrase this since I’m a bit slow :)

If you want to predict y, what will you use to predict it, i.e., what are x and z here?

y ~ x + z

Please explain what types the variables are, e.g., real positive number.

It's my fault, sorry, I misunderstood your question. Y, i.e. the excitation here, can be modelled as a Gaussian function of the wavelength x to account for the peak shape, so $e(x) = A \exp\left(-\left(\frac{x - x_0}{\mathrm{fwhm}}\right)^2\right)$,
where $x_0$ is the centre (peak position), $x$ is the wavelength, fwhm is the peak width and $A$ is the amplitude; all four are real and positive.
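To make the model concrete, here is a minimal stdlib-Python sketch of that mean function, extended (as an assumption) to a sum of several peaks plus a constant background, which is where the FWHM, amplitudes, positions and background mentioned earlier would enter. The names `gaussian_peak`, `spectrum_mean` and `width_to_fwhm` are hypothetical. One caveat worth checking: in the form exp(-((x - x0)/w)^2) the parameter w is not the full width at half maximum. The curve falls to half its height at |x - x0| = w·sqrt(ln 2), so FWHM = 2·sqrt(ln 2)·w ≈ 1.665·w.

```python
import math


def gaussian_peak(x, amplitude, centre, width):
    """Single peak A * exp(-((x - x0) / w)^2), the form written above."""
    return amplitude * math.exp(-((x - centre) / width) ** 2)


def width_to_fwhm(width):
    """For exp(-((x - x0)/w)^2) the half maximum is at |x - x0| = w*sqrt(ln 2),
    so the full width at half maximum is 2*sqrt(ln 2)*w."""
    return 2.0 * math.sqrt(math.log(2.0)) * width


def spectrum_mean(x, peaks, background=0.0):
    """Hypothetical multi-peak mean: a constant background plus a sum of peaks.

    peaks is a list of (amplitude, centre, width) tuples.
    """
    return background + sum(gaussian_peak(x, a, c, w) for a, c, w in peaks)


if __name__ == "__main__":
    peaks = [(1.0, 300.0, 20.0), (0.5, 340.0, 15.0)]  # made-up illustrative values
    for x in (280.0, 300.0, 320.0, 340.0):
        print(x, round(spectrum_mean(x, peaks, background=0.05), 4))
    print("FWHM for width 20:", round(width_to_fwhm(20.0), 3))
```

If the parameter is meant to be the true FWHM, the exponent can instead be written as -4·ln 2·((x - x0)/fwhm)^2.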

Ah, OK, so you need a nonlinear formula? If you want to try it out before coding everything in Stan, perhaps run it with brms first:


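```r
library(brms)
# A, xo and fwhm are nonlinear parameters; brms requires a prior on each of them
m <- brm(bf(e ~ A * exp(-((x - xo) / fwhm)^2), A + xo + fwhm ~ 1, nl = TRUE),
         family = gaussian(),
         data = d,
         prior = yourPriors)
```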

Make sure to do prior predictive checks and simulate some data first to see that you recover the parameters properly.
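As an illustration of that advice, here is a minimal stdlib-Python sketch of a prior predictive check for a single peak: it draws parameters from priors and simulates noisy spectra on a wavelength grid. The priors (lognormal amplitude and width, a normal centre around 320 nm, half-normal noise), the 250 to 450 nm grid and the function names are all hypothetical placeholders, not recommendations for UV-Vis data.

```python
import math
import random


def gaussian_peak(x, amplitude, centre, width):
    return amplitude * math.exp(-((x - centre) / width) ** 2)


def draw_prior(rng):
    """Hypothetical priors for one peak; replace them with ones that fit your instrument."""
    amplitude = rng.lognormvariate(0.0, 0.5)          # positive, median 1
    centre = rng.gauss(320.0, 20.0)                   # nm, assumed region of interest
    width = rng.lognormvariate(math.log(15.0), 0.3)   # positive, median 15 nm
    sigma = abs(rng.gauss(0.0, 0.05))                 # half-normal measurement noise
    return amplitude, centre, width, sigma


def simulate_spectrum(xs, amplitude, centre, width, sigma, rng):
    """Peak curve plus independent Gaussian noise at each wavelength."""
    return [gaussian_peak(x, amplitude, centre, width) + rng.gauss(0.0, sigma) for x in xs]


def prior_predictive(xs, n_draws, seed=1):
    """Return a list of (parameters, simulated spectrum) pairs drawn from the priors."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        params = draw_prior(rng)
        sims.append((params, simulate_spectrum(xs, *params, rng)))
    return sims


if __name__ == "__main__":
    xs = [250.0 + 2.0 * i for i in range(101)]  # 250-450 nm grid
    sims = prior_predictive(xs, n_draws=200)
    peak_heights = sorted(max(y) for _, y in sims)
    print("approx. 5%, 50%, 95% of simulated peak heights:",
          [round(peak_heights[k], 3) for k in (10, 100, 190)])
```

If the simulated spectra look implausible for your instrument, adjust the priors; then take one draw, fit its simulated spectrum, and check that the posterior recovers the known parameters.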

I would try a more straightforward approach first. Have you seen any GMM examples in Stan?

I have tried fitting this in PyStan using a general nonlinear approach, and although the fit looks good visually and the chains converge, the parameter estimates have large standard deviations. So I was wondering whether another model could describe the data better. But I think I may be out of my depth here.
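One possible reason for a good visual fit with wide parameter uncertainty (an assumption, not a diagnosis of this particular model) is that overlapping peaks let quite different parameter values produce almost the same curve. The stdlib-Python sketch below compares two equal peaks at x0 - d and x0 + d with a single peak that has the same area and variance; the function names and test values are hypothetical. When the maximum difference is below the measurement noise, the data cannot tell one peak from two.

```python
import math


def gaussian_peak(x, amplitude, centre, width):
    return amplitude * math.exp(-((x - centre) / width) ** 2)


def moment_matched_single_peak(amplitude, half_sep, width):
    """Single peak with the same area and variance as two equal peaks at c - d and c + d.

    For exp(-((x - c)/w)^2) the variance is w^2/2, so matching the second moment
    gives W^2/2 = w^2/2 + d^2; matching the area A*w*sqrt(pi) gives A' = 2*A*w/W.
    """
    big_width = math.sqrt(width ** 2 + 2.0 * half_sep ** 2)
    return 2.0 * amplitude * width / big_width, big_width


def max_relative_difference(amplitude, centre, half_sep, width, xs):
    """Largest gap between the two-peak curve and its single-peak match,
    as a fraction of the two-peak maximum."""
    a1, w1 = moment_matched_single_peak(amplitude, half_sep, width)
    two = [gaussian_peak(x, amplitude, centre - half_sep, width)
           + gaussian_peak(x, amplitude, centre + half_sep, width) for x in xs]
    one = [gaussian_peak(x, a1, centre, w1) for x in xs]
    return max(abs(a - b) for a, b in zip(two, one)) / max(two)


if __name__ == "__main__":
    width = 20.0
    xs = [300.0 + 0.1 * i for i in range(-1000, 1001)]
    for ratio in (0.25, 0.5, 1.0):
        diff = max_relative_difference(1.0, 300.0, ratio * width, width, xs)
        print(f"d/w = {ratio}: max difference = {100 * diff:.2f}% of peak height")
```

Because peaks are interchangeable, such models can also suffer from label switching; constraining the peak positions with Stan's `ordered` vector type is a common way to remove it.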