Fitting when data is a density

So this is more like curve fitting than a mixture model, but conveniently it’ll have the mixture model problems too, I guess, haha.

So you have this process that generates a curve. Presumably we’re assuming this is deterministic?

Like the ideal version of the thing you plotted is

f(x; \vec{\alpha}, \vec{\text{loc}}, \vec{\text{scale}}) = \sum_{i=1}^N \alpha_i e^{-(x - \text{loc}_i)^2 / \text{scale}_i}

and for whatever reason this is a sum of squared exponential-looking terms. All you need to do is decide on a measurement process and you’re ready for a Stan model.

For continuous measurements, the standard thing is to just assume normally distributed noise if you don’t have a reason to think anything else.
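Spelled out, and assuming the curve is measured at points x_1, ..., x_M with values y_1, ..., y_M (M and the index j are my notation, not something from your post), that measurement process would be something like:

```latex
y_j \sim \text{normal}\big(f(x_j; \vec{\alpha}, \vec{\text{loc}}, \vec{\text{scale}}),\ \sigma\big), \qquad j = 1, \dots, M
```

Here \sigma is the scale of the measurement noise, which gets estimated along with everything else.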

So the simplest curve fit you might try is:

parameters {
  vector[5] alpha;
  vector[5] loc;
  vector<lower=0.0>[5] scale;
  real<lower=0.0> sigma;
}
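model {
  // f is the sum-of-exponentials curve evaluated at the measurement points x;
  // x and y are data
  y ~ normal(f(x, alpha, loc, scale), sigma);
}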
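The snippet above uses y and f without defining them. Here is a minimal sketch of a complete program, assuming the curve is passed in as M points x with measured values y. The names M and x and the body of f are my guesses at what you'd want, and I haven't run this against your data:

```stan
functions {
  // Sum of squared-exponential bumps, evaluated at every point of x.
  // The number of bumps is taken from the length of alpha.
  vector f(vector x, vector alpha, vector loc, vector scale) {
    vector[rows(x)] mu = rep_vector(0, rows(x));
    for (k in 1:rows(alpha)) {
      mu += alpha[k] * exp(-square(x - loc[k]) / scale[k]);
    }
    return mu;
  }
}
data {
  int<lower=1> M;   // number of measured points (assumed name)
  vector[M] x;      // where the curve was measured
  vector[M] y;      // measured curve values
}
parameters {
  vector[5] alpha;
  vector[5] loc;
  vector<lower=0.0>[5] scale;
  real<lower=0.0> sigma;
}
model {
  // No priors here, same as the snippet above; you'd almost certainly
  // want to add some before trusting anything this produces.
  y ~ normal(f(x, alpha, loc, scale), sigma);
}
```

The number of terms is hard-coded to 5 to match the snippet; it could just as well be passed in as data.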

So it’s easy enough to write that down, but you’re probably going to have a bunch of problems with this model. I’ll list ’em in order of what I think is most important, but they’re all pretty rough.

  1. The model I wrote down has 16 parameters. I do not think the curve you plotted will identify 16 parameters reliably. Just my opinion – I didn’t actually try it, but that curve looks really simple. There are usually a deceptively large number of ways to put quadratic exponentials together and fit curves. Sampling will help you find some of them, but probably not all of them, and the diagnostics will be sad.

  2. You say there is strong evidence to support that this data is the sum of a bunch of squared exponentials. I believe that maybe it is a good fit, but your data looks really clean. When your data doesn’t have much noise, it is important that you have the right model. Bayesian posteriors are only reliable when you have a good model, and really clean data raises the bar on what a good enough model is. If you don’t have a good model then the posterior is going to tell you more about model misspecification than anything reliable about the parameters of your system. It’s all about that generative process.

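  3. There are lots of problems with the posteriors of things that are mixed together. For instance, re-orderings of the means (the loc parameters here): swapping two terms gives exactly the same curve, so the posterior has a copy of every mode for each relabeling. You can put ordering constraints on the means but it’ll still be rough.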
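For the ordering constraint in point 3, one option is Stan’s ordered type on loc. This is a sketch of just the parameters block, with everything else left as in the model above. It only removes the relabeling symmetry, not the other problems in the list:

```stan
parameters {
  vector[5] alpha;
  ordered[5] loc;               // enforces loc[1] < loc[2] < ... < loc[5]
  vector<lower=0.0>[5] scale;
  real<lower=0.0> sigma;
}
```

If two of the true locations are close together, the sampler can still have a hard time near the boundary where they’d cross, which is part of why I’d expect it to stay rough.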

Since I answered so negatively, here’s a similar question from a while ago: Mixture of Gaussian functions. You can have a look at that and see if there’s anything useful.

I probably made the outlook sound really bleak, haha :P, but feel free to ask more questions. I could be super wrong. Just give things a try and see what works and what doesn’t, and post questions if you have ’em! Hope that helps!