I am modelling some peaks and currently have a model, but it does not seem to capture the data well, as shown by the posterior predictive plot. It is really tough because some of my peak positions are very close together, which means there can be little to no variance in those priors, or the sampling can't converge…

Edited to remove the code as it was slightly hard to read; it is easier to give an overview, as below:

Currently, I have been modelling these using a sum function as shown below:

y[i] ~ m*L(x[i]) + (1-m)*G(x[i]), where L(x) is a Lorentzian, G(x) is a Gaussian, and m is the mixing ratio. I do an additive procedure for each peak to form the full function, and then model this with a normal likelihood, given the large number of counts. This is count data (the number of electrons detected); would that maybe point to needing a different approach? Does anyone have experience with this sort of mixture function and how it works in Stan? I do not think my current approach is working…
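For concreteness, here is a minimal sketch of what such an additive pseudo-Voigt model could look like in Stan. All names (mu, gamma_l, sigma_g, amp, m) and the choice of priors are my assumptions, not the original poster's code:

```stan
data {
  int<lower=1> N;          // number of data points
  int<lower=1> K;          // number of peaks
  vector[N] x;             // energy axis
  vector[N] y;             // observed counts
}
parameters {
  ordered[K] mu;                 // peak positions (ordered helps when peaks are close)
  vector<lower=0>[K] gamma_l;    // Lorentzian half-widths, one per peak
  vector<lower=0>[K] sigma_g;    // Gaussian sds, one per peak
  vector<lower=0>[K] amp;        // peak amplitudes
  real<lower=0, upper=1> m;      // Lorentz/Gauss mixing ratio
  real<lower=0> sigma;           // observation noise sd
}
model {
  vector[N] f = rep_vector(0, N);
  for (k in 1:K) {
    for (i in 1:N) {
      real L = gamma_l[k]^2 / (square(x[i] - mu[k]) + gamma_l[k]^2);
      real G = exp(-0.5 * square((x[i] - mu[k]) / sigma_g[k]));
      f[i] += amp[k] * (m * L + (1 - m) * G);
    }
  }
  // priors omitted here; informative priors on mu help identify nearby peaks
  y ~ normal(f, sigma);
}
```

The `ordered` type on mu is one common way to break the label-switching that makes sampling struggle when peak positions are close together.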

Can you post your model code? If you put the code between two lines, each with three backtick characters (```), it’ll automatically format nicely. And if you put the word “stan” (without quotes) on the first line of three backticks, it’ll show nice colours too.

It looks like the data is constrained to be >0 but the posterior predictive values are not. Are the values bounded or is there any kind of censoring/truncation in the data collection process?

Note: I say "it looks like" because I think the plot was generated with ArviZ, whose KDE is defined only between the min and max of the samples provided; the KDE is not extended outside this range. As the KDE lines do extend to negative values, the min of the posterior predictive samples must be below 0. Just looking at the plot I cannot be completely sure that the observed values do not extend to negative values, but I am fairly confident they do not.

Hi, yes, the data is constrained to >0 and has no negative values, since it represents electron counts.

Edit: I think the issue may be that my data is constrained to >0 but my likelihood isn't. Is there a way I can constrain the likelihood to >0? The problem is that my data has many non-integer values, so a Poisson likelihood won't work: it expects integer values, and my data is not in that form. I have been using a normal likelihood given the large number of counts, but I need some way to constrain it to >0 so it captures the data better. Any ideas how to do this? This is what I think the issue might be, but I am not sure… as you mention, the posterior predictive values are not constrained in this way, but they need to be.

You’ll definitely want to take that into account when defining the model, then. You should use <lower=0> when declaring y in the data block. What I am not sure about is whether you should also use a truncated distribution in the model block (see the docs on truncated distributions in the Stan Reference Manual).
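In case it helps, a minimal sketch of what the bound plus truncation could look like; mu_hat here is just a placeholder for the fitted peak function, not anything from the actual model:

```stan
data {
  int<lower=1> N;
  vector<lower=0>[N] y;     // declares the observed counts as non-negative
  vector[N] mu_hat;         // placeholder for the fitted peak function
}
parameters {
  real<lower=0> sigma;
}
model {
  for (i in 1:N)
    y[i] ~ normal(mu_hat[i], sigma) T[0, ];   // normal likelihood truncated below at 0
}
```

The T[0, ] suffix renormalises the normal density over [0, ∞), so the likelihood itself respects the bound rather than just the data declaration.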

Thank you, I will apply those bounds to y in that case. The truncated normal makes sense to me; could I apply it with a lower bound of 0, i.e. T[0, ]?

Another thing: do you know why the height of my observed y is so much higher than that of my posterior predictive y? Is this again to do with my bounds?

I think the bounds will fix that. Intuitively, all the probability mass that the posterior predictive currently places on negative values will be “redistributed” to values around 0 and small positive y, making the KDE near 0 higher.

I should note that I have no idea whether random value generation with normal_rng will take the bound into account; I guess you’ll see once you implement it. In case it does not, posterior predictive values can be generated using scipy + InferenceData + xarray; I can help with that if it turns out to be necessary.
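For reference, a sketch of generating lower-truncated posterior predictive draws outside Stan with scipy. The posterior draw arrays mu and sigma below are fabricated for illustration; in practice you would pull them out of the InferenceData posterior group:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in posterior draws (illustrative only): 1000 draws of a mean and sd
mu = rng.normal(5.0, 0.5, size=1000)
sigma = np.abs(rng.normal(1.0, 0.1, size=1000))

# scipy's truncnorm takes its bounds in standard-deviation units:
# a = (lower - mu) / sigma, b = (upper - mu) / sigma
a = (0.0 - mu) / sigma          # lower bound at 0
b = np.full_like(mu, np.inf)    # no upper bound

# One truncated draw per posterior draw; every value respects the bound
y_rep = stats.truncnorm.rvs(a, b, loc=mu, scale=sigma, random_state=rng)

assert (y_rep >= 0).all()
```

Because a and b are per-draw, each posterior predictive sample is drawn from a normal truncated at 0 with that draw's mean and sd, which is exactly the T[0, ] behaviour normal_rng would otherwise miss.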