Posterior predictive plot shows that model is not capturing data well

xenia · March 18, 2021, 9:45am

I am modelling some peaks and currently have a model but the issue is the model seems to not capture the data well as shown by the posterior predictive plot shown, it is really tough because some of my peak positions are very close together which means there can be little to no variance in these priors or the sampling can’t converge… ppc

edited to remove code as it is slightly hard to read, easier to give an overview as below:

currently, I have been modelling these using a sum function as shown below:

y ~ ( m*L(x[i]) + (1-m)*G(x[i])) where L(x) is lorentz and g(x) is gauss functions and ‘m’ is the mixing ratio) and I just do an additive procedure for each peak to form the function and then model this with a normal likelihood given the large number of counts. This is modelling count data (the number of electrons detected) , would this maybe point to needing to use a different approach? Does anyone have any experience with this sort of mixture function and how it works in stan because I do not think my current approach is working …

mike-lawrence · March 18, 2021, 5:51pm

Can you post your model code? If you put the code between two lines, each with three backtick characters (```), it’ll automatically format nicely. And if you put the word “stan” (without quotes) on the first line of three backticks, it’ll show nice colours too.

xenia · March 19, 2021, 5:36pm

Also see here the current ppc/ same model , slightly different data… ppccwtf

OriolAbril · March 19, 2021, 8:29pm

It looks like the data is constrained to be >0 but the posterior predictive values are not. Are the values bounded or is there any kind of censoring/truncation in the data collection process?

Note: I am saying it looks like because I think the plot was generate with ArviZ, whose KDE is defined only between the min and max values in the samples provided, the KDE is not extended outside this range. As the KDE lines do extend to negative values the min values should be lower than 0. I am not completely sure the observed values do not extend to negative values just looking at the plot, but I am quite certain.

xenia · March 19, 2021, 11:01pm

Hi, yes the data is constrained to >0, so has no negative values as since it represents electron counts

Edit: I think possibly the issue is that my data is constrained to >0 but my likelihood doesn’t seem to be… is there a way I can constrain this to >0, the issue is my data has many non-integer values so if I was to use a poisson likelihood it will be expecting integer values but my data is not in this form. I have been using a normal likelihood given the large number of counts, but actually I need a way to constrain the likelihood to >0 so it captures the data better, any ideas how I do this? This is what I think might be the issue but I am not sure… as you mention the pp values are not constrained in this way but they need to be…

OriolAbril · March 19, 2021, 11:42pm

You’ll definitely want to take that into account when defining the model then. You should use <lower=0> when defining y in the data block. What I am not sure about is if you should also use a truncated distribution of not in the model section (docs on truncated distributions Stan Reference Manual).

xenia · March 20, 2021, 12:13am

Thank you, I will apply those bounds to y in that case. The truncated normal seems to make sense to me, could I apply this with a lower bound of 0 so T[0, ]

Another thing is, do you know why the height of my observed y is so high compared to that of my predictive y? Is this again to do with my bounds?

OriolAbril · March 20, 2021, 12:42am

I think the bounds will fix that, intuitively what should happen is that all the probability that now posterior predictive have on negative values will be “redistributed” around 0 and small y values, making the KDE around 0 higher

OriolAbril · March 20, 2021, 12:46am

I should note that I have no idea if random value generation with normal_rng will take the bound into account, I guess you’ll see once you implement it. In case it did not work, posterior predictive values can be generated using scipy+inferencedata+xarray, I can help with that if were to be necessary.

xenia · March 20, 2021, 12:53am

Thank you so much for the helpful responses, I will see whether it does, and let you know.

Topic		Replies	Views
Posterior Predictive Check Modeling fitting-issues	2	535	November 24, 2022
Posterior predictive checking Modeling	10	2021	November 26, 2019
Using average of the posterior draws for posterior predictive checking? Modeling	2	676	November 29, 2022
Useful PPC visual checks for counts of diffent magnitudes? General bayesplot	3	57	August 14, 2024
Toy ODE model problem -- posterior predictives not consistent with real values Modeling fitting-issues , specification	14	1047	June 4, 2019

Posterior predictive plot shows that model is not capturing data well

Related topics