I have some data that looks like this:

PC1 and PC2 are principal components (they account for more than 95% of the variance). I’d like to model a plausible data-generating process for them.

I’m currently modelling it as a mixture of heteroscedastic linear regressions, because of all the sharp lines and gradients in the plot, with gamma marginals for PC1. In pilot runs I can fit a K=5 mixture to it with Stan, although I haven’t been able to make it identifiable yet.
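To make the idea concrete, here is a minimal sketch of the kind of generative process I have in mind, written in Python with NumPy rather than Stan so it can be run on its own. Everything in it is an assumption for illustration: the function name `simulate_mixture`, the parameter values, the use of K=2 instead of K=5, a separate gamma distribution for PC1 within each component, and a log-linear dependence of the noise scale on PC1 as one possible form of heteroscedasticity.

```python
import numpy as np


def simulate_mixture(n, weights, shape, scale, intercept, slope,
                     log_sd0, log_sd1, seed=0):
    """Draw n points from a mixture of heteroscedastic linear regressions.

    Each component k has its own gamma distribution for x (standing in for
    PC1), its own regression line for y (standing in for PC2), and its own
    noise scale that varies with x. All of this is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)

    # Latent component label for each observation.
    z = rng.choice(len(weights), size=n, p=weights)

    # Gamma-distributed predictor, with component-specific shape and scale.
    x = rng.gamma(np.asarray(shape)[z], np.asarray(scale)[z])

    # Component-specific regression line.
    mean = np.asarray(intercept)[z] + np.asarray(slope)[z] * x

    # Heteroscedastic noise: the log standard deviation is linear in x.
    sd = np.exp(np.asarray(log_sd0)[z] + np.asarray(log_sd1)[z] * x)

    y = rng.normal(mean, sd)
    return x, y, z


# Hypothetical two-component example; none of these values come from my data.
params = dict(
    weights=[0.6, 0.4],
    shape=[2.0, 5.0],
    scale=[1.0, 0.5],
    intercept=[0.0, 3.0],
    slope=[1.5, -0.5],
    log_sd0=[-2.0, -1.0],
    log_sd1=[0.1, 0.3],
)
x, y, z = simulate_mixture(1000, **params, seed=1)
```

Plotting `y` against `x` coloured by `z` gives a quick visual check of whether a process like this can reproduce the sharp lines and gradients at all. The sketch says nothing about identifiability, since the component labels can still be permuted freely.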

My approach feels a bit arbitrary, so I wondered what your first thoughts would be on how to approach modelling data that looks like this?