You were right: if I introduce a second peak, sometimes the model works and finds both, and sometimes it doesn't.
Here is the reason for the discrete grid:
What I observe are noisy projections of an unknown 3D object. The goal is to reconstruct the 3D volume from the 2D projections. You start with some kind of low-resolution initial 3D model, then you compute the likelihood of the orientations and shifts (on a coarse grid) of the observed projections, backproject them (weighted by likelihood) and get a better-resolved 3D reconstruction. Then you restart the process (expectation maximization) with the new reconstructed volume and a finer grid. In the end you get a high-resolution reconstruction.
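To make the loop concrete, here is a minimal sketch of one EM scheme of this kind on a toy 1D analog (often called multi-reference alignment), not on real projections. Assumptions: a cyclic shift on a discrete grid stands in for "orientation + shift", the noise is Gaussian with known sigma, the prior over shifts is uniform, and the grid is fixed (the coarse-to-fine refinement is left out). All function names are hypothetical.

```python
import math
import random

# Toy 1D analog (assumption): a cyclic shift on a discrete grid stands in
# for "orientation + shift", and the noise is Gaussian with known sigma.

def circ_shift(x, s):
    """Cyclically shift the list x to the right by s positions."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]

def log_lik(obs, template, s, sigma):
    """Gaussian log-likelihood (up to a constant) of obs given the template shifted by s."""
    shifted = circ_shift(template, s)
    return -sum((o - t) ** 2 for o, t in zip(obs, shifted)) / (2 * sigma ** 2)

def em_step(observations, template, sigma):
    """One EM iteration with a uniform prior over all cyclic shifts."""
    n = len(template)
    new = [0.0] * n
    for obs in observations:
        # E-step: posterior weight of every grid shift for this observation.
        logs = [log_lik(obs, template, s, sigma) for s in range(n)]
        m = max(logs)
        w = [math.exp(lv - m) for lv in logs]
        z = sum(w)
        # M-step contribution: undo each shift ("backproject") and add it,
        # weighted by its posterior probability.
        for s in range(n):
            unshifted = circ_shift(obs, -s)
            for i in range(n):
                new[i] += (w[s] / z) * unshifted[i]
    return [v / len(observations) for v in new]

def alignment_error(estimate, truth):
    """Max abs error after the best cyclic alignment (the global shift is not identifiable)."""
    return min(
        max(abs(e - t) for e, t in zip(estimate, circ_shift(truth, s)))
        for s in range(len(truth))
    )

random.seed(0)
truth = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0, -1, -1, 0, 0, 0]
sigma = 0.3
observations = [
    [v + random.gauss(0, sigma) for v in circ_shift(truth, random.randrange(len(truth)))]
    for _ in range(200)
]

template = observations[0]  # crude initial model: a single noisy observation
for _ in range(10):
    template = em_step(observations, template, sigma)
print(round(alignment_error(template, truth), 3))
```

Starting from a single noisy observation rather than the plain average matters here: a perfectly flat initial template makes every shift equally likely, and EM then never leaves it.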
There are already software packages that implement Bayesian approaches for this problem ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3690530/ ), but I entered the world of Bayes with the Rethinking book and Stan, and was wondering whether I can implement similar things in Stan.
This sounds like something that would be extremely challenging to implement in Stan. Maybe not impossible, but definitely challenging. As I said, treating the grid indices as discrete parameters to be marginalized out might work and get rid of the multimodality, but there might be other challenges, and it would likely be very expensive computationally.
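For illustration, here is a minimal sketch of what "marginalizing out the grid index" means numerically: instead of sampling the discrete index k, you sum the likelihood over all grid points on the log scale, log p(y) = log sum_k p(k) p(y | k). In Stan the same pattern is usually written with the built-in `log_sum_exp` function; the toy model below (a unit vector at an unknown angle plus 2D Gaussian noise, uniform prior over 12 angles) and all function names are hypothetical assumptions, not anything from the Cryo-EM setting.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def marginal_log_lik(y, grid, log_lik_at):
    """log p(y) = log sum_k p(k) p(y | grid[k]) with a uniform prior over the grid.

    The discrete index k no longer appears as a parameter, so the result is a
    smooth function of any continuous parameters inside log_lik_at.
    """
    log_prior = -math.log(len(grid))
    return log_sum_exp([log_prior + log_lik_at(y, g) for g in grid])

def posterior_over_grid(y, grid, log_lik_at):
    """p(k | y) for each grid point, recovered after marginalization."""
    terms = [log_lik_at(y, g) for g in grid]  # uniform prior cancels out
    lse = log_sum_exp(terms)
    return [math.exp(t - lse) for t in terms]

def gauss_log_lik_2d(y, phi, sigma=0.5):
    """Hypothetical toy model: y is the unit vector at angle phi plus 2D Gaussian noise."""
    dx, dy = y[0] - math.cos(phi), y[1] - math.sin(phi)
    return -(dx * dx + dy * dy) / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)

grid = [2 * math.pi * k / 12 for k in range(12)]  # 12 candidate angles
y = (0.9, 0.1)
print(marginal_log_lik(y, grid, gauss_log_lik_2d))
post = posterior_over_grid(y, grid, gauss_log_lik_2d)
print(max(range(len(grid)), key=lambda k: post[k]))  # index of the most probable angle
```

Note the cost: every evaluation of the log density needs one likelihood term per grid point per observation, and with many projections and a fine orientation/shift grid that is exactly where the computational expense comes from.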
This is the funny thing about the word “Bayes”: it can mean many different things. The paper you linked doesn’t (after a brief read-through) describe a “Bayesian” approach in the sense Stan uses the word. What they do is compute a MAP estimate, which is basically maximum likelihood + regularization (in Stan you can achieve this behaviour using the “optimize” mode). They do not (if I understand the paper correctly) try to quantify the uncertainty about the structure, which would be needed before we could call the approach “fully Bayesian”. IMHO their use of “Bayes” is really just a buzzword; MAP estimates are AFAIK rarely called “Bayes” (because hey, we already have a name for them, and that is “MAP estimate”).
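To make the MAP vs. fully Bayesian distinction concrete, here is a minimal sketch on a conjugate toy model that has nothing to do with Cryo-EM (assumption: y_i ~ N(mu, sigma) with known sigma, prior mu ~ N(mu0, tau)). Maximizing log-likelihood + log-prior is least squares plus an L2 penalty pulling mu toward mu0, which is the "maximum likelihood + regularization" reading; the posterior sd is the uncertainty that a point estimate alone does not report. Function names are hypothetical.

```python
import math

def map_and_posterior_sd(ys, sigma, mu0, tau):
    """Closed-form posterior for mu when y_i ~ N(mu, sigma) and mu ~ N(mu0, tau).

    The posterior is Gaussian, so its mode (the MAP estimate) equals its mean.
    """
    precision = len(ys) / sigma ** 2 + 1 / tau ** 2
    mu_map = (sum(ys) / sigma ** 2 + mu0 / tau ** 2) / precision
    return mu_map, math.sqrt(1 / precision)

def map_by_gradient_ascent(ys, sigma, mu0, tau, iters=200):
    """Maximize log-likelihood + log-prior numerically, like an optimizer would."""
    precision = len(ys) / sigma ** 2 + 1 / tau ** 2
    step = 0.5 / precision  # safe step size for this quadratic objective
    mu = 0.0
    for _ in range(iters):
        # d/dmu [ -sum (y - mu)^2 / (2 sigma^2) - (mu - mu0)^2 / (2 tau^2) ]
        grad = sum(y - mu for y in ys) / sigma ** 2 - (mu - mu0) / tau ** 2
        mu += step * grad
    return mu

ys = [2.1, 1.9, 2.4, 2.0]
ml = sum(ys) / len(ys)  # maximum likelihood: no regularization
mu_map, post_sd = map_and_posterior_sd(ys, sigma=1.0, mu0=0.0, tau=1.0)
print(ml, mu_map, map_by_gradient_ascent(ys, 1.0, 0.0, 1.0), post_sd)
```

An optimizer only ever hands back the shrunken point estimate; the posterior sd is the extra piece a fully Bayesian treatment (e.g. sampling in Stan) would also give you, and for a 3D structure that would mean uncertainty over every voxel.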
BTW I don’t really understand Cryo-EM, but I believe it is very cool stuff! However, if I read the paper right, even optimization-based approaches (like the MAP procedure they describe) are very expensive to compute, making it unlikely that a fully Bayesian treatment would be tractable.