About the peaks: I think the Student-t should fit those reasonably well, but I might be wrong. I would also recommend doing the comparison with histograms instead of density plots, since density plots are smoothed and can blur sharp peaks. This should give you a better view of the similarity of y and y_hat.
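To make the histogram comparison concrete, here is a minimal sketch in plain Python. It bins y and y_hat on the same edges so the counts line up bin by bin. The function name `shared_histograms`, the bin count, and the simulated data are all made up for illustration; swap in your observed values and your posterior predictive draws.

```python
import random


def shared_histograms(y, y_hat, n_bins=30):
    """Bin y and y_hat on identical edges so their counts are directly comparable."""
    lo = min(min(y), min(y_hat))
    hi = max(max(y), max(y_hat))
    if hi == lo:
        raise ValueError("all values are identical; cannot build bins")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]

    def counts(values):
        c = [0] * n_bins
        for v in values:
            # Clamp the index so the largest value lands in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            c[idx] += 1
        return c

    return edges, counts(y), counts(y_hat)


# Hypothetical stand-ins for the observed data and one predictive draw.
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y_hat = [rng.gauss(0.0, 1.2) for _ in range(1000)]

edges, c_y, c_hat = shared_histograms(y, y_hat, n_bins=20)
for left, a, b in zip(edges, c_y, c_hat):
    print(f"{left:7.2f}  y={a:4d}  y_hat={b:4d}")
```

Using shared edges matters here: if each sample gets its own bins, a peak can appear shifted or flattened purely because of the binning, not because of the model.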
If the Student-t does not capture every aspect of the data that matters to you, you could also try the skewed-t (here is a thread about how to implement it: Skewed distribution).