Implementing R2D2 prior in a mutlivariate betabinomial regression

Thank you again for the feedback @jroon. This is a case of, for the sake of brevity, you only seeing the 10% of the iceberg above the surface (in this case that 10% is the post you replied to). Professor Vehtari has been giving me advice on the modelling process for almost a year and has explained to me the implications of the different interaction syntaxes for s(). What started as a gaussian model became binomial, then betabinomial, then went from three simple regressions to a multivariate model. I’m not actually convinced multivariate is conceptually a great idea for multiple drug classes. It’s quite rare for people in opioid treatment (our sample) to be using all three of these drugs at the same levels at the same time: people tend to use one or two at a time and not the others. The observations with the bad pareto-k levels tend to be these cases, where they might have, for example, used heroin and cannabis on 28 out of the previous 28 days but not used alcohol at all.

As to your point about clinical knowledge: I am not really a statistician, in the way Professor Vehtari is (few are). I am an addiction researcher who does a lot of stats so I have a lot of experience in clinical addiction research and work closely with clinicians. One of the reasons I am conducting this analysis is that there is not a lot of evidence from naturalistic data like this, about long-term trajectories of primary and secondary substance use among people being treated for opioid dependence. So while clinicians might have hunches about what happens there is little empirical data. But all are in agreement that, like all human behaviour, these relationships are never linear. That is why I am persisting with this very challenging project. What gets published will form justification for requesting a much larger data extract from our Ministry of Health, an extract large enough to make the inferences that can be drawn from the data much more certain. I agree with you that the coefficients from spline models are very hard to interpret, but I have scripts that make this possible, for example estimating the difference at various timepoints between people who used 0 days amphetamine at start of treatment and people who used 14 days (see here)