Techniques for setting priors to resolve prior-data conflicts

I am conducting a longitudinal analysis of substance use data from public health records, estimating the effect over time of amphetamine use at the start of treatment on trajectory of substance use over the course of a year. I’ve been seeking a lot of advice on this model over the last year (see here).

Using brms and k-fold cross validation (via the loo package) I have gotten to what I hope is the end of a long model-building process, starting with a linear model and adding polynomial terms until no additional explanatory power is added to the model (assessed using k-fold cross validation via the loo package. I have landed on a zero-inflated beta binomial model, with random slopes and linear, quadratic, and cubic polynomials for the time predictor (named yearsFromStart). All of the initial diagnostics - Rhat and ESS - are ok (although I had to increase the number of samples to 15,000 per chain with four chains to get the ESS for the estimate of the correlation between intercepts and slopes above 1000). The problem is that there are prior-data conflicts - on any parameter associated with the yearsFromStart variable (i.e. intercepts, interactions and sd).

Here is the regression output. outcome is the outcome, in this case it is a positive integer indicating days of amphetamine use in the 28 days prior to the measurement. ats-baseline is a time-invariant predictor indicating how many days in the 28 days prior to starting treatment the participant used amphetamines. yearsFromStart is a marker of when each measurement of the outcome was made (a positive number between 0 and 1 indicating at what proportion of a year the measurement was made). There are intercept and interaction terms for the linear, quadratic, and cubic versions of the yearsFromStart variable.

 Family: zero_inflated_beta_binomial 
  Links: mu = logit; phi = identity; zi = identity 
Formula: outcome | trials(set) ~ ats_baseline + yearsFromStart + I(yearsFromStart^2) + (yearsFromStart | encID) + I(yearsFromStart^3) + ats_baseline:yearsFromStart + ats_baseline:I(yearsFromStart^2) + ats_baseline:I(yearsFromStart^3) 
   Data: workingDF (Number of observations: 2301) 
  Draws: 4 chains, each with iter = 16000; warmup = 1000; thin = 1;
         total post-warmup draws = 60000

Multilevel Hyperparameters:
~encID (Number of levels: 687) 
                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                     0.70      0.10     0.51     0.89 1.00    44170    45240
sd(yearsFromStart)                6.09      0.72     4.76     7.59 1.00     9443    22144
cor(Intercept,yearsFromStart)     0.93      0.07     0.76     1.00 1.00     1883     4005

Regression Coefficients:
                               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                         -4.20      0.12    -4.45    -3.96 1.00    50714    52478
ats_baseline                       0.32      0.02     0.29     0.35 1.00    54765    49987
yearsFromStart                    -1.18      1.16    -3.48     1.04 1.00    29799    41555
IyearsFromStartE2                 -6.94      2.95   -12.69    -1.13 1.00    41672    42526
IyearsFromStartE3                  2.07      2.32    -2.51     6.59 1.00    43884    45927
ats_baseline:yearsFromStart       -1.36      0.18    -1.72    -0.99 1.00    31923    41038
ats_baseline:IyearsFromStartE2     3.17      0.56     2.10     4.29 1.00    33100    37675
ats_baseline:IyearsFromStartE3    -1.84      0.45    -2.76    -0.98 1.00    33419    40734

Further Distributional Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi     5.70      0.65     4.53     7.08 1.00    22066    40203
zi      0.04      0.02     0.01     0.08 1.00    31759    28575

When I run the powerscale_sensitivity() function from the priorsense package I get the following

And these are the priors (from running prior_summary in brms)

                prior     class                           coef group resp dpar nlpar lb ub       source
         normal(0, 5)         b                                                                    user
         normal(0, 5)         b                   ats_baseline                             (vectorized)
         normal(0, 5)         b ats_baseline:IyearsFromStartE2                             (vectorized)
         normal(0, 5)         b ats_baseline:IyearsFromStartE3                             (vectorized)
         normal(0, 5)         b    ats_baseline:yearsFromStart                             (vectorized)
         normal(0, 5)         b              IyearsFromStartE2                             (vectorized)
         normal(0, 5)         b              IyearsFromStartE3                             (vectorized)
         normal(0, 5)         b                 yearsFromStart                             (vectorized)
         normal(0, 5) Intercept                                                                    user
 lkj_corr_cholesky(1)         L                                                                 default
 lkj_corr_cholesky(1)         L                                encID                       (vectorized)
    gamma(0.01, 0.01)       phi                                                       0         default
         cauchy(0, 5)        sd                                                       0            user
         cauchy(0, 5)        sd                                encID                  0    (vectorized)
         cauchy(0, 5)        sd                      Intercept encID                  0    (vectorized)
         cauchy(0, 5)        sd                 yearsFromStart encID                  0    (vectorized)
           beta(1, 1)        zi                                                       0  1      default

Can anyone point me in a direction for maybe choosing better priors that might help resolve the prior-data conflict, or give any other advice? I used the sort of one-size-fits-all priors that you can set in brms (e.g. normal(0,5) for all bs, cauchy(0,5) for all sds), so maybe I need to get more specific for the yearsFromStart parameter and its related polynomials? I’m not sure where to start. These models take ages to run so I’d prefer to have at least a general approach that stands a better chance of working.

Thank you.

p.s. The same model without the cubic polynomial terms uses the same priors and has no prior-data conflicts.

I think we should change the diagnostic message to “potential prior-data conflict” as 1) threshold for the diagnostic is arbitrary, 2) in case of more complex models the default approach of power-scaling all priors jointly can lead to higher apparent sensitivity. We are also in progress of adding functionality to brms to allow selecting which priors are scaled.

In your case, 1) some of the prior sensitivity values are just on the border of the ad hoc threshold while likelihood sensitivity is higher, and 2) linear, quadratic, cubic, and interaction coefficients are all correlated which can make a substantial difference whether powerscaling priors jointly or separately. As the different terms are all contributing to a common function, it would be better to look at the sensitivity in predictions as we did for GP in the priorsense paper.

This is an interesting example, and if you can share the data (even privately) we could investigate more with @n-kall, how to recognize when the model is including additive terms with high posterior correlation and to suggest focusing on more interpretable quantities.

Have you considered using splines or GPs instead? Cubic is rarely well justified and can produce strange results near edges and when extrapolating. Furthermore with splines and GPs you could check the sensitivity on the magnitude and lengthscale plus the predictions

Thanks @avehtari that would be great. Can we take this offline for now? How can I email you directly? I tried replying to a much earlier one but got a message saying the was a 90-day expiry on replying direct messages.