Evaluating influence of parameters on model predictor or response?


#1

I want to decide whether a parameter of my Bayesian linear regression model (log-normal family, fitted in R with brms) has a “significant” influence on the response, i.e. on the data scale.

I am using Kruschke’s HDI+ROPE rule in R via the equi_test function of the sjstats package by @strengejacke. The function applies the HDI+ROPE rule to the posterior distribution of each parameter in the model, i.e. it returns whether to accept, reject, or leave undecided the null hypothesis. Since I am using a log-normal distribution, this is done on the log scale, i.e. it tests whether a parameter has some influence on the predictor (the mean of the log-normal distribution on the log scale), if I understand correctly. Does this imply that the result for the parameter is also valid with respect to the model response, i.e. on the data scale, or is it necessary to predict those values first and then evaluate their distribution?


#2

Note that with correlated covariates, the marginal posteriors used for HDI+ROPE can be misleading. That is, with collinear covariates it may look like none of the variables is relevant even if all of them are. See the collinear and bodyfat examples in https://avehtari.github.io/modelselection/ (there is also an answer on how to do variable selection in the case of collinearity).
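To make the collinearity point concrete, here is a minimal Python sketch using simulated draws rather than an actual brms fit. The joint posterior of the two coefficients `b1` and `b2` (posterior means of 1, posterior sds of 2, correlation of -0.99) is a hypothetical stand-in for what strongly collinear covariates tend to produce; none of these numbers come from a real model.

```python
import random

random.seed(1)

def central_interval(draws, prob=0.95):
    """Equal-tailed interval from empirical quantiles of the draws."""
    s = sorted(draws)
    n = len(s)
    lo = s[int((1 - prob) / 2 * n)]
    hi = s[min(n - 1, int((1 + prob) / 2 * n))]
    return lo, hi

# Hypothetical joint posterior: both coefficients have mean 1 and sd 2,
# and are strongly negatively correlated (assumed, not from a fitted model).
rho = -0.99
b1, b2 = [], []
for _ in range(20000):
    z1 = random.gauss(0, 1)
    z2 = random.gauss(0, 1)
    b1.append(1 + 2 * z1)
    b2.append(1 + 2 * (rho * z1 + (1 - rho**2) ** 0.5 * z2))

ci_b1 = central_interval(b1)
ci_b2 = central_interval(b2)
# The sum uses the joint draws, so it keeps the correlation information.
ci_sum = central_interval([x + y for x, y in zip(b1, b2)])

print("marginal b1:", ci_b1)   # wide, covers zero
print("marginal b2:", ci_b2)   # wide, covers zero
print("b1 + b2    :", ci_sum)  # narrow, clearly away from zero
```

Each marginal interval covers zero, yet the interval for the sum does not, so a rule applied to the marginals one at a time would miss that the covariates are jointly relevant.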

The highest posterior density interval (HDI) is not invariant to transformations, so the HDI on the data scale can differ from the one on the log scale, and you may get a different result for the ROPE. If the covariates are independent you could 1) use the central interval instead of the HDI, or 2) use posterior_linpred() to get to the data scale before computing the HDI.
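The lack of invariance can be checked numerically. The following is a rough Python sketch with simulated draws, not output from brms: `log_draws` is a hypothetical posterior sample on the log scale (normal with mean 0.5 and sd 0.5, both assumed), and `hdi` is a simple shortest-interval helper written for this example that is only appropriate for unimodal posteriors.

```python
import math
import random

random.seed(2)

def central_interval(draws, prob=0.95):
    """Equal-tailed interval from empirical quantiles of the draws."""
    s = sorted(draws)
    n = len(s)
    lo = s[int((1 - prob) / 2 * n)]
    hi = s[min(n - 1, int((1 + prob) / 2 * n))]
    return lo, hi

def hdi(draws, prob=0.95):
    """Shortest interval containing a fraction `prob` of the draws."""
    s = sorted(draws)
    k = math.ceil(prob * len(s))
    # Slide a window of k consecutive sorted draws and keep the narrowest.
    _, i = min((s[j + k - 1] - s[j], j) for j in range(len(s) - k + 1))
    return s[i], s[i + k - 1]

# Hypothetical posterior draws on the log scale (assumed values).
log_draws = [random.gauss(0.5, 0.5) for _ in range(20000)]
data_draws = [math.exp(x) for x in log_draws]

# Central interval: transforming the endpoints equals transforming the draws.
ci_log = central_interval(log_draws)
ci_data = central_interval(data_draws)

# HDI: the two routes disagree because exp() changes the density's shape.
hdi_log = hdi(log_draws)
hdi_log_back = (math.exp(hdi_log[0]), math.exp(hdi_log[1]))
hdi_data = hdi(data_draws)

print("central, endpoints transformed:", tuple(math.exp(v) for v in ci_log))
print("central, draws transformed    :", ci_data)
print("HDI, endpoints transformed    :", hdi_log_back)
print("HDI, draws transformed        :", hdi_data)
```

The two central intervals agree because quantiles commute with monotone transformations. The HDI computed from the transformed draws is shifted toward zero relative to the back-transformed log-scale HDI, which is why a ROPE decision can change with the scale.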


#3

Thank you for your suggestions.

I am not sure I understand this correctly. Can you clarify what you mean by independent?

By central interval, do you mean the equal-tailed credible interval? My understanding was that the HDI is more “robust”, especially for skewed distributions.


#4

Just the usual covariate independence assumption, i.e. covariates are independent if their joint distribution factorizes as p(x_1,x_2)=p(x_1)p(x_2). See the collinear and bodyfat examples in https://avehtari.github.io/modelselection/ for cases with dependent covariates, i.e. collinear or correlated covariates, and the effect this has on the joint posterior of the coefficients.

Yes, the central interval is the equal-tailed credible interval.

I think robust is the wrong word here (we could even say that the computation of the HDI is more fragile). The HDI may be a better summary for skewed distributions, but as it is not invariant to transformations, you need to compute it on the specific scale you are interested in. In this case you need to think about which summary best describes the information in the distribution. I think the problem with HDI+ROPE is that it hides some of the information and focuses too much on “significance testing”. Even if you use it, I recommend looking at the whole posterior, too.
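As one possible way of looking at more of the posterior than a single accept/reject decision, the sketch below reports the posterior mass inside a ROPE, the probability of an increase, and a few quantiles. It is written in Python with simulated draws; the log-scale coefficient `b` (normal with mean 0.15 and sd 0.10) and the ROPE of 0.9 to 1.1 on the ratio scale are hypothetical choices for illustration only.

```python
import math
import random

random.seed(3)

# Hypothetical posterior draws of a log-scale coefficient (assumed values).
b = [random.gauss(0.15, 0.10) for _ in range(20000)]
# On the data scale the coefficient acts as a multiplicative factor exp(b).
ratio = [math.exp(x) for x in b]

# Hypothetical ROPE on the ratio scale: changes within about 10% are
# treated as practically equivalent to no effect.
rope = (0.9, 1.1)

# Posterior mass inside the ROPE, computed on the ratio scale...
in_rope = sum(rope[0] < r < rope[1] for r in ratio) / len(ratio)
# ...and on the log scale with the ROPE limits transformed accordingly.
in_rope_log = sum(math.log(rope[0]) < x < math.log(rope[1]) for x in b) / len(b)

# Posterior probability that the covariate increases the response.
p_increase = sum(r > 1 for r in ratio) / len(ratio)

# A handful of quantiles as a compact description of the whole posterior.
s = sorted(ratio)
quantiles = {q: s[int(q * (len(s) - 1))] for q in (0.05, 0.25, 0.5, 0.75, 0.95)}

print("P(inside ROPE), ratio scale:", in_rope)
print("P(inside ROPE), log scale  :", in_rope_log)
print("P(ratio > 1)               :", p_increase)
print("quantiles of the ratio     :", quantiles)
```

Unlike the HDI, these posterior probabilities do not depend on the scale, as long as the ROPE limits are transformed together with the draws.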