Hi all,
I am using Stan via brms in R.
My data is attached here:
sample_03_21_230828.csv (109.9 KB)
In the dataset, the dependent variable has many zeroes and some negative values.
Its summary statistics are:
      Min.     1st Qu.      Median        Mean     3rd Qu.        Max.        NA's
-3.212e+09   0.000e+00   7.990e+06   1.694e+08   7.921e+07   1.698e+10         787
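To see how much of the sample falls in each region (zero, negative, positive, missing), rather than only the quantiles above, a small base-R helper like the one below could be run on the outcome. This is a sketch that assumes the outcome is a numeric vector (called `value` in the code further down); the helper name `describe_dv` and the toy vector are hypothetical and are not taken from the attached data.

```r
# Hypothetical helper: share of missing, zero, negative and positive values
# in a numeric outcome vector (shares are relative to the full length,
# so the four shares sum to 1).
describe_dv <- function(x) {
  n <- length(x)
  c(
    n        = n,
    missing  = sum(is.na(x)) / n,
    zero     = sum(x == 0, na.rm = TRUE) / n,
    negative = sum(x < 0, na.rm = TRUE) / n,
    positive = sum(x > 0, na.rm = TRUE) / n
  )
}

# Toy vector for illustration only (not the attached data).
toy <- c(-5e6, 0, 0, 0, 2e7, 8e7, NA, 1.5e9)
describe_dv(toy)
```

On the real data this would be called as `describe_dv(df$value)`; a large `zero` share is usually what motivates a zero-inflated or hurdle-type likelihood rather than a transformation alone.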
In the literature, most studies log-transform this kind of variable, but the zeroes and negative values make that difficult, because the logarithm is undefined for both.
Some studies instead take the logarithm of the absolute value (after dividing by 1e4) and multiply it by -1 for negative values, along the lines of the following code:
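library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

df %<>% mutate(
  log_abs_value = case_when(
    value > 0  ~ log(value / 1e4),
    value == 0 ~ 0,                      # log(1) = 0
    value < 0  ~ -log(abs(value) / 1e4)  # NA values fall through and stay NA
  )
)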
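One property of this transform is worth checking: because values are divided by 1e4 before taking the log, positive values between 0 and 1e4 map to negative numbers, and negative values between -1e4 and 0 map to positive numbers, so the transformed variable is not monotone in the original one. A common alternative is the signed or "log-modulus" transform, sign(x) * log(1 + |x| / scale), which maps 0 to 0, preserves the sign, and is monotone and invertible. The base-R sketch below assumes the same 1e4 scaling as above; the function names `log_modulus` and `inv_log_modulus` are hypothetical.

```r
# Log-modulus ("signed log") transform: sign(x) * log(1 + |x| / scale).
# Zero maps to zero, the sign is preserved, and the mapping is monotone,
# so small positive values never become negative after transforming.
log_modulus <- function(x, scale = 1e4) {
  sign(x) * log1p(abs(x) / scale)
}

# Inverse, to map model predictions back to the original units.
inv_log_modulus <- function(y, scale = 1e4) {
  sign(y) * expm1(abs(y)) * scale
}

# Illustrative values spanning the observed range, plus a missing value.
x <- c(-3.212e9, -500, 0, 500, 7.99e6, 1.698e10, NA)
y <- log_modulus(x)
y
all.equal(inv_log_modulus(y), x)  # round trip recovers the original values
```

With dplyr it would drop into the pipeline above as `df %<>% mutate(log_abs_value = log_modulus(value))`. The `scale` argument is a modelling choice: it sets where the transform switches from roughly linear (|x| much smaller than `scale`) to roughly logarithmic, so it affects the shape of the histogram near zero.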
This produces the following histogram of the dependent variable’s distribution.
I am fitting a hierarchical multivariate regression with brms.
In both cases (the raw values and the log-transformed values), the distribution of the dependent variable does not seem to warrant a Gaussian family. What would you suggest? I would particularly appreciate example code and articles, books, or blogs that explain the options in more detail.
Thank you in advance.