Modeling with a W-shaped (or L-shaped) dependent variable

Hi all,

I use Stan with brms in R.

My data is attached here:
sample_03_21_230828.csv (109.9 KB)

In the dataset, the distribution of the dependent variable has many zeroes and some negative values.

[Image: sample_hist, histogram of the dependent variable]

Its summary statistics are as follows.

      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.       NA's 
-3.212e+09  0.000e+00  7.990e+06  1.694e+08  7.921e+07  1.698e+10        787 

In the literature, most studies log-transform this kind of variable, but the logarithm is undefined for zeroes and negative values, so a plain log transform does not work here.

Some studies work around this by taking the logarithm of the absolute value and multiplying it by -1 when the original value is negative. Something along the lines of the following code:

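library(dplyr)     # mutate(), case_when()
library(magrittr)  # %<>% assignment pipe

# Signed log of value / 1e4; zeroes map to 0 and NAs stay NA.
# Note: for 0 < |value| < 1e4 the log itself is negative, so the sign flips.
df %<>% mutate(
  log_abs_value = case_when(
    value > 0  ~ log(value / 1e4),
    value == 0 ~ log(1),               # = 0
    value < 0  ~ -log(abs(value) / 1e4)
  )
)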
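To see how this transformation behaves, here is a minimal base-R sketch applied to a few made-up values (they are illustrative only, not taken from my dataset). The helper name `signed_log` and its `scale` argument are hypothetical and just restate the `case_when()` logic above.

```r
# Base-R restatement of the signed-log transform above (hypothetical helper)
signed_log <- function(value, scale = 1e4) {
  # zeroes map to 0; otherwise keep the sign and log the scaled magnitude
  ifelse(value == 0, 0, sign(value) * log(abs(value) / scale))
}

# Illustrative values only, not from the attached data
toy <- c(-3.2e9, -5e3, 0, 5e3, 7.99e6)
out <- signed_log(toy)
print(round(out, 2))
```

In this sketch, -5e3 comes out positive and 5e3 comes out negative, because `log(abs(value) / 1e4)` is itself negative whenever the magnitude is below 1e4. So the transform is not monotone near zero, which may matter for how the transformed variable should be interpreted.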

This produces the following histogram of the dependent variable’s distribution.

I am fitting a hierarchical, multivariate regression model with brms.

In both cases (raw and transformed), the distribution of the dependent variable does not seem to warrant a regression with a Gaussian family. What would you suggest? I would particularly appreciate it if you could recommend example code and articles, books, or blogs with more explanation.

Thank you in advance.