I am fitting `brms` models on rating data that I collected.

The ratings range from 0 to 200 and are highly skewed as can be seen in the posterior predictive plots below.

As the response variable distribution is highly skewed, my intuition was to fit a `skew_normal` model to these data. However, as I haven’t used `skew_normal` models a lot in the past, I also fit a `gaussian` model to these data, just for comparison.

To simplify matters for this question, I fit a model estimating only the overall intercept plus random intercept adjustments per participant. The two models I used are:

```
library(brms)

# Gaussian model: overall intercept plus a random intercept per participant
m_gauss <- brm(m_conf ~ 1 + (1 | ppid),
               data = d_h1,
               chains = 4, cores = 4)
```

and

```
# Same model structure, but with a skew-normal likelihood
m_SN <- brm(m_conf ~ 1 + (1 | ppid),
            data = d_h1,
            family = "skew_normal",
            chains = 4, cores = 4)
```

In this case I used the default priors. I also tried increasing the prior on `alpha` in the skew-normal model, but it did not make much of a difference.

As was to be expected, the `pp_check` overlay plot looks quite bad for the `gaussian` model (left) and much more reasonable for the `skew_normal` model (right).

However, what surprised me was that the `gaussian` model (left) rather than the `skew_normal` model (right) did a better job at recovering the observed mean and standard deviation in the data.

I also noticed that the `gaussian` model looks better in terms of the `pareto-k` diagnostic, i.e. it shows fewer problematic/influential observations.
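For reference, the checks described above can be reproduced roughly as follows. This is a minimal sketch, not necessarily the exact calls behind my plots: `check_fit` is a hypothetical helper I made up for this post (it is not part of `brms`), and the choice of `type = "dens_overlay"` and `type = "stat_2d"` is an assumption about which `pp_check` plots are meant.

```r
# Hypothetical helper (not part of brms): collects the checks described above
# for one fitted model, e.g. m_gauss or m_SN.
check_fit <- function(fit) {
  list(
    # Density overlay of the observed ratings vs. posterior predictive draws
    overlay = brms::pp_check(fit, type = "dens_overlay"),
    # Posterior predictive distribution of (mean, sd) vs. the observed values
    mean_sd = brms::pp_check(fit, type = "stat_2d", stat = c("mean", "sd")),
    # PSIS-LOO object; printing it shows the Pareto-k diagnostic table
    loo = brms::loo(fit)
  )
}
```

Usage would be something like `checks <- check_fit(m_SN)`, then `checks$mean_sd` for the mean/SD plot and `print(checks$loo)` for the Pareto-k summary.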

However, when comparing the models with both `loo-ic` and `k-fold` CV, the skew-normal model outperforms the gaussian model in both cases.

Model | LOOIC | SE
---|---|---
m_gauss | 12273.47 | 104.38
m_SN | 11786.12 | 82.09
m_gauss - m_SN | 487.35 | 49.31

`elpd_kfold` is -6141.9 for the `gaussian` model and -5898.7 for the `skew_normal` model.
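To put the LOO and k-fold numbers on the same scale, here is a small sketch of the arithmetic in base R. It relies only on the values reported above and on the definition LOOIC = -2 * elpd_loo; the variable names are mine.

```r
# Values copied from the LOOIC table and the k-fold results above
looic         <- c(m_gauss = 12273.47, m_SN = 11786.12)
looic_diff    <- 487.35  # m_gauss - m_SN
looic_diff_se <- 49.31
elpd_kfold    <- c(m_gauss = -6141.9, m_SN = -5898.7)

# LOOIC = -2 * elpd_loo, so dividing by -2 gives the elpd scale
elpd_loo <- looic / -2

# Differences in favour of m_SN on the elpd scale (higher elpd is better)
elpd_loo_diff   <- unname(elpd_loo["m_SN"] - elpd_loo["m_gauss"])
elpd_kfold_diff <- unname(elpd_kfold["m_SN"] - elpd_kfold["m_gauss"])

# The LOOIC difference expressed in units of its standard error
diff_in_se <- looic_diff / looic_diff_se

print(c(elpd_loo_diff = elpd_loo_diff,
        elpd_kfold_diff = elpd_kfold_diff,
        diff_in_se = diff_in_se))
```

So both criteria favour the skew-normal model by roughly the same amount on the elpd scale (about 243 to 244), and the LOOIC difference is close to ten times its standard error.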

My question is what I should do with this.

As I see it, the `skew_normal` model performs better in terms of prediction / fit-indices.

Why is it, though, that the `skew_normal` model identifies more problematic observations in the data in terms of `pareto-k` and does not correctly recover the observed mean in the data?

Is that something with the data or is it something that can be expected with a `skew_normal` model in general?
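One general property of the skew-normal family that may be relevant here (a sketch for context, not a diagnosis of my data): its skewness is bounded at roughly 0.995 in absolute value, however large the shape parameter gets. The code below computes the theoretical skewness for a few shape values and defines a simple moment-based sample skewness for comparison. Both function names are mine, and I am assuming that `alpha` in `brms` corresponds to the usual skew-normal shape parameter.

```r
# Theoretical skewness of a skew-normal distribution with shape parameter alpha
sn_skewness <- function(alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  m <- delta * sqrt(2 / pi)
  (4 - pi) / 2 * m^3 / (1 - m^2)^(3 / 2)
}

# Moment-based sample skewness, for comparison with the bound
sample_skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^(3 / 2)
}

# Skewness grows with alpha but levels off just below 1
print(sn_skewness(c(0, 1, 5, 20, 1e6)))
```

If `sample_skewness(d_h1$m_conf)` is well above this bound, the skew-normal likelihood cannot reproduce the shape of the ratings no matter what prior is placed on `alpha`.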

Is there maybe another model-family / transformation that I could try that might work better?

I hope I included all relevant information and this is the right place to ask a question like this.

Thanks in advance!

Julian

- Operating System: Windows
- brms Version: 2.7.0