Modelling with non-binary proportions as outcome

Thanks so much for the detailed reply. It’ll take me a while to wrap my head around the Koster and McElreath paper, but given how much I love his ‘rethinking’ I’m sure it’ll be worth it!

The post by Paul is also great. I’ve attached a small, anonymised subset of the data (10 participants): 157 timepoints per participant, 10 states.

>  A tibble: 1,570 x 12
>    sub_no   Age state_bin_01 state_bin_02 state_bin_03 state_bin_04 state_bin_05 state_bin_06 state_bin_07 state_bin_08 state_bin_09 state_bin_10
>    <chr>  <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
>  1 a       94.8            0            0            0            0            0            0            0            1            0            0
>  2 a       94.8            0            0            0            0            0            0            0            1            0            0
>  3 a       94.8            0            0            1            0            0            0            0            0            0            0
>  4 a       94.8            0            0            1            0            0            0            0            0            0            0
>  5 a       94.8            1            0            0            0            0            0            0            0            0            0
>  6 a       94.8            1            0            0            0            0            0            0            0            0            0
>  7 a       94.8            0            0            1            0            0            0            0            0            0            0
>  8 a       94.8            0            0            1            0            0            0            0            0            0            0
>  9 a       94.8            1            0            0            0            0            0            0            0            0            0
> 10 a       94.8            0            0            0            0            0            0            0            1            0            0
>  … with 1,560 more rows

Stealing from Paul’s post:

test_subset$y <- with(
  test_subset,
  cbind(state_bin_01, state_bin_02, state_bin_03, state_bin_04, state_bin_05,
        state_bin_06, state_bin_07, state_bin_08, state_bin_09, state_bin_10)
)

If I understand correctly, the idea would be to fit something like:

fit <- brm(
  bf(y | trials(1) ~ (1 | ID | sub_no)),
  data = test_subset, family = multinomial(),
  save_all_pars = TRUE, cores = 4, chains = 4
)

That was pretty slow to run (~1 h 15 min), but the results make sense. I suspect it’s the correlations between categories (the `|ID|` syntax, as recommended in another post) that make it far slower?
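To check that guess (this is my speculation, not something from Paul’s post), I could compare against a version with the cross-category correlations dropped, which just means replacing `(1 | ID | sub_no)` with `(1 | sub_no)`:

```r
# Hypothetical comparison fit: same data and family, but with
# uncorrelated per-category intercepts (no |ID| correlation block).
fit_uncor <- brm(
  bf(y | trials(1) ~ (1 | sub_no)),
  data = test_subset, family = multinomial(),
  save_all_pars = TRUE, cores = 4, chains = 4
)
```

If that runs dramatically faster, the correlation matrix is the likely culprit.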

Since I am interested in the effect of ‘fixed’ effects on the dwell times, the final model would look more like:

fit <- brm(
  bf(y | trials(1) ~ Var1 + Var2 + Var3 + … + (1 | ID | sub_no)),
  data = actual_data, family = multinomial(),
  save_all_pars = TRUE, cores = 4, chains = 4
)

I’ll let a model with ‘Age’ run overnight. Anything you would change with the model specifications above?

I suppose another option, more in keeping with Paul’s post, would be to use participant-level summaries of the count data:

   sub_no state_bin_01 state_bin_02 state_bin_03 state_bin_04 state_bin_05 state_bin_06 state_bin_07 state_bin_08 state_bin_09 state_bin_10
   <chr>         <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
 1 a                43            0           17            0            0           38            0           55            0            4
 2 b               106            0            0            0           18            0           11           22            0            0
 3 c                 0          132            0            0            0            0           22            0            0            3
 4 d                57            0           27            9            0           26            2           12            6           18
 5 e                 0            0            0            0          154            0            0            3            0            0
 6 f               104            0            3            0           11            7            2           30            0            0
 7 g                20           15            0            0           17            0          102            1            0            2
 8 h               101            0            0            2           18            2           13           21            0            0
 9 i                51            0           50            0            0            0            0           54            0            2
10 j                 0            0            0            0          113            0           44            0            0            0

Which (guessing here) might run faster? Perhaps I need to understand the Koster paper better to know what I’d lose by aggregating.
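For what it’s worth, the aggregation itself seems straightforward; a minimal sketch (assuming dplyr, and with `trials()` now set to the 157 timepoints each participant contributes):

```r
library(dplyr)

# Hypothetical aggregation: sum the one-hot state columns per participant,
# so each participant becomes a single row of category counts.
counts <- test_subset %>%
  group_by(sub_no) %>%
  summarise(across(starts_with("state_bin_"), sum), .groups = "drop")

# Bind the count columns into the matrix outcome brms expects.
counts$y <- as.matrix(counts[, grep("^state_bin_", names(counts))])

# One row of 157 trials per participant.
fit_agg <- brm(
  bf(y | trials(157) ~ (1 | ID | sub_no)),
  data = counts, family = multinomial(),
  save_all_pars = TRUE, cores = 4, chains = 4
)
```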

Thanks again

Hugo

test_subset.csv (61.5 KB)
