Packages used
brms 2.14.8, loo 2.4.1
I want to compare two ordinal probit models (cumulative models with a probit link). I ran into errors with the more flexible one and unfortunately could not solve them myself, so maybe someone here has some thoughts. All the observations with high Pareto k values seem to be ones that could be seen as "outliers", specifically responses that deviate from the average responses. Take participant x: he always(!) responds with 6 when he sees a previously learned word and with 1 when he sees a new (not previously learned) word. If this participant responds with "2" to just one new word, that observation shows up with a high Pareto k. Since this is a memory experiment, I would definitely expect some of the responses to be more variable.
We have N = 24 participants, each doing the experiment 4 times (depending on time and condition).
The first model is the more restricted one. It assumes that the latent variable has a variance of 1 in all groups (disc = 1).
First model, brms syntax:
evsdt <- brm(
  bf(Response ~ 1 + item * condition * time + (1 + item * condition * time | ID_T1T2)),
  data = data_hypnomemory, family = cumulative("probit"),
  iter = 4000, inits = 0, save_pars = save_pars(all = TRUE),
  control = list(adapt_delta = 0.95), file = "evsdt_file"
)
Running loo with moment matching:
evsdt_loo <- loo(evsdt, moment_match = TRUE)
Computed from 8000 by 3840 log-likelihood matrix
         Estimate    SE
elpd_loo  -2634.0  70.7
p_loo       127.9   6.6
looic      5268.0 141.4
------
Monte Carlo SE of elpd_loo is 0.2.
Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     3825  99.6%   1259
 (0.5, 0.7]   (ok)         15   0.4%   550
   (0.7, 1]   (bad)         0   0.0%   <NA>
   (1, Inf)   (very bad)    0   0.0%   <NA>
All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.
The second model allows the variance of the latent variable to vary between groups.
Second model, brms syntax:
uvsdt <- brm(
  bf(
    Response ~ 1 + item * condition * time + (1 + item * condition * time | ID_T1T2),
    disc ~ 0 + old + old:condition2 + old:time2 + old:condition2:time2 +
      (0 + old + old:condition2 + old:time2 + old:condition2:time2 | ID_T1T2)
  ),
  data = data_hypnomemory, family = cumulative("probit"),
  iter = 4000, inits = 0, save_pars = save_pars(all = TRUE),
  control = list(adapt_delta = 0.99, max_treedepth = 15), cores = 6,
  file = "uvsdt_file"
)
Running loo without moment matching:
uvsdt_loo <- loo(uvsdt, cores = 4)
I get the following result:
Found 12 observations with a pareto_k > 0.7 in model 'uvsdt'. It is recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.
Computed from 8000 by 3840 log-likelihood matrix
         Estimate    SE
elpd_loo  -2529.7  68.2
p_loo       157.8   8.3
looic      5059.4 136.5
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     3793  98.8%   646
 (0.5, 0.7]   (ok)         35   0.9%   328
   (0.7, 1]   (bad)        12   0.3%   63
   (1, Inf)   (very bad)    0   0.0%   <NA>
See help('pareto-k-diagnostic') for details.
Running loo with moment matching:
uvsdt_loo <- loo(uvsdt, moment_match = TRUE)
I get the following error:
Error in validate_ll(log_ratios) : All input values must be finite.
Error: Moment matching failed. Perhaps you did not set 'save_pars = save_pars(all = TRUE)' when fitting your model?
I also tried kfold with this model, and the following error appeared:
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"
Start sampling
Error: The model does not contain posterior samples.
Can I still use the information above to decide which of the two models is better?
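As a rough sketch of what I mean: from the printed summaries alone I can only get the raw difference in elpd_loo, computed below in base R with the numbers copied from the two printouts. This is not a proper comparison. The standard error of the difference needs the pointwise values (as far as I understand, that is what loo_compare(evsdt_loo, uvsdt_loo) would report), and the uvsdt estimate still includes the 12 observations with Pareto k > 0.7.

```r
# elpd_loo estimates copied from the two loo() printouts in this post.
elpd_evsdt <- -2634.0  # restricted model (disc = 1)
elpd_uvsdt <- -2529.7  # flexible model (disc varies); 12 Pareto k values > 0.7

# Positive values favour the flexible model.
elpd_diff <- elpd_uvsdt - elpd_evsdt
print(round(elpd_diff, 1))
```

This gives a raw difference of 104.3 in favour of uvsdt, but I do not know how much to trust it given the diagnostics.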
I have a hierarchical model, so I don't know how many parameters I actually have.
Is it (Intercept[1;2;3;4;5] + item + condition + time + item:condition + item:time + condition:time + item:condition:time + disc_item + disc_item:condition + disc_item:time + disc_item:condition:time) * 25, because of N = 24 participants plus the population-level effects?
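My own attempt at a nominal count, as a base R sketch. It assumes that item, condition and time are each two-level factors, that ID_T1T2 has 24 levels (one per participant), and that brms estimates separate correlation matrices for the group-level terms of the main formula and of disc (my understanding when no |ID| syntax is used). None of this is confirmed, so please treat the total as a rough count only.

```r
# Nominal parameter count for uvsdt (assumptions are marked).
n_levels     <- 24  # assumption: one level of ID_T1T2 per participant
n_thresholds <- 5   # 6 response categories -> Intercept[1..5]
n_mu_coefs   <- 7   # assumption: two-level factors, so item * condition * time
                    # expands to 3 main effects + 3 two-way + 1 three-way term
n_disc_coefs <- 4   # old, old:condition2, old:time2, old:condition2:time2

# Group-level terms per level of ID_T1T2.
n_mu_group   <- 1 + n_mu_coefs  # varying intercept + 7 varying slopes
n_disc_group <- n_disc_coefs    # 4 varying disc effects (no intercept)

n_population   <- n_thresholds + n_mu_coefs + n_disc_coefs
n_sds          <- n_mu_group + n_disc_group
# Assumption: separate correlation matrices for the main and disc parts.
n_correlations <- choose(n_mu_group, 2) + choose(n_disc_group, 2)
n_deviations   <- n_levels * (n_mu_group + n_disc_group)

n_total <- n_population + n_sds + n_correlations + n_deviations
print(c(population = n_population, sds = n_sds,
        correlations = n_correlations, deviations = n_deviations,
        total = n_total))
```

Under these assumptions I end up with 350 nominal parameters (16 population-level, 12 standard deviations, 34 correlations, 288 participant-level deviations). That is far more than the p_loo of 157.8 reported for uvsdt, which I assume is due to the partial pooling of the participant-level deviations.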
Hopefully someone takes the time to read this and replies.
With kind regards,
Dominic :)