Hey Everyone,
I am fitting a mixed-effects model (random intercept for account and random slope for test occasion) to the number of correct problems on addition tests, collected remotely from children on three occasions. Because the tests were taken remotely by six-year-olds, the data are quite noisy (e.g. T1 = 22, T2 = 27, T3 = 2).
I wanted to start very simple with a random intercept only model, with a fixed effect of time and wide priors (I’m aware my priors aren’t that sensible right now, e.g. time should have a positive effect and a smaller sd).
$$
\begin{aligned}
\text{AdditionScore}_{i} &\sim \text{Normal}(\mu_{i}, \sigma) \\
\mu_{i} &= \alpha + \alpha_{\text{SUBJ}[i]} + \beta_{T} \, \text{Time}_{i} \\
\alpha_{\text{SUBJ}} &\sim \text{Normal}(0, \sigma_{\text{SUBJ}}) \\
\alpha &\sim \text{Normal}(0, 10) \\
\beta_{T} &\sim \text{Normal}(0, 10) \\
\sigma_{\text{SUBJ}} &\sim \text{HalfCauchy}(0, 2) \\
\sigma &\sim \text{HalfCauchy}(0, 2)
\end{aligned}
$$
And the following brms code:
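    library(brms)

    # Random-intercept-only model: fixed effect of time, intercept varying by account
    time_randinci_250 <-
      brm(data = addition_250, family = gaussian,
          value.c ~ 1 + time.c + (1 | account_id),
          prior = c(prior(normal(0, 10), class = Intercept),
                    prior(normal(0, 10), class = b),
                    # brms truncates sd and sigma priors at zero, so these are half-Cauchy
                    prior(cauchy(0, 2), class = sd),
                    prior(cauchy(0, 2), class = sigma)),
          # "only" draws from the priors and ignores the likelihood
          sample_prior = "only",
          iter = 5000, warmup = 2000, chains = 4, cores = 2,
          seed = 13)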
To speed it up I am only using 250 subjects.
All of my MCMC diagnostics look good, and the model diagnostics seem okay. Yet when I run waic I get a warning telling me to try loo instead, and when I use loo I get a warning about influential cases with Pareto k > 0.7. I read somewhere that this can be a sign of a misspecified model. I am using a normal distribution and wide priors, but I think it's a data issue.
This affects ~2% of the data. When I take the subjects who have observations with high Pareto k and plot their time course, it's clear that it's noise. Compared with subjects with low k values, there's usually one test that is clearly missed (which corresponds to the high-k observation).
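A minimal sketch of how the flagged observations could be pulled out for plotting. It is an illustration rather than the exact script used here: `flag_high_k` is a hypothetical helper name, it relies on `pareto_k_ids()` and `pareto_k_values()` from the loo package, and it assumes the rows of the data frame are in the same order as the observations in the loo object.

```r
library(loo)

# Hypothetical helper (not part of brms or loo): return the rows of `data`
# whose PSIS-LOO Pareto k estimate exceeds `threshold`, with k attached.
# Assumes row j of `data` is observation j of `loo_obj`.
flag_high_k <- function(loo_obj, data, threshold = 0.7) {
  k <- pareto_k_values(loo_obj)
  stopifnot(nrow(data) == length(k))
  ids <- pareto_k_ids(loo_obj, threshold = threshold)
  flagged <- data[ids, , drop = FALSE]
  flagged$pareto_k <- k[ids]
  flagged
}

# Usage with the objects from this post (assumes the model was fit to the data):
# loo_250 <- loo(time_randinci_250)
# high_k  <- flag_high_k(loo_250, time_randinci_250$data)
# noisy   <- subset(addition_250, account_id %in% high_k$account_id)
```

Passing `time_randinci_250$data` (the model frame stored in the fit) rather than `addition_250` is a deliberate choice in this sketch: brms drops rows with missing values before fitting, so the stored frame is the one whose rows should line up with the pointwise k values.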
My question is: is it kosher to exclude observations (set them to missing) based on Pareto k values? If it's not, can I just delete the entire subject listwise? How would this affect my model building? Eventually I want to add a random slope, covariates (e.g. grade), and my treatment plan.
Last bonus question: is it okay to compare two models with reloo = TRUE when the excluded observations differ between them?
Thanks a million for your help,
Nick