I’m modeling some data on children’s understanding and learning of an important pre-algebraic concept. When you give U.S. 7-to-11-year-olds an assessment with relevant items, most children will answer all of the items incorrectly, a decent minority will answer most of the items correctly, and the rest will be in the middle (example distribution below).

Historically, these data have been treated either A) as Normal in ANOVAs/t-tests (which naturally yields terrible posterior predictions) or B) as categorical (e.g., completely incorrect vs. at least 1 correct, which can toss out a lot of information depending on the sample).
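For concreteness, a toy sketch of the option-B dichotomization (the data frame and column names are hypothetical, not from my actual data):

```r
# Hypothetical per-child totals on a 6-item pretest.
pretest_sums <- data.frame(id = 1:6,
                           sum_correct = c(0, 0, 0, 2, 5, 6))

# Option B collapses the score to "at least 1 correct", so a child
# with 2/6 and a child at ceiling with 6/6 land in the same category.
pretest_sums$any_correct <- as.integer(pretest_sums$sum_correct > 0)
pretest_sums$any_correct  # 0 0 0 1 1 1
```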

More recently, some have gone the binomial GLMM route (with random item intercepts/slopes). I’m finding that such models consistently run into trouble with high Pareto k values in LOO. For example, we might ask whether something like working memory capacity (WMC) predicts children’s performance on a set of items with a model like the following.

```r
library(rstanarm)

# Item-level Bernoulli responses, crossed random intercepts for
# children (id) and items.
example_mod <- stan_glmer(correct ~ zWMC + (1 | id) + (1 | item),
                          family = binomial, data = pretest_data)
plot(loo(example_mod))
```

Current thinking is that the binomial GLMM approach is a good way to handle these data, but LOO diagnostics pretty much always look something like the above. I’m guessing the distribution of the (summed) correct item responses is part of the problem? Any insights or recommendations for alternative approaches would be greatly appreciated!
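To make the concern about the summed scores concrete, here is a toy illustration (the counts below are made up for the example, not my data): a binomial with a single success probability is unimodal, so a zero-heavy, bimodal distribution of per-child sums can only be matched by pushing a lot of heterogeneity into the person-level random effects.

```r
# Made-up counts of summed correct responses (0-6 items) for 100
# children: a large group at 0, a decent minority near ceiling.
observed <- c(`0` = 55, `1` = 8, `2` = 6, `3` = 5, `4` = 6, `5` = 8, `6` = 12)

# A single binomial fit to the overall proportion correct is unimodal
# and badly mismatches the observed distribution at the extremes.
p_hat    <- sum(as.integer(names(observed)) * observed) / (6 * sum(observed))
expected <- sum(observed) * dbinom(0:6, size = 6, prob = p_hat)
round(expected, 1)  # expects ~13 children at 0 correct, vs. 55 observed
```

On the real data, the analogous check would be a posterior predictive comparison of per-child sums (e.g., via `pp_check()`/bayesplot), which tends to make this kind of mismatch visible.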

BONUS: Experiments in this area often involve pretest and posttest measures, with children randomized to different types of instruction in between. Because of the shape of these data, the most popular way to assess the effectiveness of interventions and other predictors of learning is to analyze the posttest scores of individuals with no/little pretest knowledge (and to ignore individuals with partial understanding). An approach that could incorporate these middle cases would be very useful.