# Modelling a complex correlated and longitudinal dataset - Random intercept? First-order Markov model?

TL;DR: I need help analyzing a complex dataset from experiments with mice. Given the correlation and longitudinal pattern, I considered Frank Harrell’s Bayesian transition models for ordinal longitudinal outcomes (DOI 10.1002/sim.10133), but I am unsure if they fit well in this case.

Background: Mice are kept in cages with other mice (a total of 4 in each cage).

The experiment involves separating a mouse or a group of mice from their original cage by placing them in a new cage and then measuring some outcomes. After each experiment, the mice are returned to their original cage with their original cagemates.

There are three different “separation patterns” for new cages:

• 1 mouse alone
• A pair of mice
• All four mice

These separations occur multiple times across each mouse's lifespan. Mice were organized into two cohorts: one was tested at 10, 20, and 30 days old (baseline measured at 8 days), and the other was tested at 30, 40, and 50 days old (baseline measured at 28 days).

These experiments happened at five different ambient temperatures.

In summary, for each of the ages mentioned above (5 options), we have data on 3 separation patterns at 5 different temperatures.

5 x 3 x 5 = 75 experiments

A specific mouse was not tested all 75 times, but it was tested multiple times (I’m not sure how many times exactly). Please note that the data is clustered by mouse (the same animal is tested multiple times) and by their original cage (mice that live together tend to behave alike).

Our primary outcome is a mouse-level measurement that is continuous and can only be greater than or equal to 0.

Our research questions are:

1. At each age, what is the difference in outcome between the separation patterns ("all four mice" as reference)? Does ambient temperature interact with this difference?
2. Does age interact with the difference in outcome between the separation patterns?

I have considered a random-intercept model with a Gamma likelihood (log link). Yet, a first-order Markov transition model with a random intercept might better account for the correlation pattern than the first option. Note that, depending on the cohort, the baseline measurement is taken at a different age.
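To make the first option concrete, here is a minimal sketch of the random-intercept Gamma model in `brms`, assuming hypothetical column names (`outcome`, `separation` as a factor with `"all_four"` as reference, `temp`, `age`, `cage`, `mouse`) — adjust to your actual data:

```r
library(brms)

# Sketch only: nested random intercepts for cage and mouse-within-cage
# capture the two clustering levels (shared cage, repeated mice).
fit_gamma <- brm(
  outcome ~ separation * temp + separation * age + (1 | cage / mouse),
  family = Gamma(link = "log"),
  data = dat  # hypothetical data frame
)
```

The `separation * temp` and `separation * age` interactions map directly onto the two research questions above.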

How would you model this data?

PS: I have more experience with `brms` than `rstan`, but might consider using the latter if a more complex model is necessary.


You might want to consider `cmdstanr`: it's easier to install and maintain, and it stays up to date with Stan better than `rstan` does.

You can do that, but why a gamma likelihood rather than, say, a lognormal? The nice part about the lognormal is that its parameters are easier to interpret: the location is the log median and the scale is a multiplicative error. And it's easier to formulate a non-centered parameterization as a normal with an `exp()` transform.
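In `brms`, swapping in the lognormal is just a family change. A sketch with the same hypothetical column names (`outcome`, `separation`, `temp`, `age`, `cage`, `mouse`):

```r
library(brms)

# Sketch only: lognormal likelihood. exp(Intercept) is the median outcome
# for the reference group; sigma is the multiplicative error scale on the
# log of the outcome. Requires outcome > 0 (strictly).
fit_ln <- brm(
  outcome ~ separation * temp + separation * age + (1 | cage / mouse),
  family = lognormal(),
  data = dat  # hypothetical data frame
)
```

One caveat: the lognormal requires strictly positive outcomes, so exact zeros would need a hurdle component (e.g. `hurdle_lognormal()`) or a different likelihood.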

Fit them both and use leave-one-out cross-validation to see which predicts better. I'm not sure what you mean by a Markov model with a random intercept. Do you mean putting something like a random-walk prior on a sequence of parameters?

Another option is to try to fit some kind of parametric curve to time, given that you have a time series of observations.
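One way to do that in `brms` is a smooth term on age, optionally varying by separation pattern. A sketch with the same hypothetical column names; note `k` is kept small because only a handful of distinct ages are observed:

```r
library(brms)

# Sketch only: thin-plate spline on age, one curve per separation pattern.
# k = 4 caps the basis dimension, since there are few distinct ages.
fit_spline <- brm(
  outcome ~ separation * temp + s(age, by = separation, k = 4) +
    (1 | cage / mouse),
  family = lognormal(),
  data = dat  # hypothetical data frame
)
```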


Would you suggest any reference that applies this approach?

Nice idea, thanks.

For further details on why I considered applying a Markov model, see (re @harrelfe) Statistical Thinking - Longitudinal Data: Think Serial Correlation First, Random Effects Second