How to model a predictor obtained at multiple time points

Hi,

I hope I can get some input/hints here for how to best model a certain problem:

I have a predictor variable obtained at 4 time points for 70 individuals and the outcome at one time point.

In class we only ever fitted multilevel models where the “individual” was the “grouping variable”, meaning that we estimated different intercepts and slopes per individual. In those cases, the dependent variable was the one that was measured many times. But how can I model my situation? You will find my questions under 3., 4. and 5.

  1. I know I could fit 4 separate models like this: outcome ~ predictor + covariate. But then I think I lose power + test multiple times.
  2. I could model it like this: outcome ~ time * predictor + covariates. In that case I would need to look at whether the interaction terms are as expected (e.g. 95% above/below 0).
  3. Could I also model it like this: outcome ~ predictor + (predictor | time)? Then I would look at the association of predictor with outcome globally, and at whether it differs between time points. That might work for my specific case. But more generally: if an association is only expected at time point 1 and not at time point 3, then maybe the partial pooling will decrease power for time point 1. Is that correct thinking?
  4. If I wanted to model per individual, would it be appropriate to model it like this: predictor ~ outcome * time + covariates + (outcome * time + covariates | individual)? That is, can you swap predictor and outcome, since at the end of the day we model association and not causation anyway?
  5. Is there a minimum or maximum number of categories the grouping variable must have? E.g. are 3 or 4 time points enough?
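
For options 2. and 3., the data would go into long format with one row per individual and time point. A minimal sketch in Python (column names and numbers are made up, just to illustrate the layout) for 70 individuals and 4 time points:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_ind, n_time = 70, 4

# one outcome per individual, one predictor value per individual x time point
outcome = rng.normal(size=n_ind)
predictor = rng.normal(size=(n_ind, n_time))

long = pd.DataFrame({
    "individual": np.repeat(np.arange(n_ind), n_time),
    "time": np.tile(np.arange(1, n_time + 1), n_ind),
    "predictor": predictor.ravel(),
    # the single outcome value is repeated across the 4 rows of each individual
    "outcome": np.repeat(outcome, n_time),
})

print(long.shape)  # (280, 4)
```

Note that each outcome value appears 4 times in this table, so a naive regression on it would act as if the outcome had 280 observations instead of 70.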

I found 4. counterintuitive to model, but my supervisor told me that is how I should do it. I am sceptical and would be really grateful to hear from experienced users how to solve this…

Would be great if anyone could help me out…

This is an interesting problem I haven’t seen before, but I’ll venture a suggestion: would it make sense to break things up into two stages such that you have latent_state ~ predictor then outcome ~ latent_state, where latent_state is itself a parameter?

Also, perhaps you could tell us a little more about what kind of process or relationship you are trying to model?

Oops, on further reflection, I’d model this as predictor ~ latent_state + time then outcome ~ latent_state, with whatever structure for the influence of time on predictor you feel is justified (e.g. a linear effect of time? An effect of time varying across individuals, modelled hierarchically? etc.)
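
A quick way to see the structure of this suggestion is to simulate from it. A minimal generative sketch in Python (all coefficients and noise scales are made-up placeholders): each individual has one unobserved latent_state, which drives both the repeated predictor and the single outcome:

```python
import numpy as np

rng = np.random.default_rng(42)
n_ind, n_time = 70, 4

# latent state: one unobserved value per individual
latent_state = rng.normal(size=n_ind)

# predictor ~ latent_state + time: observed at every time point
b_latent, b_time = 1.0, 0.3
time = np.arange(n_time)
predictor = (b_latent * latent_state[:, None]
             + b_time * time[None, :]
             + rng.normal(scale=0.5, size=(n_ind, n_time)))

# outcome ~ latent_state: observed once per individual
a_latent = 2.0
outcome = a_latent * latent_state + rng.normal(scale=0.5, size=n_ind)

print(predictor.shape, outcome.shape)  # (70, 4) (70,)
```

At inference time the direction is reversed: latent_state becomes a parameter vector, and both likelihood terms constrain it jointly.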

@mike-lawrence and @erognli:
thank you so much for contributing. I am sorry for not replying but I have been very ill for a long time and could not work on my project.

To answer your question:

y = the memory of 70 children, tested at age 10 years.
x = diversity of gut microbial composition, assessed at several time points in infancy and at age 8 years. Thus we have 5 different diversity scores, all obtained at different time points before the outcome was measured.

The gut microbiota influences brain development as well as current brain functioning, and therefore my idea was that I could model it with a multilevel model like I posted above:

memory_score ~ 1 + diversity_score + (1 + diversity_score | diversity_time)

Since the microbiota in infancy and adulthood are very different, I also thought about creating a dummy variable indicating adulthood, so that I get different slopes for diversity scores obtained in adulthood:

memory_score ~ 1 + diversity_score * adulthood + (1 + diversity_score | diversity_time)

But where I am stuck is that the memory scores do not vary by time (so a random intercept makes no sense), thus:

memory_score ~ 1 + diversity_score * adulthood + (diversity_score | diversity_time)

But also, the model might “think” that the memory scores have a 4× larger sample size than they actually do, because of the long format of the data. I have difficulty wrapping my head around how best to model this while avoiding multiple testing, but my intuition tells me a multilevel model must work somehow.

Judging from what you describe, I think @mike-lawrence had a good idea about linking these through some latent state parameter.

Could you state your research question as well? It might be useful to make a few different models, under different theoretical assumptions about the association, and compare these.

Do you expect the influence of microbiota to be constant across development, or could there be sensitive periods? Are you sure of an approximately linear relationship between diversity and memory, or could it be curvilinear?

Thanks so much for the suggestions. This research is very much exploratory. We do not yet expect any particular direction of the effect, but I would expect the model to somehow reflect that the influence of diversity in infancy might differ from the influence in adulthood. Thus, no, I do not expect the influence to be constant across age. Infancy is a sensitive period. One could also argue that the mechanism by which the microbiota affects the brain might change: e.g. in early life it is more of a programming effect, whereas the closer you get to the measurement of the outcome, the more it might act through (brain) metabolism.

Currently, research suggests that higher-than-average diversity in infancy is a sign of premature microbiota maturation (often induced by events such as c-section or no breastfeeding). I tried to make the model flexible in that regard by using the adulthood (yes/no) variable, but I will drop that idea for now based on your comments.

I am not sure of a linear relationship. It might be that either too high or too low diversity is not good. What counts as good may also depend on infancy vs adulthood as explained above (after 3 years of age you can speak of an “adult” microbiome), and then very low diversity could be interpreted as a less stable system or a lack of metabolic flexibility.
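
If an inverted-U relationship like this is plausible, one simple option is a quadratic term for diversity (possibly interacted with the infancy/adulthood period). A minimal sketch with simulated data (hypothetical coefficients, plain least squares just to illustrate) showing that a fit of memory ~ 1 + diversity + diversity^2 recovers a negative quadratic term from an inverted-U:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 280
diversity = rng.normal(size=n)

# simulate an inverted U: memory is best at intermediate diversity
memory = 1.0 - 0.8 * diversity**2 + rng.normal(scale=0.5, size=n)

# least-squares fit of memory ~ 1 + diversity + diversity^2
X = np.column_stack([np.ones(n), diversity, diversity**2])
coef, *_ = np.linalg.lstsq(X, memory, rcond=None)
print(coef[2] < 0)  # True: the negative quadratic term captures the inverted U
```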

I am not sure if I understand the idea yet. So according to @mike-lawrence I would need to do:

diversity ~ latent_state + time

then:

memory ~ latent_state

That means that in the first model one of my predictors is a parameter. I have never done something like that, but I assume that if I specify the predictor as a parameter in e.g. rstan or brms, then it gets estimated from the data. And then I use that in the final model with the outcome.

But it does not click yet for me to be honest.

So I have 70 outcome values.
I have the diversity for 4 different time points --> 280 values.

Then if I did that first model, I would get a posterior with 280 rows/columns. If I then fit the final model, I would again use each outcome value 4×, whereas it was obtained only once. So I would not account for the repeated measures of the predictor.

Is there maybe a good text about such latent state models for beginner-to-intermediate readers? As for my level: I have worked through Statistical Rethinking twice, so I am good with standard regression models and some GLMs, including multilevel models.

You can think of this as falling in the general class of structural equation modelling.

Yup! The sampling of the latent parameter gets constrained by how well it predicts the actual observed outcomes.

Oh! No, you want to just do it all in one model simultaneously. Passing the results from one model as inputs to a next model is super difficult and simply unnecessary.
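
Concretely, “one model” means a single joint log density in which the per-individual latent states appear as parameters shared by both submodels, rather than passing posteriors between two fits. A minimal numpy/scipy sketch of that joint density (standard-normal prior, normal likelihoods, and all coefficient values are placeholder assumptions; in practice you would declare this in Stan or brms and estimate the coefficients too):

```python
import numpy as np
from scipy.stats import norm

def joint_log_density(latent_state, predictor, outcome, time,
                      b_latent=1.0, b_time=0.3, a_latent=2.0,
                      sigma_pred=0.5, sigma_out=0.5):
    """Joint log density of the latent-state model.

    latent_state : (n_ind,) latent parameter, one per individual
    predictor    : (n_ind, n_time) observed diversity scores
    outcome      : (n_ind,) observed memory scores
    time         : (n_time,) time codes
    """
    # prior on the latent states
    lp = norm.logpdf(latent_state).sum()
    # measurement model: predictor ~ latent_state + time
    mu_pred = b_latent * latent_state[:, None] + b_time * time[None, :]
    lp += norm.logpdf(predictor, loc=mu_pred, scale=sigma_pred).sum()
    # structural model: outcome ~ latent_state
    lp += norm.logpdf(outcome, loc=a_latent * latent_state, scale=sigma_out).sum()
    return lp

# tiny smoke test with simulated data
rng = np.random.default_rng(0)
n_ind, n_time = 70, 4
latent = rng.normal(size=n_ind)
time = np.arange(n_time, dtype=float)
pred = latent[:, None] + 0.3 * time + rng.normal(scale=0.5, size=(n_ind, n_time))
out = 2.0 * latent + rng.normal(scale=0.5, size=n_ind)
lp = joint_log_density(latent, pred, out, time)
print(np.isfinite(lp))  # True
```

Because the outcome likelihood enters exactly once per individual, the worry about the outcome being counted 4× in a long-format table does not arise in this formulation.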

When you are doing exploratory work, I think it’s particularly useful to fit and compare different models. Then you could compare a model with an assumption of equal association across age with ones allowing for it to differ in various ways, for instance. I’d try to come up with a (not too large) set of theoretically relevant different models of this association, I think.
