I am working on using Stan to fit a multilevel regression model to repeated-measures survey data. The catch is that there are two auxiliary variables and not everyone was measured at every time point. Basically, I’m combining two public opinion surveys with repeated measurements. The first auxiliary variable denotes the time points at which a respondent was measured. No respondent is measured at every time point, but every time point has observations. The second variable denotes a demographic group. If an individual is measured more than once, that respondent’s observations are correlated, and the correlation needs to be accounted for in the likelihood. For fixed values of both auxiliary variables, the likelihood is normal or multivariate normal. The (co)variance of the error term in the regression depends on demographic characteristics.

Basically, my likelihood looks like a mixture of normals and multivariate normals, except that the distribution each observation belongs to is known. What is the best way to specify this likelihood in Stan? I was considering writing a custom likelihood function, but is there some way to avoid that by using indicator variables and if statements, since conditional on the auxiliary variables the likelihood is a standard distribution?
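To make the structure concrete, here is a minimal Python sketch (not Stan) of a likelihood where each respondent's distribution is known in advance. It assumes a common time-point index for both surveys, one mean per time point in place of the full regression, and one full covariance matrix per demographic group. The names `respondents`, `mu`, and `cov_by_group` are hypothetical. Each respondent contributes a normal or multivariate normal term over only the waves they were observed in, so no mixture weights are involved.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def mvn_logpdf(y, mu, cov):
    """Multivariate normal log density; with one dimension it is the normal log density."""
    n = len(y)
    L = cholesky(cov)
    # Forward substitution: solve L z = (y - mu).
    z = []
    for i in range(n):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((y[i] - mu[i] - s) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def log_likelihood(respondents, mu, cov_by_group):
    """Sum of per-respondent terms, each over that respondent's observed waves only."""
    total = 0.0
    for r in respondents:
        idx = r["waves"]                 # known pattern: which time points were observed
        cov = cov_by_group[r["group"]]   # covariance depends on the demographic group
        sub_mu = [mu[t] for t in idx]
        sub_cov = [[cov[a][b] for b in idx] for a in idx]
        total += mvn_logpdf(r["y"], sub_mu, sub_cov)
    return total
```

In Stan, the same idea would presumably be a loop over respondents that adds a `normal_lpdf` or `multi_normal_lpdf` term to `target`, with the observed-wave indices passed in as data.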


If I understand correctly, this sounds like a scenario of repeated measures with missing data. Check out this lecture/tutorial on how I’d approach that scenario.


This scenario isn’t exactly repeated measures with missing data, although your approach would work in that scenario. The catch with this data set is that the gap between waves differs across the surveys: one survey has two waves six months apart and the other has six waves one month apart. Also, the focus is not on the individuals so much as on the population values. And the missing data isn’t random. It is missing by design: a value is missing because the person wasn’t in the other survey.
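One way to handle the unequal spacing, sketched here in Python under assumptions that are not from the thread, is to build each survey's within-respondent covariance from the actual measurement times rather than from wave numbers. The exponential-decay form `sigma^2 * rho^|t_i - t_j|`, the helper name `cov_from_times`, the parameter values, and the choice to start both surveys at month 0 are all illustrative.

```python
import math

def cov_from_times(times, sigma, rho):
    """Covariance sigma^2 * rho^|t_i - t_j| for measurement times given in months.

    Assumed form (continuous-time AR(1)-style decay), chosen only for illustration.
    """
    return [[sigma ** 2 * rho ** abs(ti - tj) for tj in times] for ti in times]

# Survey with two waves six months apart (start month is an assumption).
survey_a_cov = cov_from_times([0, 6], sigma=1.0, rho=0.9)

# Survey with six waves one month apart (start month is an assumption).
survey_b_cov = cov_from_times([0, 1, 2, 3, 4, 5], sigma=1.0, rho=0.9)
```

With this parameterization both surveys share the same `sigma` and `rho`, and the different gaps show up only through the time differences; letting those parameters vary by demographic group would match the group-dependent covariance described in the question.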