Hello everyone. As the title says, I’m quite new to multilevel modeling, and more specifically to applying a Bayesian approach using Stan. I’ve been reading sections of Gelman’s book (2006), internet resources, and case studies from Stan’s website. The aim of this post is, first, to describe a modeling problem and ask for your opinion (more experienced than mine) on whether what I want to achieve makes sense and whether multilevel modeling with Stan is a valid way to do it. If so, I would be very grateful for any additional resources you could recommend, and maybe one day I will come back with a model proposal and more concrete questions to share.
First of all, the problem is a simple exercise that I have taken as motivation to go deeper into methodologies that are new to me (multilevel causal modeling with Stan). Please take it as such, even if the goal seems unrealistic to you.
Inside a large research building there are people-counting sensors at certain locations (cross-sectional data) whose readings are processed at a certain frequency (30 min, so panel data in the end). The counting involves no person recognition, so the data are just aggregate flows. The aim of the research design is to determine whether a change is observed after the implementation of telework measures and, subsequently, after the return to “normal”. I had therefore thought of a piecewise growth model in which the change points are the specific days on which the measures were implemented, so that I can quantify the changes while taking the between-sensor variability into account (partial pooling). In this setup, level 1 would be the daily repeated measures and level 2 the sensors/locations. Does this make sense? The thing is that I haven’t found piecewise growth models to be very common in the multilevel/Stan material I have seen.
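To make the idea concrete, here is a minimal sketch of how I imagine the time covariates of the piecewise model. It is plain Python rather than Stan, and the change days (30 and 90) and all function names are made up for illustration. In the multilevel version, each sensor would get its own intercept and slopes, drawn from a common distribution.

```python
# Hypothetical change points (day index): telework starts on day 30,
# return to "normal" on day 90. These values are illustrative only.
CHANGE_DAYS = [30, 90]


def piecewise_basis(day, change_days):
    """Return [day, max(0, day - c1), max(0, day - c2), ...].

    The first entry carries the baseline slope; each further entry is zero
    before its change point and grows linearly after it, so its coefficient
    is the change in slope at that point.
    """
    return [float(day)] + [float(max(0, day - c)) for c in change_days]


def expected_level(day, intercept, slopes):
    """Linear predictor for one sensor on one day."""
    basis = piecewise_basis(day, CHANGE_DAYS)
    return intercept + sum(b * s for b, s in zip(basis, slopes))


# Flat baseline, then -2 per day after day 30, then back to flat after day 90.
print(expected_level(40, 100.0, [0.0, -2.0, 2.0]))
```

This encoding only lets the slope change at each change point. An immediate jump in level on the day a measure starts would need an additional 0/1 indicator per period.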
On the other hand, each day’s measurements form a time series with 288 points (5 min). I consider a multivariate approach (https://mc-stan.org/docs/2_20/stan-users-guide/multivariate-outcomes.html) rather infeasible. However, I would still be interested in detecting change by time of day, mostly to check whether the ‘back to normality’ is the same across timeslots. I am therefore thinking of grouping the measurements into timeslots (say 2 or 3 hours), summarizing them, and either (1) checking the between-timeslot variance in a 3-level model, or (2) fitting independent 2-level models for each timeslot, using either a single summarized univariate measure or the unsummarized multivariate one. Do you consider one alternative more rigorous than the other?
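The summarizing step I have in mind would look roughly like the sketch below (plain Python, hypothetical names). It assumes 288 five-minute counts per day and uses the total count per timeslot as the summary; a mean or a peak value would be just as easy to swap in.

```python
POINTS_PER_DAY = 288    # one count every 5 minutes
MINUTES_PER_POINT = 5


def summarize_day(counts, slot_hours=3):
    """Collapse one day of 5-minute counts into totals per timeslot."""
    if len(counts) != POINTS_PER_DAY:
        raise ValueError("expected 288 five-minute counts for one day")
    points_per_slot = slot_hours * 60 // MINUTES_PER_POINT
    if POINTS_PER_DAY % points_per_slot != 0:
        raise ValueError("slot length must divide the day evenly")
    return [
        sum(counts[start:start + points_per_slot])
        for start in range(0, POINTS_PER_DAY, points_per_slot)
    ]


# A day with exactly one person counted per interval gives 8 slots of 36.
print(summarize_day([1] * POINTS_PER_DAY, slot_hours=3))
```

With option (1) each of these slot totals would be an observation nested in timeslot and sensor; with option (2) each slot would get its own 2-level model.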
Finally, I know beforehand that there is weekly seasonality, since for example the flow will be lower on Saturdays and Sundays. However, I am unsure how to control for this effect so that it does not distort the piecewise growth model. Would it be through group-level predictors?
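One option I can imagine, sketched below in plain Python with hypothetical names and an arbitrary start date, is a 0/1 weekend indicator computed per day. Since it varies from day to day within each sensor, I suspect it would enter as a day-level covariate rather than a sensor-level one, but please correct me if that is the wrong way to think about it.

```python
from datetime import date, timedelta


def weekend_indicator(start, n_days):
    """Return a 0/1 list marking Saturdays and Sundays from `start` onward."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return [
        1 if (start + timedelta(days=i)).weekday() >= 5 else 0
        for i in range(n_days)
    ]


# Illustrative start date only: 2 March 2020 was a Monday.
print(weekend_indicator(date(2020, 3, 2), 7))
```

A full set of day-of-week indicators would be the obvious extension if the weekdays also differ from one another.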
As you can see, some of my questions are more about modeling than about fitting in Stan, but I think there are people in this forum who are very experienced with this kind of model, and I am also interested in using a Bayesian approach with Stan. My apologies if the forum is meant exclusively for questions that already include a proposed Stan model. Any kind of help or advice will be much appreciated!