I would like to use Stan for modeling daily multivariate count data of events per user of an application (i.e. number of reads, creates, updates, etc.). My end goal is to detect anomalous users, for example, a user that has 20 “create” events when the mean is 2 or 3.
There are a few constraints that make it challenging:
- I need a single probability to “score” a user as anomalous across a number of event types (15-20).
  This suggests to me that separate Poisson or negative binomial outputs won’t do on their own. They would need to be combined in some way.
- There is some covariance structure between all the events I’m modeling (they’re not independent).
  This suggests to me that a multinomial output would not reflect the observed values.
- There exist “groups” of users where one group’s counts of events might not be anomalous, but the same counts would be flagged as an anomaly for another group.
  This suggests that a hierarchical model would be appropriate.
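One way to make the three constraints concrete is a multivariate Poisson log-normal setup: each group has its own mean vector of log-rates, a shared covariance matrix ties the event types together, and the marginal likelihood of a user's whole count vector serves as a single score. The following is only a rough sketch in plain Python rather than Stan, and everything in it is an assumption for illustration: the group names, the three event types, the covariance values, and the fixed (not estimated) parameters are all hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def poisson_logpmf(y, lam):
    """Log of the Poisson probability mass at count y with rate lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def log_marginal(y, mu, chol, n_draws=4000, seed=0):
    """Monte Carlo estimate of log p(y | group) under a Poisson log-normal model."""
    rng = random.Random(seed)
    k = len(y)
    log_terms = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # eta = mu + L z gives correlated latent log-rates for this draw
        eta = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
        log_terms.append(sum(poisson_logpmf(y[i], math.exp(eta[i])) for i in range(k)))
    # log-mean-exp over draws, computed stably
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms) / n_draws)

# Hypothetical shared covariance of log-rates for (reads, creates, updates)
sigma = [[0.25, 0.10, 0.05],
         [0.10, 0.25, 0.05],
         [0.05, 0.05, 0.25]]
chol = cholesky(sigma)

# Hypothetical group-level mean log-rates (these would be estimated in a real model)
group_mu = {
    "casual": [math.log(10.0), math.log(2.5), math.log(3.0)],
    "power": [math.log(40.0), math.log(20.0), math.log(15.0)],
}

y = [35, 20, 12]  # one user's daily counts: reads, creates, updates
scores = {g: log_marginal(y, mu, chol) for g, mu in group_mu.items()}
for g, s in scores.items():
    print(g, round(s, 2))
```

The same count vector gets a much lower score under the "casual" group than under the "power" group, which is the group-dependent behaviour described above. Note that the score here is a log density, not a calibrated probability, and a real model would estimate the group means and the covariance from data instead of fixing them.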
Does anyone have suggestions on how to model this?