Modeling event count data

rsteckel · December 12, 2017, 3:32pm

I would like to use Stan for modeling daily multivariate count data of events per user of an application (e.g. the number of reads, creates, and updates). My end goal is to detect anomalous users, for example, a user who has 20 “create” events when the mean is 2 or 3.
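To get a feel for how extreme that example is, here is a minimal sketch of the upper-tail probability for a single event type under a plain Poisson model. The rate of 2.5 is an assumption (the midpoint of "2 or 3"), and the function names are made up for illustration.

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam), computed in log space for stability.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_upper_tail(k, lam, max_terms=1000):
    # P(X >= k), summed directly so that tiny tail values keep their precision.
    total = 0.0
    for i in range(k, k + max_terms):
        term = poisson_pmf(i, lam)
        total += term
        # Past the mode the terms shrink quickly; stop once they no longer matter.
        if i > lam and term < 1e-17 * total:
            break
    return total

# Assumed rate of 2.5 "create" events per day; 20 or more is far out in the tail.
print(poisson_upper_tail(20, 2.5))
```

A per-event tail probability like this handles one count at a time and ignores both the correlation between event types and the differences between user groups, which is what the constraints below are about.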

There are a few constraints that make it challenging:

1. I need a single probability to “score” a user as anomalous across a number of event types (15-20).
   This suggests to me that simple Poisson or negative binomial outputs won’t do. They would need to be combined in some way.
2. There is some covariance structure between all the events I’m modeling (they’re not independent).
   This suggests to me that a multinomial output would not reflect the observed values.
3. There exist “groups” of users where one group’s counts of events might not be anomalous, but the same counts would be flagged as an anomaly for another group.
   This suggests that a hierarchical model would be appropriate.

Does anyone have suggestions on how to model this?

Krzysztof_Sakrejda · December 12, 2017, 5:02pm

I’d aim for an unobserved multivariate normal on the log scale that’s exponentiated to produce intensities. You can model that latent layer further (adding covariance and hierarchy) and then use the intensities with a Poisson (or similar) to relate them to the actual counts. You don’t really need to label the user as anomalous at this stage. You need to produce some output that can be used in such scoring, but until you settle on the costs, consequences, and benefits of flagging users you can’t really decide what the criteria should be.
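As a rough illustration of that structure (not a Stan program, and not a fitted model), the sketch below simulates a latent multivariate normal on the log scale, exponentiates it into Poisson rates, and turns a user's count vector into one number. Everything specific here is an assumption: the three event types, the group means `light` and `heavy`, the covariance `sigma`, and the scoring rule itself (the fraction of simulated users whose counts are less likely than the observed ones). In practice the means and covariance would come from the posterior of a hierarchical fit.

```python
import math
import random

def cholesky(a):
    """Lower-triangular factor of a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def draw_rates(mu, chol, rng):
    """Draw eta ~ MVN(mu, Sigma) on the log scale and return exp(eta)."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [m + sum(chol[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]
    return [math.exp(e) for e in eta]

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def marginal_loglik(counts, rate_draws):
    """Monte Carlo estimate of log p(counts), averaging over latent rate draws."""
    logs = [
        sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y, lam in zip(counts, rates))
        for rates in rate_draws
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))

def anomaly_score(counts, mu, sigma, n_latent=300, n_ref=300, seed=0):
    """Fraction of simulated users at least as unlikely as `counts` (small = unusual)."""
    rng = random.Random(seed)
    chol = cholesky(sigma)
    latent = [draw_rates(mu, chol, rng) for _ in range(n_latent)]
    observed = marginal_loglik(counts, latent)
    hits = 0
    for _ in range(n_ref):
        simulated = [sample_poisson(lam, rng) for lam in draw_rates(mu, chol, rng)]
        if marginal_loglik(simulated, latent) <= observed:
            hits += 1
    return hits / n_ref

# Hypothetical covariance between log-rates of (create, read, update).
sigma = [[0.25, 0.10, 0.05], [0.10, 0.25, 0.05], [0.05, 0.05, 0.25]]
# Hypothetical group-level mean log-rates: one light group, one heavy "create" group.
light = [math.log(2.5), math.log(3.0), math.log(1.5)]
heavy = [math.log(18.0), math.log(3.0), math.log(1.5)]

for name, mu in (("light", light), ("heavy", heavy)):
    print(name, anomaly_score([20, 3, 1], mu, sigma))
```

The same counts get a very different score depending on the group mean, which is the hierarchical part of the idea. The score is only an input to a flagging decision; where to put the threshold still depends on the costs and benefits of flagging.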

rsteckel · December 12, 2017, 5:44pm

That sounds like a great approach. Thank you.

Krzysztof_Sakrejda · December 12, 2017, 6:08pm

Always happy to give suggestions I don’t have to implement and check. I think this could be a great way to ID Twitter bots, but I don’t have the time on my hands to do it, and the meat of the problem is in how you model the MVN anyway :)
