I want to find the number and location of a bunch of teeny, tiny crabs on a beach from pictures a drone took, based on data from human annotators. The thing is, I’m not sure this problem is best solved in Stan.
The way I’ve been getting the data is to have annotators click on the positions of the crabs in a GUI that I’ve made, which records the pixel coordinates of each click.
The underlying inference model I’ve made in my head is thus:
- There are an unknown number of crabs in each image, although the number is pretty strongly constrained (e.g., greater than half of the lowest number of crabs any annotator reports and less than twice the highest number).
- Each crab has a single real location.
- The accuracy for each annotator is unknown for each image, but follows some random distribution (not super important which one, I could even hardcode the parameters). By “accuracy” I mean that no annotator is guaranteed to click on each crab in each image. They randomly sample some proportion of the crabs to click on each image.
- The xy-coordinates of each click have a random Gaussian error that centers on the real location of each crab. I can also hardcode the parameters of this distribution if it makes it easier. It will be a relatively tight Gaussian.
- Each click per annotator per image corresponds to a unique crab in that annotator’s judgment. (If an annotator clicks on two points very close to each other, they’re not selecting the same crab twice.)
Does it make sense to infer the crabs’ locations with Stan? Or should I use a different method or language? I’ve looked at a variety of clustering algorithms, but they don’t seem very well fit for this problem.
Howdy!
I think to answer this I’d need to know a little more about the higher-level goal. Where Stan fits in depends on that.
Are you trying to build a system that automatically finds crabs in images and counts them?
Are the actual locations of the crabs important to you? Or just the counts?
Are you trying to build a system that tracks movement of a bunch of crabs? Is this a time series thing?
Happy crab-hunting!
Yes, this would be a fine problem for Stan, especially if you care about uncertainty. Search the forums for “ecology movement” models, which can be used to model the crab population itself. Then you can use a Dawid-Skene-like model to quantify the accuracy of the annotators. Put this all into one big joint model and let Stan do its thing.
Not that it will be trivial to put the pieces together, but if you want your awesome Lego spaceship then you have to put the effort in to assemble it.
I don’t think the crabs are moving in the pictures, so the movement HMMs are probably not a good place to look.
You want to think about these types of models generatively, which is close to what you laid out.
Dealing with the annotators is a very traditional noisy measurement problem. There’s an example in the latent discrete parameters chapter of the manual doing it with the Dawid and Skene model. There it’s categorical annotations over a known number of items. The problem is how to tackle the unknown number of items you have and try to figure out when clicks from different annotators are on the same crab.
So your generative process at first order is something like this:
- select number of crabs in image (from some prior)
- for each crab, select a location (from some prior, could be uniform)
- for each annotator, while the annotator thinks there are more crabs to annotate:
  - select a crab to annotate
  - click on its location with some error
Somehow the selection of crab to annotate will depend on the previous annotations given that they’re trying not to double annotate.
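To make the generative story concrete, here is a rough simulation sketch in plain Python rather than Stan. Everything specific in it is an assumption for illustration: the function name `simulate_image`, the uniform prior on the crab count, the Beta(8, 2) detection rate, and the click standard deviation. It also simplifies the "while the annotator thinks there are more crabs" loop into an independent coin flip per crab, which is one easy way to guarantee that no annotator clicks the same crab twice.

```python
import random

def simulate_image(n_annotators=3, width=1000.0, height=1000.0,
                   max_crabs=20, click_sd=3.0, seed=0):
    """Simulate true crab locations and annotator clicks for one image."""
    rng = random.Random(seed)
    # Number of crabs: assumed uniform prior on 1..max_crabs.
    n_crabs = rng.randint(1, max_crabs)
    # Crab locations: assumed uniform over the image.
    crabs = [(rng.uniform(0, width), rng.uniform(0, height))
             for _ in range(n_crabs)]
    clicks = []  # each click is (annotator_id, x, y)
    for annotator in range(n_annotators):
        # Per-annotator, per-image detection rate (assumed Beta(8, 2)).
        p_detect = rng.betavariate(8, 2)
        for x, y in crabs:
            # Each crab is clicked at most once by a given annotator.
            if rng.random() < p_detect:
                clicks.append((annotator,
                               rng.gauss(x, click_sd),
                               rng.gauss(y, click_sd)))
    return crabs, clicks

crabs, clicks = simulate_image(seed=1)
print(len(crabs), "crabs,", len(clicks), "clicks")
```

Simulating like this gives you fake data with known answers, so you can check whether whatever method you end up using recovers the true count and locations.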
This would wind up being a very hard model to fit because of the uncertainty in number of crabs. You have two explanations for clicks close to each other—error on the part of the annotators or two crabs. Then the combinatorics of the total number of crabs and where they’re at will come in.
I wouldn’t think in terms of the strong constraints (those half ranges), though those would inform the prior on how many crabs each annotator annotates conditioned on the true number of crabs.
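As a back-of-the-envelope illustration of the "one crab or two" ambiguity, here is a small Python sketch of the likelihood ratio for two clicks from different annotators that land a distance d apart. The assumptions are mine: isotropic Gaussian click error with standard deviation `click_sd`, crab locations uniform over an image of area `image_area`, and edge effects ignored. Under "same crab" the difference between the two clicks is Gaussian with variance 2 * click_sd^2 per coordinate; under "two crabs" the second click is roughly uniform over the image. The function name is hypothetical, and this is only the likelihood piece, with no prior on the number of crabs.

```python
import math

def same_crab_likelihood_ratio(d, click_sd, image_area):
    """Ratio p(clicks | same crab) / p(clicks | two distinct crabs)."""
    # Variance per coordinate of the difference of two independent clicks.
    var_diff = 2.0 * click_sd ** 2
    # 2D isotropic Gaussian density evaluated at distance d from zero.
    density_same = math.exp(-d * d / (2.0 * var_diff)) / (2.0 * math.pi * var_diff)
    # Two unrelated crabs: the second click is roughly uniform over the image.
    density_diff = 1.0 / image_area
    return density_same / density_diff

for d in (0.0, 5.0, 20.0, 50.0):
    print(d, same_crab_likelihood_ratio(d, click_sd=3.0, image_area=1000.0 * 1000.0))
```

With a tight Gaussian the ratio drops off very quickly with distance, so the hard cases are only the clicks within a few standard deviations of each other.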
“I don’t think the crabs are moving in the pictures”
Better drink some more then? Ba-doomp-tschhh!
Where I’ve seen this sort of thing before is Poisson Point Processes. I feel like I kinda remember from reading about them that it’s easier to estimate a density of things rather than estimate exact locations and labels. Any time you put labels on things and say This crab was Here, stuff gets hard. Of course that simplification will only work for you if you can somehow get away without labeling Exact Crabs.
The book I read was Poisson Point Processes by Roy Streit (http://www.springer.com/gp/book/9781441969224). I thought that book did a pretty good job of getting me thinking about my problem differently. It’s not really about people annotating images. It’s more about radar/image detector kinda problems. If you have a library nearby get that on ILL. There’s probably some other good stuff you could Google around the internet if not.
The question is really whether the quantity of interest is the spatial distribution, the locations of specific crabs, or just the number of crabs. @burchill wanted to find the number and location, but didn’t mention the spatial distribution.
But if spatial distribution is what really matters, something like a Poisson point process would make sense. You probably wouldn’t even have to worry about double counting other than as a general discount to any observation.
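For what it's worth, the simplest version of the density idea fits in a few lines of Python. This sketch assumes a piecewise-constant intensity on a grid and treats each annotator as an independent, thinned view of the same point pattern, so the "general discount" becomes a single assumed detection probability `detect_prob`. Under those assumptions the maximum likelihood estimate of the intensity in a cell is just the click count divided by the exposure. The function name and parameters are hypothetical.

```python
def estimate_intensity(clicks, width, height, n_cols, n_rows,
                       n_annotators, detect_prob=1.0):
    """Estimate crabs per unit area in each grid cell from pooled clicks.

    clicks: list of (x, y) positions pooled over all annotators.
    """
    cell_w = width / n_cols
    cell_h = height / n_rows
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in clicks:
        # Clamp so clicks on or just past the border fall in an edge cell.
        col = max(0, min(int(x / cell_w), n_cols - 1))
        row = max(0, min(int(y / cell_h), n_rows - 1))
        counts[row][col] += 1
    # Expected clicks per crab, times cell area, gives the exposure.
    exposure = n_annotators * detect_prob * cell_w * cell_h
    return [[c / exposure for c in row] for row in counts]

grid = estimate_intensity([(10, 10), (12, 11), (90, 90)],
                          width=100, height=100, n_cols=2, n_rows=2,
                          n_annotators=1)
print(grid)
```

Note that nothing here needs to decide which clicks belong to the same crab, which is exactly why the density route is easier than the labeling route.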
Ah thank you so much guys! I really, really appreciate such immediate and thoughtful replies.
I guess I should have explained myself better. I wanted to think about it in almost exactly the way @Bob_Carpenter described. From my bit of browsing on the topic, it seemed that it might be a bit of a headache to model something where the number of crabs is uncertain.
For the current purposes, I wanted to see how well I could make a model that would give me information about the pixel positions of the crabs (which would necessitate information on the number of crabs as well).
However, the Poisson point process idea could actually be better down the line for what I’m interested in. The reason these crabs are interesting is because in part of their lifecycle, they all have to cross a beach into the water. Their strategy to cross this beach while minimizing predation is to all stampede at the same time–and since the crabs can’t talk to each other to say “when”, how they all run at the same time is a mystery.
One way this might be triggered is by something like crab density, for which the Poisson models might be very helpful. I think for right now, I’ll just try and see how well a “quick-and-dirty” hierarchical clustering works, but once I get a better sense of the data and the problem, I might try combining some basic object recognition/HMM stuff with that Poisson modelling (I’ll have to do some research on that) to see how the density changes over time, without having to annotate every single frame.
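A minimal sketch of what that quick-and-dirty clustering could look like, in plain Python with no dependencies. The choices here are assumptions rather than anything settled in this thread: complete linkage, a hand-picked distance cutoff `max_dist`, and a rule that two clusters are never merged if they share an annotator (since each annotator's clicks are supposed to be distinct crabs). The function names are made up for illustration.

```python
import math

def cluster_clicks(clicks, max_dist):
    """Agglomerative clustering of clicks given as (annotator_id, x, y)."""
    clusters = [[c] for c in clicks]

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between the clusters.
        return max(math.hypot(p[1] - q[1], p[2] - q[2]) for p in a for q in b)

    def compatible(a, b):
        # One annotator never clicks the same crab twice.
        return not ({p[0] for p in a} & {q[0] for q in b})

    while True:
        best = None  # (distance, i, j) of the closest mergeable pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if not compatible(clusters[i], clusters[j]):
                    continue
                d = linkage(clusters[i], clusters[j])
                if d <= max_dist and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            break  # nothing left to merge under the cutoff
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def centroids(clusters):
    """Mean click position of each cluster, as a crab location estimate."""
    return [(sum(p[1] for p in c) / len(c), sum(p[2] for p in c) / len(c))
            for c in clusters]

clicks = [(0, 10, 10), (1, 11, 10), (2, 10, 11),
          (0, 50, 50), (1, 51, 49), (0, 12, 10)]
found = cluster_clicks(clicks, max_dist=5.0)
print(len(found), centroids(found))
```

The number of clusters is the crab count estimate and the centroids are the location estimates. The pairwise search is slow for large inputs, but it should be fine for a few hundred clicks per image.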
Thanks again! I have a few other side projects on the burner that I’d like to try to do in Stan, so I’ll definitely be back!