Suggestions on including prior information to estimate bird territories

Hello, fellas. I am an ornithologist currently trying to fit a model on the territory sizes of a species of bird (call it A). When I mapped 20 territories of these birds I noticed that some territories were defended by pairs and others by trios (i.e., two and three birds). Though my main goal was only to report the territory sizes, I have been thinking about running a ‘simple’ model in brms in the form:

model <- brm(territory_size ~ 1 + number_of_birds,
  data = df, 
  family = gaussian(),
  iter = 40000, warmup = 20000, chains = 4, cores = 4)

However, I have prior information on the territory sizes of two closely related species (call them B and C), and I want to include this information in the model. The issue is how to go about it. Other studies have reported average territory sizes of 13.3 km^2 and 42.5 km^2 for species B and a median territory size of 19.5 km^2 for species C. Based on this information I decided to set a prior centered somewhere in between the reported values: 21.0 km^2, with a standard deviation wide enough to allow for a range of values: 8.0 km^2. The problem I have is how to include this in the previous model. So far my approach has been to run:

prior <- prior(normal(21000, 8000), class = "Intercept") #values in square meters

model <- brm(territory_size ~ 1 + number_of_birds,
  data = df,
  family = gaussian(),
  prior = prior, # pass the prior defined above; otherwise brms falls back to its defaults
  iter = 40000, warmup = 20000, chains = 4, cores = 4)

I ran prior predictive checks on this and they made sense. The model runs just fine. For practical purposes this model leads me to the same decision that I would reach with a flat prior: the difference between territories with two and three birds is close to zero.
My question to you is: Is this a correct way to include prior information in the model?
And a side question: would you suggest anything about how to choose the values that should be used for the prior? I know there is no such thing as ‘the correct prior’, but as of now choosing values seems to be a mix of domain knowledge and a hunch.

This is the data:
territories_areas_feb10.csv (435 Bytes)

Any help would be greatly appreciated. Any comment regarding a flagrant mistake in my model and reasoning is also very much welcomed.


Howdy! Ornithologist sounds cool. I like to watch birds, so I’ll give a couple of pointers, although there are probably some other quantitatively minded ornithologists on this forum who would have pointers.

For information on prior modeling, I strongly recommend Michael Betancourt’s discussion Prior Modeling, where he focuses particularly on soft containment prior modeling. It’s excellent, and he draws a lot of inspiration from the work of IJ Good, whose ideas are then updated for modern modeling.

Specifically for your model here, I think it is a bit odd to use a Gaussian model when territory_size is a positively constrained area in square kilometers (and, based on your description of your prior, it does not appear that you have centered or transformed it in any way). Perhaps it would be more appropriate to choose a family that is positively constrained.

The intercept in your model is the value of territory_size when number_of_birds is zero. This doesn’t make a whole lot of sense to me, because if there are no birds, then there is no territory. Based on your description, it sounds like each territory holds either a pair or a trio and nothing more, so perhaps it could make more sense to treat this as a categorical variable and estimate the territory size for these two options (unless you have more information and there could actually be some continuous count of birds in each territory and you want to model some functional relationship between the increasing number of birds and territory size).

If you go the categorical route, then you can place the prior directly on the parameter for the pair or the trio (using index rather than indicator variables) and exclude an intercept (thus you would be placing the prior on the territory size for the pair and for the trio). I would decide on the prior using the approach laid out in Betancourt’s case study that I linked. Something along the lines of:

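prior <- prior(normal(9.95, 0.25), class = "b") # appropriate prior here (log scale)

# birds should be a factor (pair vs. trio) so that 0 + birds gives one parameter per level
model <- brm(territory_size ~ 0 + birds,
  data = df,
  family = lognormal(),
  prior = prior, # pass the prior defined above
  iter = 40000, warmup = 20000, chains = 4, cores = 4)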
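
To see what a prior like normal(9.95, 0.25) means on the natural scale, here is a minimal base R sketch (no brms needed). It assumes the lognormal parameterization in which the coefficient is the log of the median, so exponentiating the prior quantiles gives the implied range for the median territory size. The value 21000 comes from the original post; the variable names are just for illustration.

```r
# Prior on the log scale, as in the example above
mu_log <- 9.95
sd_log <- 0.25

# Implied median and central 95% interval on the natural scale.
# qlnorm() returns quantiles of exp(Normal(meanlog, sdlog)).
implied <- qlnorm(c(0.025, 0.5, 0.975), meanlog = mu_log, sdlog = sd_log)
names(implied) <- c("lower", "median", "upper")
print(round(implied))

# Going the other way: the log of a natural-scale guess gives the prior location
log_center <- log(21000)
print(log_center)
```

If the implied interval looks too narrow or too wide compared with what is known from species B and C, the log-scale standard deviation is the thing to adjust.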

I agree. You could use a half-normal, but what you want to think about is errors. If you use a normal model, errors are additive on the scale of the normal. If you use a lognormal model, which is constrained to positive values, the scale determines multiplicative errors (it’s just because you take a normal and exponentiate, and exp(y + sigma) = exp(y) * exp(sigma)).
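
A quick base R sketch of the identity, with made-up values for sigma and the two locations: adding sigma on the log scale multiplies the natural-scale value by the same factor everywhere, so the absolute shift grows with the size of the territory.

```r
sigma <- 0.25              # error scale on the log scale (illustrative value)
y <- log(c(5000, 21000))   # two log-scale locations (illustrative values)

# Lognormal: adding sigma on the log scale multiplies by exp(sigma)
ratio <- exp(y + sigma) / exp(y)
print(ratio)               # the same factor for both locations

# The absolute shift therefore depends on where you start
shift <- exp(y + sigma) - exp(y)
print(shift)
```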

That sounds like the opposite of a flat prior if you’re strongly shrinking values to zero.

The simplest thing to do to start is choose weakly informative priors that inform the scale of the answer. So if you have a parameter you expect to take on values like -2 or 3, use a normal(0, 2) prior, if you expect it to take on values like 100, 110, 105, etc., use a normal(105, 10) prior, etc.
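
One rough way to sanity-check a choice like that is to ask whether the values you expect fall inside the central mass of the prior. A minimal base R sketch, where central_interval is a hypothetical helper written for this post and the 95% level is an arbitrary choice:

```r
# Hypothetical helper: central interval of a normal(mean, sd) prior
central_interval <- function(mean, sd, level = 0.95) {
  alpha <- (1 - level) / 2
  qnorm(c(alpha, 1 - alpha), mean = mean, sd = sd)
}

expected <- c(100, 110, 105)          # values you expect the parameter to take
interval <- central_interval(105, 10) # candidate prior from the text
print(interval)

# TRUE if every expected value sits inside the central interval
covered <- all(expected >= interval[1] & expected <= interval[2])
print(covered)
```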

Your model isn’t using species information that I can see. If the species have different sizes, then you want an effect for species, too. You can then look at the posterior for the number_of_birds coefficient and see if it makes a difference. Maybe it only makes a difference for one species, so you can build an interaction between species and number of birds. If the number of birds is only 2 or 3, it might make sense to have it be a random intercept rather than a slope, i.e., 1 | number_of_birds.
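
For concreteness, those options could be written as the formulas below. This is only a sketch: species is a hypothetical column name that may not exist in the posted data, and the formulas are plain R objects that would still need to be passed to brm() along with a family and priors.

```r
# Candidate model formulas as plain R formula objects.
# `species` is a hypothetical column, not confirmed to be in the data.
f_species     <- territory_size ~ 1 + species + number_of_birds   # species effect
f_interaction <- territory_size ~ 1 + species * number_of_birds   # effect varies by species
f_random      <- territory_size ~ 1 + (1 | number_of_birds)       # random intercept

# Variables each formula expects to find in the data frame
print(all.vars(f_species))
print(all.vars(f_interaction))
print(all.vars(f_random))
```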
