Weighting observations in quantile regression (for brms)

I’m working with a bird migration dataset (birds counted per day during autumn at a monitoring site across several decades) and I’d like to understand the yearly rate of change in migration timing for different species. One approach in the phenology literature for analyzing these changes in timing is quantile regression, where different quantiles may represent different segments of a population (for example, the 0.1 quantile might correspond to juvenile or non-breeding individuals that migrate earlier in the season).

In most examples I can find, daily counts are standardized (i.e., the same amount of effort is expended each day to count individuals) and the response variable is day-of-year. If 1 bird is counted on day 100 and 3 birds are counted on day 101 in year 1 then the data looks like this, with each row and doy representing one individual:

day-of-year year
100         1
101         1
101         1
101         1
... 

And so on for each year (also, counts are not usually every day because of inclement weather on some days).
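For anyone who wants to reproduce that layout, here is a minimal base-R sketch of expanding daily totals into one row per individual. The data frame `daily` and its column names are made up to match the toy example above, not taken from my real data.

```r
# Toy daily totals matching the example above (made-up values)
daily <- data.frame(
  year  = c(1, 1),
  doy   = c(100, 101),
  count = c(1, 3)
)

# Repeat each day's row once per bird counted on that day
individuals <- daily[rep(seq_len(nrow(daily)), times = daily$count), c("doy", "year")]
rownames(individuals) <- NULL
```

Days with a count of zero simply drop out, since they are repeated zero times.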

I’m working with a dataset where survey effort is not standardized, and the number of hours spent counting each day varies. Instead, my data looks like this, and I’ve also calculated a metric for birds counted per hour (count_per_hour) on each day:

day-of-year count hours count_per_hour
100         8     4     2 
101         20    8     2.5

Or visually for the entire data set (for one species):
[Image: Rplot]

My question is: would it still be suitable to use this approach, but weight the day-of-year (doy) response by birds counted per hour? This is what I had in mind:

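library(brms)

# Quantile regression for the 0.1 quantile of day-of-year,
# with each day's row weighted by birds counted per hour
fit <- brm(
  bf(doy | weights(count_per_hour) ~ year, quantile = 0.1),
  data = dat,
  family = asym_laplace()
)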

But I’m not sure if this would be the correct way to specify and account for the varying daily survey effort in the dataset.
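To make concrete what the weights would be doing: as I understand it, weights() in brms scales each row's log-likelihood contribution, and for an intercept-only asymmetric Laplace model the weighted likelihood peaks at a weighted sample quantile. Below is a rough base-R sketch with made-up numbers; `weighted_quantile` is a hypothetical helper written for illustration, not a brms function.

```r
# Made-up daily totals for one year (not real data)
dat <- data.frame(
  doy   = c(100, 101, 103, 104, 106),
  count = c(8, 20, 5, 12, 3),
  hours = c(4, 8, 2, 8, 1)
)
dat$count_per_hour <- dat$count / dat$hours

# Hypothetical helper: smallest y whose cumulative weight share reaches tau.
# This value minimizes the weighted check loss sum(w * rho_tau(y - q)).
weighted_quantile <- function(y, w, tau) {
  ord <- order(y)
  share <- cumsum(w[ord]) / sum(w)
  y[ord][which(share >= tau)[1]]
}

# Median passage date under three treatments of the same data
q_count <- weighted_quantile(dat$doy, dat$count, 0.5)           # weight = raw count
q_rate  <- weighted_quantile(dat$doy, dat$count_per_hour, 0.5)  # weight = birds per hour
q_rows  <- unname(quantile(rep(dat$doy, dat$count), 0.5, type = 1))  # one row per bird
```

With these toy numbers, weighting by count gives the same answer as one row per bird (day 101), while weighting by count_per_hour gives day 103. So the two weightings are not interchangeable, which is part of why I'm unsure about the specification.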

Thanks,
Jay


Maybe thinking in terms of how the observation effort is being modeled can help you formulate that modified version.

If you are working with integer hour values (N), you can think of a binomial distribution, which would scale your expected number of counts (k) by the observation effort for the same observation probability: k \sim \mathrm{Binomial}(N, p). Since some counts are more frequent than one per hour, you could measure effort in minutes or seconds instead, so that the count never exceeds the number of trials and you still have a probability p \in (0,1).

The binomial approaches a normal distribution for large N, so that probably means you could scale the mean of the distribution you are using to what Np would be in that formulation. If that checks out, it seems to me like you’d be scaling by the number of hours, not counts per hour, though I’m not completely sure without looking at the model more closely.
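As a rough illustration of that scaling, here is a small base-R sketch. The per-minute probability `p` and the effort values are made up; the point is only that, for a fixed p, the expected count Np grows in proportion to effort N.

```r
# Hypothetical per-minute probability of counting a bird (made-up value)
p <- 0.04

# Effort in minutes for a 4-hour day and an 8-hour day
minutes <- c(4, 8) * 60

# Binomial mean and the standard deviation used in the normal approximation
expected  <- minutes * p
sd_approx <- sqrt(minutes * p * (1 - p))

# Simulated daily counts: the averages should sit close to N * p
set.seed(1)
sims <- sapply(minutes, function(n) mean(rbinom(10000, size = n, prob = p)))
```

Doubling the minutes doubles the expected count, which is why hours of effort (rather than counts per hour) looks like the natural scaling term in this formulation.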


Hi Jay,
@caseyyoungflesh and I have done some Stan modeling of bird passage phenology using unstandardized survey data, and I’d be happy to try to answer any questions you have in detail.

A critical question is what the source of effort variability is. In many cases, effort might be variable precisely because counters stay out longer on days with big flights, or leave early on days with rain. If the variation in effort is related to the variation in the rate of (observable) passage on that day, then a lot of care will be necessary to disentangle the two.

Edit: alternatively, some well-staffed (or well-volunteered) migration counts routinely stay out each day until the bulk of the passage has finished. Thus, on a fall hawk-watch, effort might vary primarily because afternoon winds have shifted to strong southerlies, or effort on passerine morning-flight counts might vary primarily because the flight winds down much earlier on some days than others. In such cases, it’s probably safe to ignore the variation in effort and to treat the total number for a day as a good approximation for the total passage for the entire day, even if the effort was just a few hours.

Cheers
Jacob


Hi Jacob,

Really nice paper, thanks for sharing. To provide some more detail on effort variability, the count data I’m working with are from fall hawkwatch sites, and typically counts last for 8 hours per day. Shorter days usually only occur when the weather either becomes unsafe (lightning) or reduces visibility (rain/snow/fog), or very rarely if there’s a fire nearby or some other emergency. So in most of those cases, I would imagine the flight drops off as the weather takes a turn, and those few hours spent counting are a good approximation for the entire day. Crews will count for more than 8 hours if the flight is good on that particular day, and it seems like the bulk of counts do fall within a ~8-10 hour range:
[Image: effort]

So crews are generally aiming for 8 hours, counting beyond that if birds are still passing through, and only cutting counts short as the weather dictates. So maybe it’s not the end of the world to ignore variation in effort. My sense is that’s the counting situation you were picturing.

Thanks for the input,
Jay

My sense from well-staffed hawkwatch sites is that on any given day they’re getting the majority of the observable birds. This is less true of minor sites, but if you’re talking about major sites like Kittatinny Mtn, Cape May, Duluth, Key West, etc., then I’d just completely ignore the variation in effort, on the assumption that the counters are not leaving their post if there is still a visible passage of hawks.