A colleague of mine caught 57 white storks and fitted each with a GPS tag that records the bird's location and the height at which it is flying. One of the goals of the study is to analyse how white storks behave in the vicinity of power lines. The hypothesis is that white storks fly higher in the vicinity of power lines to avoid collisions. My colleague started with a linear model relating flight height to distance to power lines, and then added varying effects for individual storks:

I was wondering if there is a better (and Bayesian) approach for this problem that I can recommend to my colleague. It occurred to me that Bayes's theorem might be useful here.

Power lines are 40 m high and we can assume that for a white stork to fly over a power line it will have to be flying at an altitude > 40 meters within 100 meters of a power line. I thought of formulating the problem like this:

If so, how can I "convert" this formulation into a Bayesian regression model that can be run in Stan? I am asking for a regression model because there are other covariates that may influence bird behaviour in the vicinity of power lines, e.g. bird age (older birds may be more careful around power lines and fly higher than younger ones) and habitat (power lines in prairies are more visible than in forests).

First, the outcome in my case is not a probability of success but an altitude.

Second, birds can fly high or low during the day depending on what they are doing. I'm only interested in determining whether birds tend to fly above 40 m whenever they're close to power lines.

Regressing flight height on power line vicinity seems reasonable to me. But before modelling, I would just plot the data. Maybe some histograms of flight height away from power lines compared with flight height in their vicinity?

If you don't see anything there, it may be that they just don't care? I would also suggest looking at individual birds first (again, just by looking at the data). Maybe your 100 m threshold is too wide or too narrow.
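The comparison suggested above can be sketched in a few lines before any model is fitted. This is a minimal illustration only: the `fixes` records, the bird IDs, and the `summarise` helper are hypothetical, the values are made up, and the set of thresholds is an assumption chosen to show how sensitive the summary is to the 100 m cut-off.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical GPS fixes: (bird_id, distance_to_line_m, height_m). Made-up values.
fixes = [
    ("stork_01", 30.0, 55.0),
    ("stork_01", 80.0, 48.0),
    ("stork_01", 450.0, 20.0),
    ("stork_01", 900.0, 35.0),
    ("stork_02", 60.0, 25.0),
    ("stork_02", 95.0, 62.0),
    ("stork_02", 300.0, 15.0),
    ("stork_02", 1200.0, 70.0),
]

LINE_HEIGHT_M = 40.0  # power line height stated in the question


def summarise(fixes, threshold_m):
    """Per bird and zone: count, mean height, share of fixes above the line."""
    groups = defaultdict(list)
    for bird, dist, height in fixes:
        zone = "near" if dist <= threshold_m else "far"
        groups[(bird, zone)].append(height)
    return {
        key: {
            "n": len(heights),
            "mean_height": mean(heights),
            "share_above_line": sum(h > LINE_HEIGHT_M for h in heights) / len(heights),
        }
        for key, heights in groups.items()
    }


# Try several vicinity thresholds, since 100 m may be too wide or too narrow.
for threshold in (50.0, 100.0, 200.0):
    print(f"threshold = {threshold:.0f} m")
    for (bird, zone), stats in sorted(summarise(fixes, threshold).items()):
        print(f"  {bird} {zone:>4}: n={stats['n']}, "
              f"mean={stats['mean_height']:.1f} m, "
              f"above 40 m={stats['share_above_line']:.2f}")
```

Keeping the summary per bird makes it easy to spot individuals that behave differently from the rest before pooling them in a model.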

Your colleague's model looks reasonable to me, but possibly start with a model with complete pooling first. Or even better, if you're going the Bayesian route, do a hierarchical model instead.
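One way to write down such a hierarchical (partial pooling) model is sketched below. The notation is an assumption for illustration, not the colleague's actual specification: $h_{ij}$ is the flight height of fix $i$ for stork $j$, $\mathrm{near}_{ij}$ is 1 when that fix lies within 100 m of a power line and 0 otherwise, and a normal likelihood is assumed even though flight heights may well need a different distribution.

```latex
\begin{aligned}
h_{ij} &\sim \operatorname{Normal}\left(\alpha_j + \beta \, \mathrm{near}_{ij},\ \sigma\right) \\
\alpha_j &\sim \operatorname{Normal}\left(\mu_\alpha,\ \tau\right)
\end{aligned}
```

Complete pooling is the special case in which every stork shares one intercept ($\alpha_j = \mu_\alpha$ for all $j$). Covariates such as age and habitat would enter as further terms in the linear predictor, and priors on $\mu_\alpha$, $\beta$, $\sigma$, and $\tau$ are left unspecified here.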

OK. So, after fitting a model with partial pooling, how should I proceed? Should I calculate height predictions for when birds are within 100 meters of power lines and compare those with heights for when birds are more than 100 meters away from power lines?

The first thing to do would be posterior predictive checks to see if your model actually explains your data. If these look good, your quantity of interest would probably be the regression coefficient on power line vicinity.

What you propose sounds like marginal effects and is always a good idea (the marginal_effects() function, renamed conditional_effects() in newer brms releases, might be helpful if you are using brms).
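The idea behind a posterior predictive check can be sketched without any modelling library. In the sketch below everything is hypothetical: the `observed` heights are made up, the `posterior_draws` are simulated stand-ins for draws that would in practice come from the fitted Stan or brms model, and the test statistic (share of heights above 40 m) is one possible choice among many.

```python
import random

random.seed(1)

LINE_HEIGHT_M = 40.0  # power line height stated in the question

# Hypothetical observed heights (m) for fixes near power lines. Made-up values.
observed = [55.0, 48.0, 25.0, 62.0, 38.0, 71.0, 44.0, 30.0]

# Hypothetical posterior draws of (mean, sd) for near-line flight height.
# In practice these would be extracted from the fitted model.
posterior_draws = [
    (random.gauss(47.0, 3.0), abs(random.gauss(15.0, 2.0)))
    for _ in range(1000)
]


def share_above(heights, cutoff=LINE_HEIGHT_M):
    """Test statistic: fraction of heights above the power line."""
    return sum(h > cutoff for h in heights) / len(heights)


def posterior_predictive_pvalue(observed, draws, statistic):
    """Share of replicated datasets whose statistic is >= the observed one."""
    t_obs = statistic(observed)
    exceed = 0
    for mu, sigma in draws:
        # One replicated dataset per posterior draw, same size as the data.
        replicated = [random.gauss(mu, sigma) for _ in observed]
        if statistic(replicated) >= t_obs:
            exceed += 1
    return exceed / len(draws)


p = posterior_predictive_pvalue(observed, posterior_draws, share_above)
print(f"observed share above 40 m: {share_above(observed):.2f}")
print(f"posterior predictive p-value: {p:.2f}")
```

A value close to 0 or 1 would suggest that the model fails to reproduce this aspect of the data; graphical checks that overlay replicated and observed distributions are usually more informative than a single number.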

I'm a bit concerned about this, though:

My colleague started by doing a linear model relating flight height and distance to power lines and then adding varying effects for individual storks:

Trying different models is of course always okay (after all, model building is an iterative process), but you should be able to explain why the previous model didn't perform as expected and which shortcomings it has.

If you don't do that, you run into what's called researcher degrees of freedom or the garden of forking paths (see e.g. this article: https://journals.sagepub.com/doi/full/10.1177/0956797611417632). The article mostly focuses on frequentist statistics, but the same issues also apply to Bayesian models.

What happens is that you basically lie to yourself by applying different models and ultimately choosing the one that produces the desired outcome, which is a trap that is surprisingly easy to fall into.

Yesterday my colleague sent me the exploratory data analysis and the outputs from the frequentist analysis. There seems to be something there after all. I will report back as soon as I can run the analysis myself. After fitting the model the first thing I will do is posterior predictive checks!

Hi there. I have some experience with analyzing white stork GPS data.

The flight height variation in storks will probably be a problem for your analysis. The 100 meters of difference that a power line might make is quite a small contribution by comparison. I suggest limiting your analysis to situations in which storks normally don't fly high, that is, when there are no updrafts. Assuming that your study area is a relatively flat landscape, so without orographic updrafts, most of the updrafts should be thermal in nature. Therefore I suggest limiting the analysis to early morning and late evening, and during winter possibly larger parts of the day. This way the storks are actually forced to expend additional energy to climb over the power lines, which makes this a nice test of whether they actually care about them. You could also limit the analysis to just the first seconds after takeoff, when the storks are still flying low.

If your landscape is hilly, you should also keep in mind that a power line located in a valley will probably always be overflown, even if the storks don't care whether they fly over it or through it, simply because the shortest path in 3D space from one side of the valley to the other leads over the power line.
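Restricting the data to periods with little thermal activity could look roughly like the sketch below. The `fixes` records and their timestamps are made up, and the hour windows in `LOW_THERMAL_WINDOWS` are assumptions that would need tuning to the season and study site (for example by using sunrise and sunset times instead of fixed hours).

```python
from datetime import datetime

# Hypothetical fixes: (bird_id, local timestamp, height_m). Made-up values.
fixes = [
    ("stork_01", datetime(2019, 6, 1, 5, 40), 22.0),
    ("stork_01", datetime(2019, 6, 1, 13, 5), 310.0),
    ("stork_01", datetime(2019, 6, 1, 20, 30), 35.0),
    ("stork_02", datetime(2019, 6, 1, 11, 15), 480.0),
    ("stork_02", datetime(2019, 6, 1, 6, 10), 18.0),
]

# Assumed low-thermal windows as local hours [start, end); not from the thread.
LOW_THERMAL_WINDOWS = [(4, 8), (19, 22)]


def in_low_thermal_window(timestamp, windows=LOW_THERMAL_WINDOWS):
    """True if the fix falls in one of the assumed low-thermal hour windows."""
    return any(start <= timestamp.hour < end for start, end in windows)


# Keep only early-morning and late-evening fixes.
low_thermal_fixes = [fix for fix in fixes if in_low_thermal_window(fix[1])]

for bird, timestamp, height in low_thermal_fixes:
    print(bird, timestamp.isoformat(), height)
```

The filter assumes timestamps are already in local time; GPS tags commonly record UTC, so a conversion step may be needed first.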

GPS tracking data by its very nature exhibits both spatial and temporal autocorrelation. In addition, you have autocorrelation because your data is grouped by individual. These individuals might also be genetically or culturally related to various degrees (white storks are often tagged as family groups, and they learn socially). You need to be careful how you set up and interpret the statistics so that you don't run into issues with all these autocorrelations.
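A quick way to get a feel for the temporal autocorrelation is to compute the lag-1 autocorrelation of flight height separately for each bird. The sketch below uses made-up height series and a hypothetical helper, `lag1_autocorrelation`, and it assumes fixes are ordered in time and roughly evenly spaced, which real tracking data often is not.

```python
from statistics import mean


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a time-ordered sequence."""
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        return 0.0  # constant series: autocorrelation is undefined, report 0
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    return num / denom


# Hypothetical height series (m), one per bird, ordered in time. Made-up values.
heights_by_bird = {
    "stork_01": [20.0, 24.0, 30.0, 37.0, 45.0, 52.0, 58.0, 61.0],
    "stork_02": [50.0, 12.0, 48.0, 15.0, 52.0, 10.0, 47.0, 14.0],
}

for bird, heights in heights_by_bird.items():
    print(bird, round(lag1_autocorrelation(heights), 2))
```

Strongly positive values indicate that consecutive fixes carry little independent information, which is one reason to thin the data or to model the dependence explicitly rather than treating every fix as a separate observation.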