How to fit a nested hierarchical model?

I have been using ‘brms’ for a while now and slowly increasing the complexity of my models. I would like some insight into whether the specification of my model is correct to answer my question.

I have hierarchical data with four nested variables and one response variable.

act = nightly bat activity (response variable)
Site = first level (four sites)
Month = second level (nested within Site) (12 months)
Hour = third level (nested within Site and Months) (15 hours)
Wind speed = fifth level (nested within Site, Month, and Hour)


My ‘brms’ specification looks like this.

brm(data = dt1,
             family = negbinomial(),
             act ~ (1 | Location / Month / Hour / scale_Wind.sp),
             prior = model.prior,
             iter = 2000, warmup = 500, cores = 4, chains = 4, backend = "cmdstanr",
             control = list(adapt_delta = 0.99, max_treedepth = 15), 
             threads = threading(2))


My question is: What is the relationship between hour of night and wind speed on bat activity?

Is my model specification correct for my question?

Thanks in advance for your insight.

Hello @bhopalstiffs.

It would help us help you, if you provided some more information on your data or provided a dummy dataset that captures their structure. One thing that confuses me is that it looks like you are using wind speed as a categorical predictor.

Here are some things to consider:

If you restrict your question to a single location-month-hour combination, is a model of the form
act ~ scale_Wind.sp adequate in that situation? Is this relationship reasonably linear?

If you then expand this to several hours*, aren’t the observations of adjacent hours more correlated than those far apart?

If activity varies by month and location, shouldn’t these group-level predictors be crossed rather than nested?

*this is another question about the structure of your data: aren’t observations clustered within days?

Sorry, yes this question needed a lot more background. Here is some code to simulate the data.

set.seed(123)  # Set seed for reproducibility

# Constants
locations <- 4
months <- 12
hours_per_night <- 15
nights_per_month <- 30  # Assume 30 days in each month for simplicity
total_hours <- hours_per_night * nights_per_month * months * locations

# Generate simulated data
data <- data.frame(
  Location = rep(paste("Location", 1:locations), each = total_hours),
  Month = rep(rep(1:months, each = hours_per_night * nights_per_month), times = locations),
  Hour = rep(rep(1:hours_per_night, each = nights_per_month), times = locations * months),
  Night = rep(rep(1:nights_per_month, each = hours_per_night), times = locations * months),
  WindSpeed = rnorm(total_hours, mean = 7, sd = 3),  # Wind speed in km/h, generated as a continuous variable
  BatActivity = NA  # Placeholder for bat activity counts
)

# Ensure wind speed is positive (if needed, we can take the absolute value)
data$WindSpeed <- abs(data$WindSpeed)

# Simulate Bat Activity based on Wind Speed (with noise)
for (i in 1:nrow(data)) {
  wind_speed <- data$WindSpeed[i]
  # Assuming bat activity inversely related to wind speed, with some noise
  bat_activity <- max(0, round(rnorm(1, mean = 50 - wind_speed * 3, sd = 10)))  # 50 - wind_speed * 3
  data$BatActivity[i] <- bat_activity
}

Before I answer your questions, I want to clarify my objective. There are two influences on bat activity and they appear to be wind speed and time of night (hour). I would like to tease out the relationship between those, and to understand their relative importance and/or how they interact.

My expectations are:

1: I Expect higher bat activity at certain times of night unless
2: Wind speed is high

Is it this simple? For example, if the early evening is windy, do bats shift feeding to the usual low activity time of 2300-0200?

Answer your questions:

  1. Yes, wind speed is a continuous variable.
  2. I want to expand my question to all hours sampled, not just a single location-month-hour combination.
  3. Yes, observations of adjacent hours are more correlated than those that are far apart. How do I model this?
  4. Activity varies by month and location. I think they should be crossed and not nested. I would do that with a “:” correct?
  5. We didn’t include a day variable but we could. I’m sure there is daily variation within Months.

I hope this clears things up. Thanks for your help.

The way you specified your model act ~ (1 | Location / Month / Hour / scale_Wind.sp), windspeed is treated as a categorical predictor with as many levels as distinct values of speed. You can confirm this with ranef(). So, whatever else goes into your model, speed should be a population level continuous predictor.

activity ~ speed + [other terms]

I understand that, but you have a very complex dataset. I would start from the simplest possible model (fit on a subset of data for which this simple model is appropriate) and work my way upwards.

Have a look at ?brms::ar.

Ignoring other terms, the simplest crossed specification for Month and Location would be

Activity ~ 1 + (1|Month) + (1|Location)

so, activity varies by month and location. Adding + (1|Month:Location) to the above formula indicates that the effect of month is location dependent.

I do not think you can ignore the clustering of observations within days.

Again, I would start simple, restricting things to a single location and month, and consider the following models:

activity ~ 1 + hour + wind_speed + hour:wind_speed + (1|day)
activity ~ 1 + hour + wind_speed + hour:wind_speed + (1|day) + ar(time = hour, gr = day)

(with hour is a categorical predictor)

EDIT

I hope everything I said above helps with your question re group-level predictors.

However, I think your biggest challenge in answering your question is not grouping. You want to estimate how much daily activity decreases with wind but also whether wind shifts the distribution of daily activity to later hours. I think the latter question, which seems to be your main focus, requires special handling and I certainly cannot help with that. I would suggest opening a topic that focuses on that specific aspect, with some information about the distribution of activity during the night. Is it unimodal or does it peak at dusk and dawn? Since your data are year-round and across different locations another relevant issue is latitude: does the duration of night vary across months?

2 Likes

Thanks for the extensive help. After your comment about distribution of activity during the night, I remembered that activity during the night is not unimodal but it has two peaks, one at dusk and one at dawn(smaller).

I coded a simple model with hour as a spline term:

act ~ 1 + s(Hour) + scale_Wind.sp + s(Hour):scale_Wind.sp + (1 | Date)

However, I get the error ‘s(Hour):scale_Wind.sp’ in invalid brms syntax. I’ve looked around and haven’t seen a correct way to specify this interaction term. Is there a way to specify the interaction between a spline term and a linear term?

To answer your question about latitude. These locations are relatively close together. They are on a wind farm that is several kilometers wide and long.

I think what you are after is a varying coefficient spline, such that \mu = \alpha + \beta(\text{Hour})\times\text{scale_Wind.sp}+\dots… It is specified like this: s(Hour, by = scale_Wind.sp)

Thanks to both @amynang and @StaffanBetner for their help. I will follow up with their comments and post my results.