Smoothing measurements within, not across, cycling routes

I have many readings from bicycles that measure how close they are to passing vehicles. I’d like to smooth these readings within routes/locations, but not across routes, in order to determine where there are “hot spots”. For each measurement I have lat, lon, distance to the passing car, and travel direction, plus a bunch of traffic controls (parked car? bike lane type? what was the passing car type and what was it doing?) and weather controls.

There are two research questions. First: where are the dangerous spots for cyclists? Second: what road conditions/design parameters make for a dangerous spot? We’ve only got 40 cyclists in each city collecting observations, and these cyclists have their own preferred routes, so we’d like to draw inference about possible hot spots in parts of each city that we’ve not mapped. Assume that we have very high quality (design and traffic) data for the whole city.

How would you smooth routes? Ideally I’d want a person traveling South on one side of the road to have little bearing on the smoothed risk estimate of a person traveling North on the same road. So if there are two roads 200m apart, I want them not to influence each other, but two readings 200m apart on the same route might influence each other’s smoothed estimate.
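To make the requirement concrete, here is a minimal sketch of one way to do it, not a recommendation of a particular model: a Gaussian kernel smoother that is run separately for each (road, direction) group, with distance measured along the road rather than as the crow flies. It assumes every reading has already been matched to a road and given a position along it; the keys `road_id`, `direction`, `pos_m`, and `pass_dist` and the bandwidth value are made up for illustration.

```python
import math
from collections import defaultdict

def smooth_within_groups(obs, bandwidth_m=100.0):
    """Kernel-weighted mean of passing distance, computed separately
    for each (road_id, direction) group.

    obs: list of dicts with hypothetical keys 'road_id', 'direction',
         'pos_m' (position along the road in metres) and 'pass_dist'.
    Returns smoothed values in the same order as obs.
    """
    # Collect observation indices per group so that readings on
    # different roads or opposite directions never share weight.
    groups = defaultdict(list)
    for i, o in enumerate(obs):
        groups[(o["road_id"], o["direction"])].append(i)

    smoothed = [None] * len(obs)
    for idx in groups.values():
        for i in idx:
            num = den = 0.0
            for j in idx:
                d = obs[i]["pos_m"] - obs[j]["pos_m"]
                w = math.exp(-0.5 * (d / bandwidth_m) ** 2)  # Gaussian weight
                num += w * obs[j]["pass_dist"]
                den += w
            smoothed[i] = num / den  # den >= 1 because of the self-weight
    return smoothed

# Toy data: two readings on road A northbound, one southbound, one on road B.
obs = [
    {"road_id": "A", "direction": "N", "pos_m": 0.0, "pass_dist": 1.0},
    {"road_id": "A", "direction": "N", "pos_m": 200.0, "pass_dist": 2.0},
    {"road_id": "A", "direction": "S", "pos_m": 0.0, "pass_dist": 0.5},
    {"road_id": "B", "direction": "N", "pos_m": 0.0, "pass_dist": 3.0},
]
smoothed = smooth_within_groups(obs, bandwidth_m=200.0)
```

In the toy data the two northbound readings on road A pull toward each other, while the southbound reading and the reading on road B are left unchanged because they have no neighbours in their own group. This ignores all the controls and fixed effects, so it only illustrates the "within, not across" part.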

Reasonably large number of roads (say 500) across 10k observations in each of 4 cities. I’d want to be able to control for (about 40 in each city) cyclist fixed effects and a bunch of other things (day of the week effects, time of day effects etc). Ideas, spatial analysis people?



It seems like the easiest thing to do would be to group the observations by the nearest road, but 10k observations on 500 roads doesn’t seem like a lot.
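Grouping by nearest road could look roughly like the sketch below. It assumes the lat/lon readings have already been projected to planar x/y coordinates in metres and that each road is a polyline of vertices; the road names and the helper functions are hypothetical, and a real road network would want a spatial index instead of a brute-force loop.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_road(p, roads):
    """roads: dict mapping road_id to a list of (x, y) vertices.
    Returns (road_id, distance) for the closest road."""
    best_id, best_d = None, math.inf
    for rid, pts in roads.items():
        for a, b in zip(pts, pts[1:]):
            d = point_segment_distance(p, a, b)
            if d < best_d:
                best_id, best_d = rid, d
    return best_id, best_d

# Two parallel roads 200 m apart; the reading sits 30 m off the first one.
roads = {
    "main_st": [(0.0, 0.0), (1000.0, 0.0)],
    "side_st": [(0.0, 200.0), (1000.0, 200.0)],
}
rid, dist = nearest_road((500.0, 30.0), roads)
```

Travel direction would still need to be added to the group key afterwards, since this only picks the road.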

So the cyclists were passed by 10000 cars during their rides, distributed across 500 roads? That’s 20 car passes per road? Are these block-length road sections? Or long roads? If they’re short roads, then could a hot spot simply be the short section (in which case you could just do hierarchical models on top of the per-road estimates)? Or do you want more resolution (and try to estimate a continuous risk sorta thing)?
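As a rough illustration of what "hierarchical on top of per-road estimates" buys you with about 20 passes per road, here is a back-of-the-envelope partial-pooling sketch. It is not a full hierarchical model: it just shrinks each road's mean passing distance toward the overall mean, with `prior_strength` as a made-up tuning constant standing in for the ratio of within-road to between-road variance that a real model would estimate.

```python
from collections import defaultdict

def partial_pool(road_ids, values, prior_strength=10.0):
    """Shrink each road's mean toward the grand mean.

    prior_strength acts like a number of pseudo-observations at the
    grand mean; it is a hypothetical constant, not an estimated one.
    Returns a dict mapping road_id to the shrunken mean.
    """
    by_road = defaultdict(list)
    for r, v in zip(road_ids, values):
        by_road[r].append(v)

    grand_mean = sum(values) / len(values)
    pooled = {}
    for r, vs in by_road.items():
        n = len(vs)
        # Roads with few passes are pulled harder toward the grand mean.
        pooled[r] = (sum(vs) + prior_strength * grand_mean) / (n + prior_strength)
    return pooled

# Road A has four passes, road B only one.
road_ids = ["A", "A", "A", "A", "B"]
values = [1.0, 1.0, 1.0, 1.0, 2.0]
pooled = partial_pool(road_ids, values, prior_strength=1.0)
```

Road B's single reading of 2.0 gets pulled a good way toward the overall mean, while road A barely moves. A proper multilevel model would do the same thing while also handling the cyclist effects and other controls.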

What do the hot spots look like if you plot them? Can you do an x-y plot of the observations and color the points by how close the cars were? Do you see lots of clustering? Or is it noisy?

Are bikers being passed a lot at certain points or is the passing distributed around pretty evenly?

Do you have certain roads with tons of data and some without much?
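A quick tally would answer that last question. The sketch below uses made-up road ids and an arbitrary cutoff of 3 passes for calling a road "sparse"; swap in whatever grouping key and threshold make sense for your data.

```python
from collections import Counter
import statistics

# Hypothetical road id for each recorded pass.
road_ids = ["A", "A", "A", "A", "A", "A", "B", "B", "C"]

counts = Counter(road_ids)                         # passes per road
median_n = statistics.median(counts.values())      # typical road
sparse = sorted(r for r, n in counts.items() if n < 3)  # roads with little data
```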