Leverage observations in a regression model

I’ve got a spatial regression model with the general formula:

brm(outcome ~ s(long, lat))

I suspect that observations with particularly strong leverage are spatially clustered over the array of long/lat coordinate pairs. What I’d like to do is pull the leverage for each observation from the fit model. Is there any straightforward way to do this?

Here’s the more general problem – I have around 15,000 observations of newborn infants with their gestational age at birth. It appears that on a map there is local variability in gestational age. I want to see if the variability is due to a minor (but statistically detectable) difference in local mean, or if it’s due to a higher local prevalence of extremes (premature births).

Thanks!
Paul

  • Operating System: Windows 10
  • brms Version: 2.3.1

I think @paul.buerkner and @bgoodri missed this the first time around. They’ve been answering most of the brms questions.

How would you define the leverage of each observation?

The latter question seems to imply that you also want to predict the variation of the outcome. If so, go for

brm(bf(outcome ~ s(long, lat), sigma ~ s(long, lat)), ...)

Thanks,

So yes, predicting the variation would be helpful, but one problem I suspect is that the density of observations is much greater in the urban, high-risk areas. As a result, the error in the model estimate is lower there:

That blue contour in the middle is where the mean gestational age has at least a 95% probability of being lower than the overall mean, and I suspect that is because extreme values (lots of premature deliveries) are pulling that mean down. However, the error is quite low there given the abundance of data in that region. So what I'm looking for is not so much overall variation, but whether the lower mean in that area is due to a higher abundance of extreme values.

I don’t know the formal definition of leverage, but my understanding is that with linear regression models (and the built-in diagnostics for lm() and glm()) you can identify observations that have a disproportionate influence on the estimate.
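To illustrate what I mean, this is the kind of diagnostic I have in mind from an ordinary lm()/glm() fit (just a toy sketch using a plain linear formula rather than the spatial smooth, with the same variable names as the models below):

# Leverage (hat values) and related influence measures are built into base R
fit_lm <- lm(outcome ~ Long + Lat, data = data)  # toy linear model, not the spatial smooth
lev    <- hatvalues(fit_lm)                      # leverage of each observation
infl   <- influence.measures(fit_lm)             # Cook's distance, DFBETAS, etc.
head(sort(lev, decreasing = TRUE))               # observations with the largest leverage

This is what I would like an analogue of for the brms fit.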

The top model here is as I ran it initially; the second is trying what you suggested. With the second I get this error:

Error: Expecting a single value when fixing parameters.

model1 <- brm(outcome ~ s(Long, Lat),
              data = data,
              family = gaussian(),
              prior = prior(normal(0, 1), class = Intercept),
              cores = 4)

# (note: bf() is never closed after the sigma formula; its closing parenthesis only comes at the very end)
model1_sigma <- brm(bf(outcome ~ s(Long, Lat), sigma ~ s(Long, Lat),
                       data = data,
                       family = gaussian(),
                       prior = prior(normal(0, 1), class = Intercept),
                       cores = 4))

Never mind, I put the parentheses in the wrong place.
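For reference, the corrected call just closes bf() right after the sigma formula, so that the remaining arguments go to brm() (same priors and settings as above):

model1_sigma <- brm(bf(outcome ~ s(Long, Lat), sigma ~ s(Long, Lat)),
                    data = data,
                    family = gaussian(),
                    prior = prior(normal(0, 1), class = Intercept),
                    cores = 4)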

So after I fit brm(bf(outcome ~ s(long, lat), sigma ~ s(long, lat)) … ), should I predict() (or fitted()) the sigma over the coordinate grid?

Thanks,

Paul

You can get predicted (or fitted) values for sigma by using the dpar argument. For instance, predict(model1_sigma, dpar = "sigma"), or directly plot its spline via

marginal_effects(model1_sigma, dpar = "sigma", surface = TRUE)
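For example, to evaluate sigma over a regular coordinate grid (a sketch; the model and variable names are taken from the code above, and the grid resolution is arbitrary):

# Regular grid spanning the observed coordinates
grid <- expand.grid(
  Long = seq(min(data$Long), max(data$Long), length.out = 100),
  Lat  = seq(min(data$Lat),  max(data$Lat),  length.out = 100)
)

# Posterior summaries of sigma at each grid point (Estimate, Est.Error, intervals)
sigma_grid <- fitted(model1_sigma, newdata = grid, dpar = "sigma")
grid$sigma_est <- sigma_grid[, "Estimate"]

# grid can now be plotted as a surface, e.g. with ggplot2 and geom_raster()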