Leverage observations in a regression model

Paul_Lantos · August 13, 2018, 5:34pm

I’ve got a spatial regression model with the general formula:

brm( outcome ~ s( long , lat ) )

I suspect that observations with particularly strong leverage are spatially clustered over the array of long/lat coordinate pairs. What I’d like to do is pull the leverage for each observation from the fit model. Is there any straightforward way to do this?

Here’s the more general problem – I have around 15,000 observations of newborn infants with their gestational age at birth. It appears that on a map there is local variability in gestational age. I want to see if the variability is due to a minor (but statistically detectable) difference in local mean, or if it’s due to a higher local prevalence of extremes (premature births).

Thanks!
Paul

Please also provide the following information in addition to your question:

Operating System: Windows 10
brms Version: 2.3.1

Bob_Carpenter · August 22, 2018, 11:54am

I think @paul.buerkner and @bgoodri missed this the first time around. They’ve been answering most of the brms questions.

paul.buerkner · August 22, 2018, 12:44pm

How would you define the leverage of each observation?

The latter question seems to imply that you also want to predict the variation of the outcome? If you, go for

brm(bf(outcome ~ s(long, lat), sigma ~  s(long, lat)), ...)

Paul_Lantos · August 22, 2018, 5:31pm

Thanks,

So yes predicting the variation would be helpful, but a problem I suspect is that the density of observations is much greater in the urban, high risk areas. As a result the error in the model estimate is lower:

That blue contour in the middle is where mean gestational age has at least 95% probability of being lower than the overall mean – and I suspect it’s because there are extreme values (lots of premature deliveries) pulling down that mean. However, the error is quite low there given the abundance of data in that region. So what I’m looking for is not so much overall variation, but whether the lower mean in that area is because of a higher abundance of extreme values

I don’t know the formal definition of leverage, but my understanding is that with linear regression models (and built in diagnostics with lm() and glm() ) you can identify observations that have disproportionate influence on the estimate.

The top model here is as I ran it initially - the second is trying what you suggested. With the second I get this error:

Error: Expecting a single value when fixing parameters.

model1 <- brm(outcome ~ s( Long , Lat ) ,
data=data ,
family=“Gaussian”,
prior = prior( normal( 0 , 1 ) , class=Intercept ) ,
cores=4 )

model1_sigma <- brm(bf(outcome ~ s( Long , Lat ) , sigma ~ s( Long , Lat ) ,
data=data ,
family=“Gaussian”,
prior = prior( normal( 0 , 1 ) , class=Intercept ) ,
cores=4 ) )

Paul_Lantos · August 22, 2018, 6:16pm

Nevermind, I put the parenthesis in the wrong place.

So after I fit the brm( bf( outcome ~ s( long , lat) , sigma ~ s( long , lat) … ) I should predict (or fitted( ) the sigma to the coordinate grid?)

Thanks,

Paul

paul.buerkner · August 22, 2018, 7:17pm

You can get predicted (or fitted) values for sigma by using the dpar argument. For instance, predict(model1, dpar = "sigma") or directly plot its spline via

marginal_effects(model1, dpar = "sigma", surface = TRUE)

Topic		Replies	Views
Prediction fitted.brmsfit from a model with group level effects using fitted.brmsfit brms	6	1591	February 14, 2022
Losing observation when modelling missing values brms	9	582	April 11, 2019
Spatial lag model with brms and predictions/marginal_effects brms	2	499	June 14, 2019
Mixed effects model accounting for spatial lag: GPS points are at the second level Modeling hierarchical-model , brms	0	55	April 27, 2024
Brms meta-analysis - weighting of individual and aggregate observations Modeling brms	0	412	July 27, 2022

Leverage observations in a regression model

Related Topics