Heckman selection model code + simulation

saudiwin · August 5, 2019, 1:52pm

Just somewhat related, but I’m working with a similar (though simpler) model, and I found that identification/reliability improved considerably once I had covariates that could predict the probabilities. You might consider adding that to the simulation to see if it makes a difference.

Max_Mantei · August 5, 2019, 3:06pm

Hi Rachael!

I think everything @rtrangucci posts is correct. And yes, as far as I can see the problem I mentioned is addressed in his approach (in a quite elegant way).

Good luck with your paper!
Max

jeremy.koster · September 24, 2020, 2:18pm

@bgoodri

Two years later and a search of the forum turned up this post. I’d be curious to know if there are further insights at this point, as the use of the inverse Mills ratio seems highly variable across fields and applications.

bgoodri · September 24, 2020, 2:47pm

Not that I know of.

jeffa · October 13, 2020, 8:20am

Just adding a bump here. I am excited to see the extension of these econometric models into Stan.

RachaelMeager · May 7, 2021, 5:08pm

Hi everyone! I found a small bug in the model posted by @rtrangucci. By checking the likelihood against both the Stata documentation here (https://www.stata.com/manuals15/rheckman.pdf) and the R package documentation here (https://cran.r-project.org/web/packages/sampleSelection/vignettes/selection.pdf), I think I found that there was an unnecessary second division by (sqrt(1-rho)) in the model code above.

Of course, please do reply if you disagree or if you think I’ve misunderstood!
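For reference, here is the log-likelihood as I read it from the two documents linked above. The notation is an assumption of this sketch, not taken from the posted Stan code: $x_i$ are outcome covariates, $z_i$ are selection covariates, $s_i$ is the selection indicator, $\sigma$ is the outcome error scale, and $\rho$ is the correlation between the two error terms. The scaling by $\sqrt{1-\rho^2}$ appears exactly once, inside the normal CDF for the selected units.

```latex
\ell(\beta,\gamma,\sigma,\rho)
  = \sum_{i:\, s_i = 0} \log \Phi\!\left(-z_i^\top \gamma\right)
  + \sum_{i:\, s_i = 1} \left[
      \log \Phi\!\left(
        \frac{z_i^\top \gamma + \rho \,\dfrac{y_i - x_i^\top \beta}{\sigma}}
             {\sqrt{1 - \rho^2}}
      \right)
      + \log \phi\!\left(\frac{y_i - x_i^\top \beta}{\sigma}\right)
      - \log \sigma
    \right]
```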

It took me a while to find the bug because it somehow does not affect the performance too badly in the model – in fact, the calibration is still good in the model above! I am not sure why. But I do know that if you try to generalize the model to one in which you can observe the unselected units (but the betas differ across the two types of units) you start getting bad behaviour then.
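To make the generalization concrete, one common way to write it (following the tobit-5 section of the sampleSelection vignette as I understand it; the symbols here are assumptions of this sketch and may not match the attached Stan file) is a switching regression with separate coefficients, scales, and correlations for selected ($j=1$) and unselected ($j=0$) units:

```latex
s_i = \mathbf{1}\{ z_i^\top \gamma + u_i > 0 \}, \qquad
y_i = x_i^\top \beta_{s_i} + \varepsilon_{s_i, i}, \qquad
r_{ji} = \frac{y_i - x_i^\top \beta_j}{\sigma_j}

\ell
  = \sum_{i:\, s_i = 1} \left[
      \log \Phi\!\left( \frac{z_i^\top \gamma + \rho_1 r_{1i}}{\sqrt{1-\rho_1^2}} \right)
      + \log \phi(r_{1i}) - \log \sigma_1 \right]
  + \sum_{i:\, s_i = 0} \left[
      \log \Phi\!\left( -\frac{z_i^\top \gamma + \rho_0 r_{0i}}{\sqrt{1-\rho_0^2}} \right)
      + \log \phi(r_{0i}) - \log \sigma_0 \right]
```

Here $\rho_j$ is the correlation between $u_i$ and $\varepsilon_{ji}$, and $\mathrm{Var}(u_i)=1$ for identification.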

Below is the fixed model and the generalization (which is sometimes called a tobit-5, but sometimes not, alas) plus R scripts for some simulation-based calibration tests that show they do well :)
fake_data_generalized_heck_montecarlo_calibration.R (4.0 KB)
fake_data_heck_montecarlo_calibration_check.R (2.8 KB)
generalized_heck.stan (1.8 KB)
heck.stan (1.1 KB)
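
As a rough cross-check of the likelihood described in the documentation linked above, here is a minimal pure-Python sketch of the standard Heckman log-likelihood. It is not the contents of heck.stan; the function and argument names (`heckman_loglik`, `X`, `Z`, `selected`) are hypothetical, and the naive log-CDF will underflow for very negative arguments. The point is only to show where the single division by sqrt(1 - rho^2) sits.

```python
import math


def norm_logpdf(x):
    """Log density of a standard normal at x."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(x):
    """Log CDF of a standard normal (naive; underflows for very negative x)."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def heckman_loglik(y, X, Z, selected, beta, gamma, sigma, rho):
    """Heckman selection log-likelihood.

    y[i] is only used when selected[i] is true. X rows are outcome
    covariates, Z rows are selection covariates.
    """
    total = 0.0
    for y_i, x_i, z_i, s_i in zip(y, X, Z, selected):
        sel_index = dot(gamma, z_i)
        if not s_i:
            # Unselected unit: only the probit part contributes.
            total += norm_logcdf(-sel_index)
        else:
            resid = (y_i - dot(beta, x_i)) / sigma
            # The only place sqrt(1 - rho^2) appears.
            arg = (sel_index + rho * resid) / math.sqrt(1.0 - rho ** 2)
            total += norm_logcdf(arg) + norm_logpdf(resid) - math.log(sigma)
    return total
```

With rho = 0 this factorizes into a probit likelihood for selection plus an ordinary normal regression likelihood on the selected units, which is a handy sanity check.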

rtrangucci · May 7, 2021, 5:41pm

You’re absolutely right @RachaelMeager! So sorry about that! That’s a nasty bug, it sounds like a pain to track down. @martinmodrak is there a way to edit very old posts? It’d be nice to correct that bug and/or to point people to the right code in @RachaelMeager’s post.

James_Savage · May 7, 2021, 5:45pm

Thanks @RachaelMeager ! I recall @edjee showing me some code he’d written for dynamic panels also. Are you able to post it here Ed?

martinmodrak · May 7, 2021, 6:06pm

I wasn’t aware this is not possible! (As an admin, the system lets me do anything.) It turns out there is a default setting to prevent that for regular users, but I think I trust our user base enough to let anybody on Trust level 2 or above edit their posts at any time. So you should be able to edit now. If not, let me know (possibly in a private message to avoid derailing the thread).

edjee · May 7, 2021, 6:09pm

My code just uses MVN sufficient stats to speed up likelihood evaluation a lot. It’s agnostic about covariate choice, so setting lagged Y as a control gives the dynamic panel model.
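
The code itself has not been posted in this thread, so the following is only a generic sketch of the sufficient-statistics idea, in pure Python for the bivariate case. All names (`mvn2_logpdf`, `sufficient_stats`, `mvn2_loglik_from_stats`) are hypothetical. The sample size, sample mean, and scatter matrix are computed once; after that, each likelihood evaluation costs the same regardless of how many rows the data have.

```python
import math
import random


def mvn2_logpdf(x, mu, cov):
    """Log density of one bivariate normal observation."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form dx' cov^{-1} dx via the closed-form 2x2 inverse.
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def sufficient_stats(data):
    """Return n, the sample mean, and the (summed) scatter matrix entries."""
    n = len(data)
    m0 = sum(x[0] for x in data) / n
    m1 = sum(x[1] for x in data) / n
    s00 = sum((x[0] - m0) ** 2 for x in data)
    s01 = sum((x[0] - m0) * (x[1] - m1) for x in data)
    s11 = sum((x[1] - m1) ** 2 for x in data)
    return n, (m0, m1), (s00, s01, s11)


def mvn2_loglik_from_stats(stats, mu, cov):
    """Full-sample log-likelihood using only the sufficient statistics."""
    n, mean, (s00, s01, s11) = stats
    (a, b), (c, d) = cov
    det = a * d - b * c
    # trace(cov^{-1} S) for the symmetric scatter matrix S.
    trace_term = (d * s00 - (b + c) * s01 + a * s11) / det
    dm0, dm1 = mean[0] - mu[0], mean[1] - mu[1]
    mean_term = n * (d * dm0 * dm0 - (b + c) * dm0 * dm1 + a * dm1 * dm1) / det
    return (-n * math.log(2.0 * math.pi) - 0.5 * n * math.log(det)
            - 0.5 * (trace_term + mean_term))


# Usage: both routes give the same log-likelihood up to rounding.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(1, 2)) for _ in range(500)]
mu, cov = (0.1, 0.9), ((1.2, 0.3), (0.3, 3.5))
slow = sum(mvn2_logpdf(x, mu, cov) for x in data)
fast = mvn2_loglik_from_stats(sufficient_stats(data), mu, cov)
```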

Unfortunately, it’s still a mess but I’m cleaning/working on it this summer with a view to sharing.

@RachaelMeager taught me the trick so she gets double discourse brownie points.

James_Savage · May 7, 2021, 6:14pm

In any case, I tagged you in completely the wrong old thread–clearly projected to the same node in my brain by the presence of @rtrangucci

RachaelMeager · May 10, 2021, 5:24pm

no apologies needed – your code saved me SO much time over the life of this project that I am and always will be filled with gratitude for you!! We all have bugs, and your code is so beautiful and clean that it was easy to find once i went back to the algebra (which of course i consider a last resort lol), so all’s well that ends well. :)

rtrangucci · May 11, 2021, 5:39pm

You’re too kind :) I’m so glad to hear that it’s been useful to your project despite the bug!!

edjee · June 19, 2022, 8:10am

Apologies for reviving an old thread.

Rachael’s simulated DGP in fake_data_heck_montecarlo_calibration_check.R doesn’t actually introduce sample selection bias, since X_sel and X_out are independent of each other and there’s no intercept (the blog post "The Heckman Sample Selection Model" by Rob Hicks has more details).

Just thought it worth flagging in case anyone else, like me, came across this thread and couldn’t figure out why OLS was doing so well. The model is still well calibrated when we introduce correlation across the Xs or an intercept.
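
A small stdlib-only simulation illustrates the point. This is not a port of the R script: the function name `ols_slope_on_selected`, the sample size, rho = 0.8, unit coefficients, and the correlation of 0.8 between the covariates are all assumptions chosen for illustration. With independent X_sel and X_out, the selection term is constant in X_out, so the OLS slope on the selected sample stays close to the truth; once the covariates are correlated, the slope is visibly biased.

```python
import math
import random


def ols_slope_on_selected(corr_x, n=100_000, rho=0.8, seed=1):
    """OLS slope of y on x_out (with intercept) using only selected units.

    True outcome slope is 1.0 and neither equation has an intercept.
    corr_x is the correlation between x_out and x_sel.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x_out = rng.gauss(0.0, 1.0)
        x_sel = corr_x * x_out + math.sqrt(1.0 - corr_x ** 2) * rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)  # selection error
        # Outcome error, correlated with u at level rho.
        e = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if x_sel + u > 0.0:  # unit is observed
            xs.append(x_out)
            ys.append(1.0 * x_out + e)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov_xy / var_x


slope_independent = ols_slope_on_selected(corr_x=0.0)  # close to 1
slope_correlated = ols_slope_on_selected(corr_x=0.8)   # noticeably below 1
```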

nipnipj · March 9, 2023, 1:39am

Say we use 2 Heckman selection models (4 equations) at the same time. Can we correlate these 4 equations with each other, and if so, how?
