Modeling noisy indicator of a ratio

tpapp · January 30, 2026, 12:39pm

I have a survey dataset where individuals i have various sources of non-wage income (Y_i), including unemployment benefits (U_i).

I do get to observe Y_i, but not U_i. But there is a question on the survey asking for people’s main source of other income, so in theory I could make a boolean outcome u_i = 1_{\{ U_i/Y_i \ge 0.5 \}}.

But I want to make this noisy because I don’t trust people to answer this precisely. I thought of the following: let

r_i = U_i/Y_i \in [0,1]

be a latent variable, and define the mapping

w(y) = \frac{y / \sqrt{1 + y^2} + 1}{2}

which maps (-\infty, \infty) to (0, 1), then define

\Pr(u_i = 1; \kappa) = w(\mathrm{logit}(r_i) / \kappa)

This gives me a smooth approximation to the Heaviside function on a finite domain: [plot]

(I have \dots/\kappa because I want to put a finite-domain prior on \kappa).

I can code this in Stan just fine (I have to special-case the edges for 0 and 1, but it works), but it seems to be a pretty ad-hoc set of transformations I just cobbled together, so I am wondering if there is something more canonical.
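For anyone who wants to play with the mapping outside Stan, here is a minimal Python sketch of the construction described above. It is not the original Stan code: the function names `w`, `logit`, and `prob_main_source` are made up for illustration, and the edge handling for r = 0 and r = 1 is just one plausible way to do the special-casing mentioned in the post.

```python
import math


def w(y: float) -> float:
    """w(y) = (y / sqrt(1 + y^2) + 1) / 2, mapping the real line to (0, 1)."""
    if math.isinf(y):
        # Limits of w at -inf and +inf.
        return 1.0 if y > 0 else 0.0
    # hypot(1, y) = sqrt(1 + y^2) without overflow for large |y|.
    return (y / math.hypot(1.0, y) + 1.0) / 2.0


def logit(r: float) -> float:
    """Log-odds of r in [0, 1], with the edges mapped to -inf / +inf."""
    if r <= 0.0:
        return -math.inf
    if r >= 1.0:
        return math.inf
    return math.log(r) - math.log1p(-r)


def prob_main_source(r: float, kappa: float) -> float:
    """Pr(u_i = 1; kappa) = w(logit(r_i) / kappa), for kappa > 0."""
    return w(logit(r) / kappa)


if __name__ == "__main__":
    for kappa in (1.0, 0.3, 0.1):
        row = [round(prob_main_source(r, kappa), 3) for r in (0.0, 0.25, 0.5, 0.75, 1.0)]
        print(kappa, row)
```

Smaller values of `kappa` push the curve closer to a step at r = 0.5, while the probability at r = 0.5 itself stays at one half.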

ruarai · February 2, 2026, 1:02am

Can you use the inverse logit in place of w? Then kappa is just acting directly on the log-odds of r.
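To spell out what this suggestion amounts to: inv_logit(logit(r) / kappa) simplifies algebraically to r^(1/kappa) / (r^(1/kappa) + (1 - r)^(1/kappa)), so kappa acts as a power on the odds of r. The Python sketch below is only an illustration under that reading (the name `prob_inv_logit` is hypothetical, and kappa > 0 is assumed). It evaluates the expression via the odds ratio so that r = 0 and r = 1 need no special handling.

```python
def prob_inv_logit(r: float, kappa: float) -> float:
    """inv_logit(logit(r) / kappa) for r in [0, 1] and kappa > 0.

    Uses the identity
        inv_logit(logit(r) / kappa) = t / (1 + t),  t = (r / (1 - r)) ** (1 / kappa),
    written so that the ratio being raised to a power never exceeds 1.
    """
    power = 1.0 / kappa
    if r >= 0.5:
        # Inverse odds (1 - r) / r lie in [0, 1]; at r = 1 this gives exactly 1.
        return 1.0 / (1.0 + ((1.0 - r) / r) ** power)
    # Odds r / (1 - r) lie in [0, 1); at r = 0 this gives exactly 0.
    t = (r / (1.0 - r)) ** power
    return t / (1.0 + t)


if __name__ == "__main__":
    for kappa in (1.0, 0.3, 0.1):
        row = [round(prob_inv_logit(r, kappa), 3) for r in (0.0, 0.25, 0.5, 0.75, 1.0)]
        print(kappa, row)
```

With `kappa = 1` this returns r itself, and as `kappa` shrinks toward 0 the curve approaches a step at r = 0.5.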

tpapp · February 2, 2026, 3:35pm

That’s a great idea, thanks! Then I don’t have to special-case the edges in the code either. The corresponding plot is: [plot]

Bob_Carpenter · February 4, 2026, 10:00pm

Did the survey ask people if they earned more than half as much from unemployment benefits as they earned from non-wage income?

Or did you just ask them that and you’re somehow interested in this cutoff for some other reason? Do you need that indicator exactly or is it just a convenient summary?

I believe the advice that @andrewgelman usually provides in this situation is to just plot the raw fits, e.g., U vs. Y in a scatterplot with a line corresponding to the 0.5 ratio you care about.

andrewgelman · February 5, 2026, 1:06am

Hi, I’m not following all the details here, but if I’m reading things right, my suggestion would just be to model the unobserved variable u. There’s no need for the logits or this other stuff, you can just do inference for u, get your posterior simulations, and then summarize however you want.
