Custom function to reallocate probability mass from certain integers to others

dholstius · May 1, 2024, 11:01pm

Summary

I’m new to Stan and having some success modeling with brms. This is mostly a Stan question with some brms. Thank you in advance for your patience!

I’d like to fit a custom distribution that behaves like a zero-inflated negative binomial, but with one extra feature: preferential snapping to certain “humanized” integers. For this, I’m wondering if a custom function will work.

Details

My hypothesis is that the data-generating process looks like (say) a negative binomial, except that there is a “human reporting” transform layered on top, such that numbers like 9, 11, 13 are instead reported as “10” or “a dozen” (i.e., 12). This would happen if the cognitive process is something like “search from left to right for a satisfactory-enough answer that is among the set 1, 2, 3 … 7, 8, 10, 12, 15, 20, 25, 30.”

I don’t have anything rigorous to back this up, but I think the data are consistent with this hypothetical data-generating process: see figure below. And at the end of the day, I’m just trying to build a model with good predictive performance that isn’t a black box.

In the figure below, you can see that there are peaks at numbers like 10, 12, 15, and 20. There are also troughs at neighboring numbers. Why are these numbers missing? I think the mass is being shifted around by the human in the loop. Thus, I do not want to model this as just having inflation at preferred numbers, because of the troughs. The peaks and troughs are consistent, even as the conditional distribution shifts rightward with increasing reported frequency (across the panels, from left to right and top to bottom).

How should I try to code this in Stan? Given the Stan code, I can wrap it in R with brm(bf(..., stanvars = stanvar(scode = stan_funs, block = "functions"))) and a custom_family() following certain helpful examples (thanks @paul.buerkner and @andrewheiss !):

… but I don’t know how to write a correct Stan density function. And certainly not an optimal one!

Wondering if anyone has tried something similar in the past?

Pseudocode something like:

humanize_count <- function (x) {
  preferred <- c(1:8, 10, 12, 15, 20, 25, 30)
  nearest <- function (x) which.min(abs(preferred - x))
  preferred[sapply(x, nearest)]
}

… although the range of my data is small enough that it could be brute-forced with a half-dozen (see?) if-else cases.

Figure

For the data in the figure, the two questions being asked are:

On a typical winter day, how often do you burn wood?
On a typical day that you burn wood, how many logs do you burn?

This is of the subset who have already responded that they have a fireplace and use it. Hence, there are no zeros in the histograms. There may be some one-inflation, but since it’s truncated below 1, I think that can be handled by just subtracting one and … presto, there’s zero-inflation!

P.S. If you want to suggest a better distribution than negative binomial, I’ll owe you another one!

Topic		Replies	Views
Custom family works in Stan but not in brms brms	2	504	June 11, 2022
Issues with Parallel Computation for Custom Beta Binomial Model in brms brms fitting-issues , performance , loo , brms	0	546	November 28, 2021
I am having trouble with the custom family for beta binomial brms	6	745	April 30, 2020
Custom response distribution - fitted stan function for mixture model brms	4	596	November 15, 2018
Custom Item-Response Theory model in brms Modeling specification	1	445	August 24, 2020

Custom function to reallocate probability mass from certain integers to others

Summary

Details

Figure

Related topics