## Summary

I’m new to Stan and having some success modeling with `brms`

. This is mostly a Stan question with some `brms`

. Thank you in advance for your patience!

**I’d like to fit a custom distribution that behaves like a zero-inflated negative binomial, but with one extra feature: preferential snapping to certain “humanized” integers.** For this, I’m wondering if a custom function will work.

## Details

My hypothesis is that the data-generating process looks like (say) a negative binomial, except that there is a “human reporting” transform layered on top, such that numbers like 9, 11, 13 are instead reported as “10” or “a dozen” (i.e., 12). This would happen if the cognitive process is something like “search from left to right for a satisfactory-enough answer that is among the set 1, 2, 3 … 7, 8, 10, 12, 15, 20, 25, 30.”

I don’t have anything rigorous to back this up, but I think the data are consistent with this hypothetical data-generating process: see figure below. And at the end of the day, I’m just trying to build a model with good predictive performance that isn’t a black box.

In the figure below, you can see that there are peaks at numbers like 10, 12, 15, and 20. There are also *troughs* at neighboring numbers. Why are these numbers missing? I think the mass is being shifted around by the human in the loop. Thus, I do not want to model this as just having inflation at preferred numbers, because of the troughs. The peaks and troughs are consistent, even as the conditional distribution shifts rightward with increasing reported frequency (across the panels, from left to right and top to bottom).

How should I try to code this in Stan? Given the Stan code, I can wrap it in R with `brm(bf(..., stanvars = stanvar(scode = stan_funs, block = "functions")))`

and a `custom_family()`

following certain helpful examples (thanks @paul.buerkner and @andrewheiss !):

- Custom Families in brms Models — custom_family • brms
- Custom gaussian hurdle family not quite working in brms
- GitHub - paul-buerkner/custom-brms-families: R scripts for custom brms families

… but I don’t know how to write a correct Stan density function. And certainly not an optimal one!

Wondering if anyone has tried something similar in the past?

Pseudocode something like:

```
humanize_count <- function (x) {
preferred <- c(1:8, 10, 12, 15, 20, 25, 30)
nearest <- function (x) which.min(abs(preferred - x))
preferred[sapply(x, nearest)]
}
```

… although the range of my data is small enough that it could be brute-forced with a half-dozen (see?) `if-else`

cases.

## Figure

For the data in the figure, the two questions being asked are:

- On a typical winter day, how often do you burn wood?
- On a typical day that you burn wood, how many logs do you burn?

This is of the subset who have already responded that they have a fireplace and use it. Hence, there are no zeros in the histograms. There may be some one-inflation, but since it’s truncated below 1, I think that can be handled by just subtracting one and … presto, there’s zero-inflation!

P.S. If you want to suggest a better distribution than negative binomial, I’ll owe you another one!