Statisticians: _propto vs _unnormalized (vs ?)

[edit 6/1 - new poll at bottom!]

Dear applied statisticians,

We’re adding a language feature to the compiler to allow you to manually specify a distribution that will only be calculated up to a constant of proportionality (i.e. it will not be normalized). Whatever feature we choose will not break any existing code. If you have a second, please answer this poll about which one you think is the optimal choice for the Stan language, taking into account new users, existing terminology in your field, etc. The example is using the normal distribution; see Bob’s post below for some more context.

Reply to this thread with other options and we can add them!
Thanks,
Sean

New poll (June 1) with new options:

  • adding both foo_normalized_lpdf and foo_unnormalized_lpdf as function names for any density foo
  • adding just foo_propto_lpdf to specify the unnormalized/proportional-to-a-constant density
  • adding just foo_kernel_lpdf to specify the unnormalized/proportional-to-a-constant density
  • adding foo_lupdf (or foo_lupmf for discrete densities) to specify the unnormalized/proportional-to-a-constant density

(@lauren, @jonah, @bgoodri, @bbbales2, @betanalpha …)

Maybe kernel? https://en.wikipedia.org/wiki/Kernel_(statistics)#Bayesian_statistics
The downside is that it’s already a heinously overloaded word.

What’s the use case for this?

Thanks for asking. Let me provide the context that’s missing from the poll. Right now,

target += foo_lpdf(y | theta);

and

y ~ foo(theta);

behave differently. The functional form, and hence target +=, includes normalizing constants (like \log \sqrt{2 \pi} in the normal lpdf). The sampling statement form (with ~) drops normalizing constants in the samplers and optimizers. Technically, the algorithm gets a flag that determines whether to keep all or none of the normalizing constants, and the algorithms drop them in every case except one in ADVI.
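To make the distinction concrete, here is a minimal Python sketch (not Stan, and not how the compiler implements it). The function name `normal_kernel_lpdf` is hypothetical and only for illustration. It shows that the full and unnormalized normal log densities differ by the same constant at every parameter value, so differences in log density between two parameter values, which is what samplers and optimizers rely on, are unchanged.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Full log density of Normal(mu, sigma), including the -log(sqrt(2*pi)) term."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_kernel_lpdf(y, mu, sigma):
    """Hypothetical unnormalized version: drops only the term that is constant in mu and sigma."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma)

y = 1.3                 # made-up observation
theta_a = (0.0, 1.0)    # (mu, sigma)
theta_b = (2.0, 3.0)

# The gap between the two forms is the same at both parameter values.
gap_a = normal_lpdf(y, *theta_a) - normal_kernel_lpdf(y, *theta_a)
gap_b = normal_lpdf(y, *theta_b) - normal_kernel_lpdf(y, *theta_b)

# Differences between parameter values agree, with or without the constant.
diff_full = normal_lpdf(y, *theta_a) - normal_lpdf(y, *theta_b)
diff_kernel = normal_kernel_lpdf(y, *theta_a) - normal_kernel_lpdf(y, *theta_b)

print(gap_a, gap_b)
print(diff_full, diff_kernel)
```

Note that the sketch keeps the `log(sigma)` term because `sigma` is treated as a parameter here; only terms that do not involve parameters are safe to drop.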

We want the following two groups of statements to be equivalent:

Unnormalized / Proportional forms

y ~ foo(theta);

target += foo_unnormalized_lpdf(y | theta);
y ~ foo_unnormalized(theta);

Normalized forms

target += foo_lpdf(y | theta);

y ~ foo_normalized(theta);
target += foo_normalized_lpdf(y | theta);

I didn’t think it made sense to only add the missing case we really need expressively, which is the unnormalized (aka propto) lpdf and lpmf functions.

When you say “We want the following two groups of statements to be equivalent,” it sounds to me like you’re saying the group of unnormalized forms should be equivalent to the group of normalized forms. But I think you mean that all three lines in the Unnormalized / Proportional forms are equivalent to each other, and separately that each of the Normalized forms is equivalent to the other two normalized forms. (right?)

Adding either “unnormalized” or “propto” as full words is going to be ungainly. I suggest unpacking the “lpdf” suffix to motivate a cleaner notation,

lpdf
-> log probability density function
-> log unnormalized probability density function
-> lupdf

which has the convenient pronunciation “lup-dif”.

I like it! Keep the ideas coming, I’ll add them to the poll.

I had to create an all-new poll, which threw out the old results. unnormalized had been winning, but not by much. Now there are new options from @hhau, @betanalpha, and @Bob_Carpenter.

I voted for the first one, but it should be foo_lpdf and foo_unnormalized_lpdf to keep backwards compatibility. The sampling statement should still be unnormalized.

  • kernel is about as bad as it could possibly be and doesn’t match any statistical or machine learning usages.

  • strongly dislike adding another acronym (without looking I can’t remember if it’s ulpdf or lupdf which isn’t a good sign)

  • this is still wrong for any truncated parameters, so the compiler would have to fix that. Otherwise it’s just even more confusing.

Is there any reason why propto shouldn’t just be a global keyword? I’m struggling to think of a situation where you need this for one log density but not all of them.

You could run the model unnormalized but calculate the normalized density in generated quantities for model comparison.
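A rough Python sketch of why model comparison needs the normalized density, using a plain log-likelihood difference as a stand-in for whatever comparison method is actually used. The data and the two candidate models, Normal(0, 1) and Laplace(0, 1), are assumptions chosen for illustration. Each family drops a different constant, so the difference between unnormalized log likelihoods is shifted relative to the normalized one.

```python
import math

y = [-1.2, 0.3, 0.8, 2.1, -0.4]  # made-up data

def normal_loglik(ys, normalized):
    """Log likelihood under Normal(0, 1); the constant per point is -log(sqrt(2*pi))."""
    const = -0.5 * math.log(2.0 * math.pi) if normalized else 0.0
    return sum(-0.5 * v * v + const for v in ys)

def laplace_loglik(ys, normalized):
    """Log likelihood under Laplace(0, 1); the constant per point is -log(2)."""
    const = -math.log(2.0) if normalized else 0.0
    return sum(-abs(v) + const for v in ys)

# Difference in log likelihood between the two models, with and without constants.
diff_normalized = normal_loglik(y, True) - laplace_loglik(y, True)
diff_kernel = normal_loglik(y, False) - laplace_loglik(y, False)

# The shift grows with the number of observations, so it cannot be ignored.
shift = diff_normalized - diff_kernel
print(diff_normalized, diff_kernel, shift)
```

Within a single model the dropped constant cancels, but across models with different families it does not, which is the case for wanting both forms available.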

For me, the meaning of propto isn’t obvious at first.

I think explicit is better than implicit, but simpler is better than complex.

_ulpdf
_u_lpdf
_lpdfu
_lpdf_u
_unnormalized_lpdf
_unnormalised_lpdf
_lpdf_unnormalized
_lpdf_unnormalised

(I will screw up unnormalized vs unnormalised)

Also… normalize sounds like we make it normal. (I think this is just a language thing)

Ah yes, another great point - we’re not proposing anything backwards incompatible here, just adding additional forms to work with.

I hadn’t thought about that before - is it wrong because truncation should be using the normalized version? If there’s a discussion on that somewhere and you can find it, the link would be helpful here.

It started off that way, but then we needed a way to use the normalized version selectively in a model and ended up with this ~ vs. target += distinction, which I guess we thought might be too confusing (I forget the rationale here, in case @Bob_Carpenter or @Matthijs remembers).

The truncated distributions section of this: https://mc-stan.org/docs/2_19/reference-manual/sampling-statements-section.html

Thanks! I skimmed that once before asking and once after you posted and I’m still not sure what is wrong about truncation in Stan currently that is still wrong after we add unnormalized forms. There’s probably some piece of basic statistical knowledge I’m lacking, sorry about that.

@jonah, @lauren, @andrewgelman, any thoughts?

It’s that it’s done manually. So foo_normalized isn’t properly normalized.

And given you’re re-writing the compiler, it’s probably good to be aware of things that have to be done routinely (the code for this never changes, so it is quite possibly automatable).
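As one reading of the truncation point, here is a small Python sketch of the manual correction involved. It is plain arithmetic under assumed bounds and parameter values, not Stan's implementation, and the helper names are hypothetical. A normal density restricted to an interval only integrates to one after subtracting the log of the probability mass in that interval, and that term depends on the parameters, so it cannot be treated as a droppable constant.

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_lpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_truncation_mass(mu, sigma, lower, upper):
    """Log of the Normal(mu, sigma) probability mass inside [lower, upper]."""
    return math.log(std_normal_cdf((upper - mu) / sigma)
                    - std_normal_cdf((lower - mu) / sigma))

def truncated_normal_lpdf(y, mu, sigma, lower, upper):
    """Normal log density truncated to [lower, upper], renormalized by hand."""
    if y < lower or y > upper:
        return -math.inf
    return normal_lpdf(y, mu, sigma) - log_truncation_mass(mu, sigma, lower, upper)

def integrate(f, lower, upper, n=20000):
    """Midpoint rule, accurate enough for a sanity check."""
    h = (upper - lower) / n
    return h * sum(f(lower + (i + 0.5) * h) for i in range(n))

lower, upper = 0.0, 2.0  # assumed truncation bounds

# With the correction the density integrates to about 1; without it, it does not.
total = integrate(lambda v: math.exp(truncated_normal_lpdf(v, 0.5, 1.0, lower, upper)), lower, upper)
untruncated = integrate(lambda v: math.exp(normal_lpdf(v, 0.5, 1.0)), lower, upper)

# The correction changes with mu, so it is not a constant that can be dropped.
mass_mu0 = log_truncation_mass(0.0, 1.0, lower, upper)
mass_mu1 = log_truncation_mass(1.0, 1.0, lower, upper)
print(total, untruncated, mass_mu0, mass_mu1)
```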

Hmmm . . . as a user, most of the time I’d like to use lpdf with the understanding that it’s normalized, so that if I do y ~ normal(…) it’s with the understanding that all normalizing constants are included. The cost of computing log(sqrt(2*pi)) is small compared to the peace of mind gained by knowing exactly what I’m computing.

Offhand, I can’t think of any examples where there are factors that are (a) OK to exclude from the calculation of the normalizing constant, and (b) expensive to compute.

My thinking here is as follows: if a factor in the normalizing constant depends on parameters, then we’ll need to compute it or else we’re not working with the correct posterior distribution. But if it doesn’t depend on parameters, then it depends only on data which means we only need to compute it once.

I suppose there are some settings where computing the normalizing constant only once is super-expensive, and for those problems it can make sense to compute an unnormalized density (equivalently, a log density with an unspecified additive term).
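The split described above can be sketched in Python with a Poisson likelihood, chosen here as an assumed example with made-up counts. The `log(y!)` term involves only data, so it can be computed once and reused, while the terms involving the rate must be recomputed for every new parameter value.

```python
import math

y = [3, 0, 7, 2, 5]  # made-up counts

def poisson_lpmf(y_i, lam):
    """Full Poisson log mass, including the data-only -log(y!) term."""
    return y_i * math.log(lam) - lam - math.lgamma(y_i + 1)

# Depends only on the data: computed a single time, outside any parameter loop.
data_only_term = -sum(math.lgamma(y_i + 1) for y_i in y)

def poisson_kernel_loglik(lam):
    """Parameter-dependent part: must be re-evaluated for each value of lam."""
    return sum(y_i * math.log(lam) - lam for y_i in y)

def poisson_full_loglik(lam):
    """Full log likelihood rebuilt from the kernel plus the cached constant."""
    return poisson_kernel_loglik(lam) + data_only_term

lams = [0.5, 2.0, 4.0]
max_err = max(
    abs(poisson_full_loglik(lam) - sum(poisson_lpmf(y_i, lam) for y_i in y))
    for lam in lams
)
print(max_err)
```

Whether a given system can actually cache the data-only term is an implementation question; the sketch only shows that the arithmetic allows it.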

So my suggestion would be for y ~ foo() to correspond to target += foo_lpdf, and for the rare unnormalized density functions to be specially labeled, e.g. foo_unnormalized_lpdf. Yes, that’s a lot of characters to type, but I wouldn’t think it would arise so often.

The normalizing constant for an ICAR model is the only one that comes readily to mind.

This is true, but due to the way Stan is currently architected, there’s no way to just compute those values once - we end up computing them every leapfrog iteration. I have some next-gen-style project ideas that would help with that in case any C++ folks want an interesting research-y project :) The end state would be to bring the Math library code into the new Stan compiler so that we can do whole program, sampler-aware optimization.

If it ends up being _unnormalized_lpdf then I think we probably want to have an alias _unnormalised_lpdf.

Yeah, we would end up with ridiculously long function names like neg_binomial_2_log_unnormalized_lpmf().

I think I prefer either propto or @betanalpha’s lupdf, but I don’t have too strong of an opinion on this either way.
