Statisticians: _propto vs _unnormalized (vs ?)

seantalts · May 31, 2019, 9:33pm

[edit 6/1 - new poll at bottom!]

Dear applied statisticians,

We’re adding a language feature to the compiler to allow you to manually specify a distribution that will only be calculated proportional to a constant (i.e. it will not be normalized). Whatever feature we choose will not break any existing code. If you have a second, please answer this poll about which one you think is the optimal choice for the Stan language, taking into account new users, existing terminology in your field, etc. The example is using the normal distribution; see Bob’s post below for some more context.

Reply to this thread with other options and we can add them!
Thanks,
Sean

New poll (June 1) with new options:

adding both foo_normalized_lpdf and foo_unnormalized_lpdf as function names for any density foo
adding just foo_propto_lpdf to specify the unnormalized/proportional-to-a-constant density
adding just foo_kernel_lpdf to specify the unnormalized/proportional-to-a-constant density
adding foo_lupdf (or foo_lupmf for discrete densities) to specify the unnormalized/proportional-to-a-constant density


(@lauren, @jonah, @bgoodri, @bbbales2, @betanalpha …)

hhau · May 31, 2019, 9:50pm

Maybe kernel? https://en.wikipedia.org/wiki/Kernel_(statistics)#Bayesian_statistics
Downside is that it’s already a heinously overloaded word.

lauren · May 31, 2019, 10:24pm

What’s the use case for this?

Bob_Carpenter · June 1, 2019, 7:23am

Thanks for asking. Let me provide the context that’s missing from the poll. Right now,

target += foo_lpdf(y | theta);

and

y ~ foo(theta);

behave differently. The functional form, and hence target +=, includes normalizing constants (like $\log \sqrt{2 \pi}$ in the normal lpdf). The sampling statement form (with ~) drops normalizing constants in the samplers and optimizers. (Technically, the algorithm gets a flag that determines whether to keep all or none of the normalizing constants, and the algorithms drop them in every case but one for ADVI.)
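As a concrete illustration of the current behavior described above, here is a minimal sketch for a normal model. The names N, y, mu, and sigma are assumptions made for the example; they do not come from the poll.

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Sampling-statement form: terms that do not depend on parameters
  // (here -0.5 * log(2 * pi) per observation) are dropped.
  y ~ normal(mu, sigma);

  // Functional form: adds the full log density, constants included.
  // Use one form or the other, not both, or the likelihood is counted twice.
  // target += normal_lpdf(y | mu, sigma);
}
```

Either statement should give the same posterior for mu and sigma; only the reported lp__ differs, by a constant.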

We want the following two groups of statements to be equivalent:

Unnormalized / Proportional forms

y ~ foo(theta);

target += foo_unnormalized_lpdf(y | theta);
y ~ foo_unnormalized(theta);

Normalized forms

target += foo_lpdf(y | theta);

y ~ foo_normalized(theta);
target += foo_normalized_lpdf(y | theta);

I didn’t think it made sense to only add the missing case we really need expressively, which is the unnormalized (aka propto) lpdf and lpmf functions.

seantalts · June 1, 2019, 2:37pm

When you say “We want the following two groups of statements to be equivalent,” it sounds to me like you’re saying the group of unnormalized forms should be equivalent to the group of normalized forms. But I think you mean that all three lines in the Unnormalized / Proportional forms are equivalent to each other, and separately that each of the Normalized forms is equivalent to the other two normalized forms. (right?)

betanalpha · June 1, 2019, 3:33pm

Adding either “unnormalized” or “propto” as full words is going to be ungainly. I suggest unpacking the “lpdf” suffix to motivate a cleaner notation,

lpdf
-> log probability density function
-> log unnormalized probability density function
-> lupdf

which has the convenient pronunciation “lup-dif”.

seantalts · June 1, 2019, 5:39pm

I like it! Keep the ideas coming, I’ll add them to the poll.

seantalts · June 1, 2019, 5:48pm

I had to create an all-new poll, which threw out the old results. unnormalized had been winning, but not by much. Now there are new options from @hhau, @betanalpha, and @Bob_Carpenter.

anon75146577 · June 2, 2019, 12:20am

I voted for the first one, but it should be foo_lpdf and foo_unnormalized_lpdf to keep backwards compatibility. The sampling statement should still be unnormalized.

kernel is about as bad as it could possibly be and doesn’t match any statistical or machine learning usages.
strongly dislike adding another acronym (without looking I can’t remember if it’s ulpdf or lupdf, which isn’t a good sign)
this is still wrong for any truncated parameters, so the compiler would have to fix that. Otherwise it’s just even more confusing.

Is there any reason why propto shouldn’t just be a global keyword? I’m struggling to think of a situation where you need this for one log density but not all of them.

sakrejda · June 2, 2019, 12:37am

You could run the model non normalized but calculate the normalized density in generated quantities for model comparison.
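A minimal sketch of that pattern, assuming a simple normal model (N, y, mu, sigma, and the output name log_lik are illustrative choices, not something specified in this thread): the model block uses the sampling statement, which drops constants, while generated quantities calls normal_lpdf, which keeps them.

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Unnormalized during sampling: constants are dropped.
  y ~ normal(mu, sigma);
}
generated quantities {
  // Fully normalized pointwise log likelihood, evaluated once per draw,
  // for use in model comparison.
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | mu, sigma);
  }
}
```

This is a case where a single global switch would not be enough, since the same density is wanted unnormalized in one block and normalized in another.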

ahartikainen · June 2, 2019, 9:40am

For me, propto isn’t obvious at first glance.

I think explicit is better than implicit, but simpler is better than complex.

_ulpdf
_u_lpdf
_lpdfu
_lpdf_u
_unnormalized_lpdf
_unnormalised_lpdf
_lpdf_unnormalized
_lpdf_unnormalised

(I will screw up unnormalized vs unnormalised)

Also… normalize sounds like we make it normal. (I think this is just a language thing)

seantalts · June 2, 2019, 4:30pm

Ah yes, another great point - we’re not proposing anything backwards incompatible here, just adding additional forms to work with.

I hadn’t thought about that before - is it wrong because truncation should be using the normalized version? If there’s a discussion on that somewhere and you can find it, the link would be helpful here.

It started off that way, but then we needed a way to use the normalized version selectively in a model and ended up with this ~ vs. target += distinction, which I guess we thought might be too confusing (I forget the rationale here, in case @Bob_Carpenter or @Matthijs remembers).

anon75146577 · June 2, 2019, 5:00pm

The truncated distributions section of this: 7.4 Sampling Statements | Stan Reference Manual

seantalts · June 2, 2019, 5:04pm

Thanks! I skimmed that once before asking and once after you posted and I’m still not sure what is wrong about truncation in Stan currently that is still wrong after we add unnormalized forms. There’s probably some piece of basic statistical knowledge I’m lacking, sorry about that.

Matthijs · June 2, 2019, 6:06pm

@jonah, @lauren, @andrewgelman, any thoughts?

anon75146577 · June 2, 2019, 6:08pm

It’s that it’s done manually. So foo_normalized isn’t properly normalized.

And given you’re re-writing the compiler, it’s probably good to be aware of things that have to be done routinely (the code for this never changes, so it is quite possibly automatable).
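For reference, a sketch of what the manual normalization might look like, assuming a normal likelihood truncated to [L, U]. All variable names are illustrative, and this is one reading of “done manually”, not necessarily the exact case meant above.

```stan
data {
  int<lower=1> N;
  real L;
  real<lower=L> U;
  vector<lower=L, upper=U>[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Log density of the untruncated normal, constants included.
  target += normal_lpdf(y | mu, sigma);

  // Manual truncation adjustment: subtract, once per observation, the log
  // of the probability mass that the untruncated normal puts on [L, U].
  target += -N * log_diff_exp(normal_lcdf(U | mu, sigma),
                              normal_lcdf(L | mu, sigma));
}
```

The adjustment depends on mu and sigma, so it cannot be dropped the way a true constant can. Calling normal_lpdf on its own, under any name, would not give a properly normalized truncated density.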

andrewgelman · June 2, 2019, 11:54pm

Hmmm . . . as a user, most of the time I’d like to use lpdf with the understanding that it’s normalized, so that if I do y ~ normal(…) it’s with the understanding that all normalizing constants are included. The cost of computing log(sqrt(2*pi)) is small compared to the peace of mind gained by knowing exactly what I’m computing.

Offhand, I can’t think of any examples where there are factors that are (a) OK to exclude from the calculation of the normalizing constant, and (b) expensive to compute.

My thinking here is as follows: if a factor in the normalizing constant depends on parameters, then we’ll need to compute it or else we’re not working with the correct posterior distribution. But if it doesn’t depend on parameters, then it depends only on data which means we only need to compute it once.

I suppose there are some settings where computing the normalizing constant only once is super-expensive, and for those problems it can make sense to compute an unnormalized density (equivalently, a log density with an unspecified additive term).

So my suggestion would be for y ~ foo() to correspond to target += foo_lpdf, and for the rare unnormalized density functions to be specially labeled, e.g. foo_unnormalized_lpdf. Yes, that’s a lot of characters to type, but I wouldn’t think it would arise so often.
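The parameter-dependent versus data-only split can be sketched by hand for the normal case. This is a hand-written illustration, not how Stan’s built-in normal_lpdf is implemented, and the names N, y, mu, sigma, log_const, and log_lik are assumptions made for the example.

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
transformed data {
  // Depends only on data (through N), so it is computed once.
  real log_const = -0.5 * N * log(2 * pi());
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Parameter-dependent terms only; -N * log(sigma) must stay because
  // sigma is a parameter.
  target += -N * log(sigma) - 0.5 * dot_self((y - mu) / sigma);
}
generated quantities {
  // Adding the stored constant back recovers the normalized log likelihood.
  real log_lik = log_const - N * log(sigma)
                 - 0.5 * dot_self((y - mu) / sigma);
}
```

Dropping log_const from the model block leaves the posterior for mu and sigma unchanged, whereas dropping the log(sigma) term would not.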

anon75146577 · June 3, 2019, 3:01am

The normalizing constant for an ICAR model is the only one that comes readily to mind.

seantalts · June 3, 2019, 1:37pm

This is true, but due to the way Stan is currently architected, there’s no way to just compute those values once - we end up computing them every leapfrog iteration. I have some next-gen-style project ideas that would help with that in case any C++ folks want an interesting research-y project :) The end state would be to bring the Math library code into the new Stan compiler so that we can do whole program, sampler-aware optimization.

jonah · June 3, 2019, 3:43pm

If it ends up being _unnormalized_lpdf then I think we probably want to have an alias unnormalised_lpdf.

Yeah, we would end up with ridiculously long function names like neg_binomial_2_log_unnormalized_lpmf().

I think I prefer either propto or @betanalpha’s lupdf, but I don’t have too strong of an opinion on this either way.
