# Statisticians: _propto vs _unnormalized (vs ?)

[edit 6/1 - new poll at bottom!]

Dear applied statisticians,

We’re adding a language feature to the compiler to allow you to manually specify a distribution that will only be calculated up to a constant of proportionality (i.e. it will not be normalized). Whatever feature we choose will not break any existing code. If you have a second, please answer this poll about which one you think is the optimal choice for the Stan language, taking into account new users, existing terminology in your field, etc. The example uses the normal distribution; see Bob’s post below for some more context.

Thanks,
Sean

New poll (June 1) with new options:

• adding both foo_normalized_lpdf and foo_unnormalized_lpdf as function names for any density foo
• adding just foo_propto_lpdf to specify the unnormalized/proportional-to-a-constant density
• adding just foo_kernel_lpdf to specify the unnormalized/proportional-to-a-constant density
• adding foo_lupdf (or foo_lupmf for discrete densities) to specify the unnormalized/proportional-to-a-constant density



Maybe kernel? https://en.wikipedia.org/wiki/Kernel_(statistics)#Bayesian_statistics

What’s the use case for this?

Thanks for asking. Let me provide the context that’s missing from the poll. Right now,

target += foo_lpdf(y | theta);


and

y ~ foo(theta);


behave differently. The functional form, and hence target +=, includes all normalizing constants (like \log \sqrt{2 \pi} in the normal lpdf). The sampling-statement form (with ~) drops normalizing constants in the samplers and optimizers (technically, each algorithm gets a flag determining whether to keep all or none of the normalizing constants, and the algorithms drop them, except in one case for ADVI).
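To make the distinction concrete, here is a minimal Python sketch (not Stan code; the function names mirror the proposed suffixes for illustration only). The unnormalized form drops the term that depends on neither data nor parameters, so the two log densities differ by the same constant everywhere and yield identical gradients with respect to the parameters:

```python
import math

def normal_lpdf(y, mu, sigma):
    # Full normal log density, including the constant -0.5 * log(2*pi).
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def normal_lupdf(y, mu, sigma):
    # Drops only the term that depends on neither data nor parameters;
    # -log(sigma) must stay because sigma is a parameter.
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma)

# The two differ by the same constant at every (y, mu, sigma),
# so MCMC and optimization see the same target up to a constant shift.
for mu in (0.0, 1.5, -2.0):
    diff = normal_lpdf(1.0, mu, 2.0) - normal_lupdf(1.0, mu, 2.0)
    assert abs(diff + 0.5 * math.log(2 * math.pi)) < 1e-12
```

Since the constant shift never affects the gradient, HMC trajectories are unchanged; only the reported value of the target differs.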

We want the following two groups of statements to be equivalent:

Unnormalized / Proportional forms

y ~ foo(theta);

target += foo_unnormalized_lpdf(y | theta);
y ~ foo_unnormalized(theta);


Normalized forms

target += foo_lpdf(y | theta);

y ~ foo_normalized(theta);
target += foo_normalized_lpdf(y | theta);


I didn’t think it made sense to add only the missing case we really need for expressiveness, which is the unnormalized (aka propto) lpdf and lpmf functions.

When you say “We want the following two groups of statements to be equivalent,” it sounds like you’re saying the group of unnormalized forms should be equivalent to the group of normalized forms. But I think you mean that all three lines in the Unnormalized / Proportional forms are equivalent to each other, and separately that each of the Normalized forms is equivalent to the other two normalized forms. (right?)

Adding either “unnormalized” or “propto” as full words is going to be ungainly. I suggest unpacking the “lpdf” suffix to motivate a cleaner notation,

lpdf
-> log probability density function
-> log unnormalized probability density function
-> lupdf

which has the convenient pronunciation “lup-dif”.


I like it! Keep the ideas coming, I’ll add them to the poll.

I had to create an all-new poll, which threw out the old results. unnormalized had been winning, but not by much. Now there are new options from @hhau, @betanalpha, and @Bob_Carpenter.

I voted for the first one, but it should be foo_lpdf and foo_unnormalized_lpdf to keep backwards compatibility. The sampling statement should still be unnormalized.

• kernel is about as bad as it could possibly be and doesn’t match any statistical or machine learning usage.

• strongly dislike adding another acronym (without looking, I can’t remember if it’s ulpdf or lupdf, which isn’t a good sign)

• this is still wrong for any truncated distributions, so the compiler would have to fix that. Otherwise it’s just even more confusing.

Is there any reason why propto shouldn’t just be a global keyword? I’m struggling to think of a situation where you need this for one log density but not all of them.


You could run the model unnormalized but calculate the normalized density in generated quantities for model comparison.

For me, propto isn’t immediately clear at first glance.

I think explicit is better than implicit, but simple is better than complex.

_ulpdf
_u_lpdf
_lpdfu
_lpdf_u
_unnormalized_lpdf
_unnormalised_lpdf
_lpdf_unnormalized
_lpdf_unnormalised


(I will screw up unnormalized vs unnormalised)

Also… normalize sounds like we make it normal. (I think this is just a language thing)


Ah yes, another great point - we’re not proposing anything backwards incompatible here, just adding additional forms to work with.

I hadn’t thought about that before - is it wrong because truncation should be using the normalized version? If there’s a discussion on that somewhere and you can find it, a link would be helpful here.

It started off that way, but then we needed a way to use the normalized version selectively in a model and ended up with this ~ vs. target += distinction, which I guess we thought might be too confusing (I forget the rationale here; maybe @Bob_Carpenter or @Matthijs remembers).

The truncated distributions section of this: https://mc-stan.org/docs/2_19/reference-manual/sampling-statements-section.html

Thanks! I skimmed that once before asking and once after you posted and I’m still not sure what is wrong about truncation in Stan currently that is still wrong after we add unnormalized forms. There’s probably some piece of basic statistical knowledge I’m lacking, sorry about that.

@jonah, @lauren, @andrewgelman, any thoughts?

It’s that truncation is done manually. So foo_normalized wouldn’t be properly normalized for truncated distributions.

And given you’re rewriting the compiler, it’s probably good to be aware of things that have to be done routinely (the code for this never changes, so it is quite possibly automatable).

Hmmm . . . as a user, most of the time I’d like to use lpdf with the understanding that it’s normalized, so that if I do y ~ normal(…) it’s with the understanding that all normalizing constants are included. The cost of computing log(sqrt(2*pi)) is small compared to the peace of mind gained by knowing exactly what I’m computing.

Offhand, I can’t think of any examples where there are factors that are (a) OK to exclude from the calculation of the normalizing constant, and (b) expensive to compute.

My thinking here is as follows: if a factor in the normalizing constant depends on parameters, then we’ll need to compute it or else we’re not working with the correct posterior distribution. But if it doesn’t depend on parameters, then it depends only on data which means we only need to compute it once.

I suppose there are some settings where computing the normalizing constant only once is super-expensive, and for those problems it can make sense to compute an unnormalized density (equivalently, a log density with an unspecified additive term).
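The data-only case can be sketched in Python (not Stan; the lognormal is just one convenient example, and the function name is made up). The -log(y) term of the lognormal log density depends only on data, so it can be summed once up front instead of every density evaluation:

```python
import math

y = [0.5, 1.2, 3.4, 0.9]

# Term of the lognormal log density that depends only on data:
# sum_i -log(y_i). Compute it once, outside the sampler's inner loop.
data_constant = -sum(math.log(v) for v in y)

def lognormal_lpdf_sum(y, mu, sigma, data_constant):
    # Parameter-dependent part, which must be evaluated at every
    # new (mu, sigma)...
    out = 0.0
    for v in y:
        z = (math.log(v) - mu) / sigma
        out += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    # ...plus the precomputed data-only constant.
    return out + data_constant
```

If instead the expensive factor depended on mu or sigma, it would have to be recomputed at every evaluation, which is exactly the case where dropping it (when legitimate) pays off.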

So my suggestion would be for y ~ foo() to correspond to target += foo_lpdf, and for the rare unnormalized density functions to be specially labeled, e.g. foo_unnormalized_lpdf. Yes, that’s a lot of characters to type, but I wouldn’t think it would arise so often.

The normalizing constant for an ICAR model is the only one that comes readily to mind.

This is true, but due to the way Stan is currently architected, there’s no way to just compute those values once - we end up computing them every leapfrog iteration. I have some next-gen-style project ideas that would help with that in case any C++ folks want an interesting research-y project :) The end state would be to bring the Math library code into the new Stan compiler so that we can do whole program, sampler-aware optimization.

If it ends up being _unnormalized_lpdf then I think we probably want to have an alias _unnormalised_lpdf.

Yeah, we would end up with ridiculously long function names like neg_binomial_2_log_unnormalized_lpmf().

I think I prefer either propto or @betanalpha’s lupdf, but I don’t have too strong of an opinion on this either way.
