Hello everyone,
I would like to use a binary probability distribution supported on two values: c (close to 0 but not zero, e.g. 0.1) and 1, where 1 has probability theta and c has probability 1 - theta.
As I do not see how to do this with the available distributions, I am trying to write my own.
I tried this (you will have figured out by now that I am new to Stan!)
functions {
real twopics_lpmf(real y, real c, vector theta) {
real themf;
themf = (y==1)? theta : (y==c)? (1-theta): 0;
return(themf);
}
}
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Parse Error. Probability mass functions require integer variates (first argument). Found type = real
ERROR at line 2
1: functions {
2: real twopics_lpmf(real y, real c, vector theta) {
^
3: real themf;
I suppose this means y should be an integer. But I actually want y to take the values 0.1 and 1 …
Any clue on how to correct this?
Best to all,
Damien
Just map 0.1 to 0, 1 to 1, and everything else to 2, make y an integer, and calculate probabilities as before. I don’t think what you’re trying to do makes sense unless there’s a very strange model in play.
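To make the integer-coding suggestion concrete, here is a minimal sketch, assuming the observations are data and have already been recoded outside Stan. The names `N` and `y_code` are hypothetical, and the `array[N] int` declaration assumes a recent Stan release. Once the two values are coded as 0 and 1, the distribution is an ordinary Bernoulli, so no custom function is needed. The data constraint rules out the third "everything else" category up front.

```stan
data {
  int<lower=0> N;
  // Hypothetical recoding done before passing data to Stan:
  // 1 if the original value is 1, 0 if the original value is c.
  array[N] int<lower=0, upper=1> y_code;
}
parameters {
  real<lower=0, upper=1> theta;  // probability of the value 1
}
model {
  // P(y_code = 1) = theta, P(y_code = 0) = 1 - theta
  y_code ~ bernoulli(theta);
}
```

The value of c drops out of the likelihood entirely here, which is one way to see why the construction only becomes interesting when the two-point variable multiplies something else in the model.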
We’ve conflated pmfs with integer variates. Even though what you want to define is essentially a log pmf, you have to declare it with an _lpdf suffix in Stan if you want to use real arguments. Also, you want to return on the log scale, so that’s as follows, with theta declared as real (Stan does not automatically vectorize user-defined functions).
real twopics_lpdf(real y, real c, real theta) {
if (y == 1) return log(theta);
if (y == c) return log1m(theta);
reject("twopics_lpdf: illegal value for y: ", y);
}
Bob, the pmf has three cases not two…
Thank you Bob and Sakrejda,
When I try it, I get the following error message:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Expecting return, found no_op statement.
Improper return in body of function.
ERROR at line 16
14: reject("twopics_lpdf: illegal value for y: ", y);
15: }
16: }
Thank you Sakrejda for your suggestion. I recognize this looks strange.
I am just trying out a model idea developed in
Gilbride, T. J., Allenby, G. M., Brazell, J. D., 2006. Models for heterogeneous variable selection. Journal of Marketing Research, 43, 420-430.
The basic idea is to detect, in a choice experiment, whether some attributes are not attended to, by multiplying each beta coefficient to be estimated by an “almost” binary coefficient that takes the value c or 1. The “almost” binary coefficient (c rather than 0) is proposed to avoid identification issues.
The authors developed their own routines. I am wondering whether this could be done with Stan.
Best,
Damien
So does my code. I just used rejection rather than returning negative infinity.
If you want the original behavior, replace that with
return negative_infinity();
I forgot C++ can’t trace through functions to see that there’s always going to be an exception, so it still wants a return. So I had to write the logic to check in the function parser. To fix this, you can do it the other way around.
real twopics_lpdf(real y, real c, real theta) {
if (y != c && y != 1) reject("twopics_lpdf: illegal value for y: ", y);
if (y == 1) return log(theta);
else return log1m(theta);
}
Not so pretty, but it does make the rejection criterion clear. It’s a little less efficient to do the comparisons twice and that could be eliminated, but really won’t be worth it.
The other fix would be to put a return 0; after the reject. It wouldn’t be reachable, but it’d stop the compiler from complaining.
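For reference, here is a rough sketch of how that second fix might look inside a complete program, assuming y is observed data. The `data`, `parameters` and `model` blocks and the names `N`, `y`, `c` and `theta` in them are illustrative assumptions rather than anything from the original model. The unreachable return uses `negative_infinity()` instead of 0, which is just a variation on the same idea.

```stan
functions {
  real twopics_lpdf(real y, real c, real theta) {
    if (y == 1) return log(theta);
    if (y == c) return log1m(theta);
    reject("twopics_lpdf: illegal value for y: ", y);
    return negative_infinity();  // never reached; only satisfies the return check
  }
}
data {
  int<lower=0> N;
  real<lower=0, upper=1> c;  // the small nonzero support point, e.g. 0.1
  vector[N] y;               // each entry is expected to be exactly c or 1
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  // The function takes scalars, so loop over the observations.
  for (n in 1:N) {
    y[n] ~ twopics(c, theta);
  }
}
```

One caveat: the `y == c` test is an exact floating-point comparison, so the entries of y and the value of c need to be read in as identical numbers (for example, both literally 0.1) for the match to succeed.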
We can often just code up models directly with priors to avoid identification issues. We usually don’t do variable selection because it doesn’t scale, given that we have to marginalize out the discrete parameters. But for a few variables you can do it.
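As a very rough illustration of what marginalizing out one such discrete parameter could look like, here is a sketch for a deliberately simplified setting: a single predictor with a normal outcome, not the choice model from the paper. Everything in it (the names `N`, `x`, `y`, `beta`, `sigma`, the priors, the normal likelihood) is an assumption made for the example. The two-point multiplier, equal to 1 with probability theta and c otherwise, never appears as a parameter. Instead the likelihood is a two-component mixture over its values, computed with `log_mix`.

```stan
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
  real<lower=0, upper=1> c;  // the "almost zero" multiplier value, e.g. 0.1
}
parameters {
  real beta;
  real<lower=0> sigma;
  real<lower=0, upper=1> theta;  // probability that the multiplier equals 1
}
model {
  beta ~ normal(0, 1);   // illustrative priors, not from the paper
  sigma ~ normal(0, 1);
  // Sum over the two values of the multiplier, which is shared by all N
  // observations: theta * p(y | multiplier = 1) + (1 - theta) * p(y | multiplier = c).
  target += log_mix(theta,
                    normal_lpdf(y | x * beta, sigma),
                    normal_lpdf(y | x * (c * beta), sigma));
}
```

With K such multipliers the sum runs over 2^K combinations, which is the scaling problem mentioned above.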