Hi, I am writing a user-defined function in Stan. I need to calculate the percentile values of a vector, which is computed from the input parameters and data.
I was not able to find such a function in the Stan reference manual. Is there a way to do this in Stan, or is there a function equivalent to the “quantile” function in R?
Short answer: that function goes from continuous (parameters) to discrete transformed parameters, so it’s not something Stan encourages, although you can technically create it by sorting and picking the x-th value. If you use this transformed parameter to modify the target density (via target += or any of the lpmf/lpdf functions), it will interfere with sampling.
Is this something defined in the parameters block or as data? If it’s data, then what @sakrejda is suggesting will work and not interfere with sampling. Otherwise, it breaks continuous differentiability of the log density w.r.t. parameters.
R’s quantile function is continuous by default, although the derivative of the default type is not continuous (R’s quantile has nine different algorithms for computing the quantile estimate).
Otherwise, you need to sort and select (and maybe interpolate if you really want to go to town), as @sakrejda was outlining.
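For concreteness, here’s a minimal sketch of that sort-and-interpolate approach as a user-defined function (the name quantile7 and the R-type-7-style interpolation are this sketch’s choices, not anything built into Stan); for the differentiability reasons above, it’s only safe on data or in generated quantities:

functions {
  // Sketch of an R-type-7-style quantile: sort ascending, then linearly
  // interpolate between the two bracketing order statistics.
  real quantile7(vector v, real p) {
    int n = rows(v);
    vector[n] s = sort_asc(v);
    real h = (n - 1) * p + 1;    // fractional 1-based index, as in R's type 7
    int lo = 1;
    while (lo + 1 < h) lo += 1;  // largest integer strictly below h; no cast needed
    if (lo == n) return s[n];    // n == 1 edge case
    return s[lo] + (h - lo) * (s[lo + 1] - s[lo]);
  }
}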
We probably should have a quantile function for data and generated quantities. It doesn’t make much sense in the model block, as the sorting itself is going to act kind of like a step function (at some threshold, a different variable contributes, but that threshold isn’t itself contributing to derivatives; it’s set by another element).
Hi, I’m trying to calculate quantiles in the generated quantities block, so I built a function for it, but now I’m hitting the problem of the floor() function not returning integers (and, as we all know, there is no way to type-cast in Stan, for safety etc.).
You can cast to integers; you just need to build your own foot-gun based on the fact that you can compare doubles and integers:
int cast_to_int(real x, int magnitude) {
  real precision = sqrt(machine_precision());
  int y = 0;
  real x_copy = x;
  int x_sign = 1;
  // Work with the absolute value; restore the sign at the end.
  if (x < 0.0) {
    x_sign = -1;
    x_copy = -x;
  }
  // Scan upward from 0; y tracks the last integer not exceeding x_copy.
  for (j in 0:magnitude) {
    if (j > x_copy)
      break;
    y = j;
    if ((x_copy - j) < precision)
      break;
    // Exhausted the bound without converging.
    if (j == magnitude) reject("Cast to int failed.");
  }
  if (x_sign < 0) {
    return -y;
  } else {
    return y;
  }
}
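A hypothetical usage sketch in generated quantities, assuming y is a data vector (the + 1 keeps the index 1-based; conventions for the percentile index vary):

  vector[num_elements(y)] y_sorted = sort_asc(y);
  // floor() returns a real; cast_to_int makes it usable as an index
  int idx = cast_to_int(floor(0.25 * num_elements(y)) + 1, num_elements(y));
  real q25 = y_sorted[idx];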
Found another way to shoot my foot (not really fair, considering I’m working with indices here). I have to specify the quantiles as integers 0-100 and use integer division truncation to do my work; a sketch of what I mean is below. Just wanted to share in case someone else needs something like this.
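Concretely, a minimal sketch of the trick (the helper name percentile_int is made up for this example; pct arrives as integer data between 0 and 100):

functions {
  // pct is an integer in 0:100, so the index computation stays in
  // integer arithmetic; int / int truncates in Stan, no cast required.
  real percentile_int(vector v, int pct) {
    int n = rows(v);
    int idx = 1 + (pct * (n - 1)) / 100;  // truncating integer division
    vector[n] s = sort_asc(v);
    return s[idx];
  }
}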
We should really just expose these casts and put big warnings on them. Any function doing this cast should at least grow its search bound exponentially (by doubling) rather than scanning linearly.
Binary search version of to_int()
This one will work in \mathcal{O}(\log |x|) iterations to convert x to an integer.
functions {
  // Smallest power of 2 greater than or equal to x (for x >= 1).
  int ub(real x) {
    int ub = 1;
    while (ub < x) ub *= 2;
    return ub;
  }
  // Whichever of a or b is nearer to x.
  int closest(real x, int a, int b) {
    return fabs(x - a) < fabs(x - b) ? a : b;
  }
  // Recursive binary search; requires L <= x <= U.
  // Forward declaration needed for recursion.
  int to_int_bsearch(real x, int L, int U);
  int to_int_bsearch(real x, int L, int U) {
    int mid = (L + U) / 2;
    if (L == U) return L;
    if (L + 1 == U) return closest(x, L, U);
    return x <= mid ? to_int_bsearch(x, L, mid) : to_int_bsearch(x, mid, U);
  }
  int to_int(real x);
  int to_int(real x) {
    if (fabs(x) >= 2^31) reject("to_int argument must be < 2^31, found x = ", x);
    if (x < 0) return -to_int(-x);
    return to_int_bsearch(x, 0, ub(x));
  }
}
transformed data {
  real cases[13] = { -13.0, -1.9, -1.3, -0.5, -1e-2, 0.0, 0.1, 0.5,
                     0.6, 1.0, 3.2, 3.9, 13765.23 };
  for (i in 1:size(cases))
    print("to_int(", cases[i], ") = ", to_int(cases[i]));
}
parameters {
  real<lower = 0, upper = 1> theta;
}
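One caveat if you borrow this: because of closest(), to_int() rounds to the nearest integer rather than truncating (e.g., 3.9 maps to 4 and -1.9 to -2), so wrap the argument in floor() or round() first if you need a particular convention.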
Agreed. So I opened a math issue for quantile functions. This one’s particularly easy because there are no gradients; it’s just some sorting or, even better, using Boost’s accumulators to do it online.
Users so often want to do very wrong things (e.g., MAP) that maybe we sometimes go too far in protecting Stan from them. I guarantee you people will use this to code discrete models and then complain that NUTS doesn’t work as well as Gibbs, or go give an ML talk about how RandomWhatever is clearly better… sigh