"quantile" function in Stan?

Marina · March 21, 2018, 9:28pm

Hi, I am writing a user-defined function in Stan. I need to calculate the percentile values of a vector. This vector is computed based on the input parameter and data.

I was not able to find such a function for that in Stan reference manual. Is there a way of doing this in Stan or is there a function that’s equivalent to “quantile” function in R?

Many thanks!

sakrejda · March 21, 2018, 11:40pm

Short answer: That function goes from continuous (parameters) to discrete transformed parameters so it’s not something that Stan encourages although you can technically create it by sorting and choosing the xth value. If you use this transformed parameter to modify the target density (via target += or any of the lpmf/lpdf functions) it will interfere with sampling.

Bob_Carpenter · March 25, 2018, 8:37pm

Is this something defined in the parameters block or as data? If it’s data, then what @sakrejda is suggesting will work and not interfere with sampling. Otherwise, it breaks continuous differentiability of the log density w.r.t. parameters.

avehtari · March 26, 2018, 11:00am

R quantile function is by default continuous although the derivative of the default type is not continuous (R quantile has 9 different algorithms for computing quantile estimate).

Marina · March 26, 2018, 5:57pm

Thanks! This vector is defined as data, and I figure that I can get the quantiles from R first and then use the quantiles as data directly in Stan.

Marina · March 26, 2018, 5:58pm

Thanks for the information

Marina · March 26, 2018, 5:59pm

Thanks sakrejda! that’s very helpful!

Bob_Carpenter · March 27, 2018, 7:16pm

Yes, if they’re just data, you can do that.

Otherwise, you need to sort and select (and maybe interpolate if you really want to go to town), as @sakrejda was outlining.

We probably should have a quantile function for data and generated quantities. It doesn’t make much sense in the model block as the sorting itself is going to act kind of like a step function (at some threshold, a different variable contributes, but that threshold’s not itself contributing to derivatives, but set by another element).

sakrejda · March 27, 2018, 7:42pm

I suppose we could include one of the well-behaved (w.r.t. sampling) continuous versions as a built-in.

karimn · September 12, 2018, 4:56pm

Hi, I’m trying to calculate quantiles in the generated quantities section so I built a function for it, but now I’m hitting the problem of the floor() function not returning integers (and as we all know there is no way to type cast in Stan for safety etc.).

Any suggestions of how I can get around this?

Thanks!

  vector quantile(vector unordered_x, vector quants) {
    int num_quants = num_elements(quants);
    int sample_size = num_elements(unordered_x);
    vector[sample_size] ordered_x = sort_asc(unordered_x);
    
    vector[num_quants] h = (sample_size - 1) * quants + 1;  
    vector[num_quants] h_floor = floor(h);
    
    return(ordered_x[h_floor] + (h - h_floor) * (ordered_x[h_floor + 1] - ordered_x[h_floor]));
  }

sakrejda · September 12, 2018, 5:16pm

You can cast to integers, you just need to build your own foot-gun based on the fact that you can compare doubles and integers:

  int cast_to_int(real x, int magnitude) {
    real precision = sqrt(machine_precision());
    int y = 0;
    real x_copy = x;
    int x_sign = 1;

    if (x < 0.0) {
      x_sign = -1;
      x_copy = -x;
    }

    for (j in 0:magnitude) {
      if (j > x_copy)
        break;
      y = j;
      if ((x_copy - j) < precision)
        break;
      if (j == magnitude) reject("Cast to int failed.")
    }
    if (x_sign < 0) {
      return -y;
    } else {
      return y;
    }
  }

karimn · September 13, 2018, 6:24pm

Thanks for providing this code!

Found another way to shoot my foot (not really fair considering I’m working with indices here). I have to specify the quantiles 0-100 and use the integer division truncation to do my work. Just wanted to share in case someone else needs something like this.

  vector quantile(vector unordered_x, int[] quants) {
    int num_quants = num_elements(quants);
    int sample_size = num_elements(unordered_x);
    vector[sample_size] ordered_x = sort_asc(unordered_x);
    
    vector[num_quants] h = (sample_size - 1) * (to_vector(quants) / 100) + 1;  
    vector[num_quants] Q;
    
    for (quant_index in 1:num_quants) {
      int h_floor = (((sample_size - 1) * quants[quant_index]) / 100) + 1;  
      
      Q[quant_index] = ordered_x[h_floor] + (h[quant_index] - h_floor) * (ordered_x[h_floor + 1] - ordered_x[h_floor]);
    }
   
    return(Q); 
  }

sakrejda · September 13, 2018, 6:47pm

Yeah, would be cool if a few of these things were easier in Stan (it’s not hard code, somebody just needs to write it…) Thanks for sharing!

Bob_Carpenter · September 14, 2018, 3:30pm

Took me a while before that one sunk in.

We should really just expose these casts and put big warnings on them. Any function doing this cast should at least be exponential in behavior.

Binary search version of to_int()

This one will work in \mathcal{O}(\log | x |) iterations to convert x to an integer.

functions {

  int ub(real x) {
    int ub = 1;
    while (ub < x) ub *= 2;
    return ub;
  }

  int closest(real x, int a, int b) {
    return fabs(x - a) < fabs(x - b) ? a : b;
  }

  // L <= x <= U
  int to_int_bsearch(real x, int L, int U);

  int to_int_bsearch(real x, int L, int U) {
    int mid = (L + U) / 2;
    if (L == U) return L;
    if (L + 1 == U) return closest(x, L, U);
    return x <= mid? to_int_bsearch(x, L, mid) : to_int_bsearch(x, mid, U);
  }

  int to_int(real x);

  int to_int(real x) {
    if (fabs(x) >= 2^31) reject("to_int arugment must be < 2^31, found x = ", x);
    if (x < 0) return -to_int(-x);
    return to_int_bsearch(x, 0, ub(x));
  }

}
transformed data {
  real cases[13] = { -13.0, -1.9, -1.3, -0.5, -1e-2, 0.0, 0.1, 0.5, 0.6, 1.0, 3.2, 3.9, 13765.23 };
  for (i in 1:size(cases))
    print("to_int(", cases[i], ") = ", to_int(cases[i]));
}
parameters {
  real<lower = 0, upper = 1> theta;
}

This prints:

to_int(-13) = -13
to_int(-1.9) = -2
to_int(-1.3) = -1
to_int(-0.5) = -1
to_int(-0.01) = 0
to_int(0) = 0
to_int(0.1) = 0
to_int(0.5) = 1
to_int(0.6) = 1
to_int(1) = 1
to_int(3.2) = 3
to_int(3.9) = 4
to_int(13765.2) = 13765

so I’m guessing it works through most of its range. A serious version would check:

max and min inputs
have error checks for out of range values

but I’ll leave those as an exercise for the reader.

Bob_Carpenter · September 14, 2018, 3:37pm

Agreed. So I opened a math issue for quantile funtions. This one’s particularly easy because there are no gradients, just some sorting or even better, using Boost’s accumulators to do it online.

maxbiostat · September 14, 2018, 9:10pm

Yes. That. Please.

sakrejda · September 15, 2018, 1:13am

Users so often want to do very wrong things (e.g. MAP) that maybe we sometimes go too far in protecting Stan from them. I guarantee you people will use this to code discrete models and then complain NUTS doesn’t work as well as Gibbs or go give an ML talk about how RandomWhatever is clearly better… sigh

Bob_Carpenter · September 20, 2018, 11:45am

We’re already supplying so many things that are dangerous and already getting so much of this stuff that a little more probably won’t hurt.

Topic		Replies	Views
Returning quantiles (or weighted quantiles) in generated quantities block Modeling rstan	3	727	September 21, 2022
Median of a vector Modeling	10	1324	July 20, 2022
Tutorial: Using R's Distribution Functions in Stan Models (external C++) General	2	189	July 31, 2024
Visualizing priors in R RStan	2	1684	January 13, 2020
How to return a value at a specific percentile from a skew-normal estimation in Stan? Modeling	4	43	February 14, 2025

"quantile" function in Stan?

Related topics