"quantile" function in Stan?

Hi, I am writing a user-defined function in Stan. I need to calculate the percentile values of a vector. This vector is computed based on the input parameter and data.

I was not able to find such a function for that in Stan reference manual. Is there a way of doing this in Stan or is there a function that’s equivalent to “quantile” function in R?

Many thanks!

Short answer: That function goes from continuous (parameters) to discrete transformed parameters so it’s not something that Stan encourages although you can technically create it by sorting and choosing the xth value. If you use this transformed parameter to modify the target density (via target += or any of the lpmf/lpdf functions) it will interfere with sampling.

Is this something defined in the parameters block or as data? If it’s data, then what @sakrejda is suggesting will work and not interfere with sampling. Otherwise, it breaks continuous differentiability of the log density w.r.t. parameters.

R quantile function is by default continuous although the derivative of the default type is not continuous (R quantile has 9 different algorithms for computing quantile estimate).

Thanks! This vector is defined as data, and I figure that I can get the quantiles from R first and then use the quantiles as data directly in Stan.

Thanks for the information

Thanks sakrejda! that’s very helpful!

Yes, if they’re just data, you can do that.

Otherwise, you need to sort and select (and maybe interpolate if you really want to go to town), as @sakrejda was outlining.

We probably should have a quantile function for data and generated quantities. It doesn’t make much sense in the model block as the sorting itself is going to act kind of like a step function (at some threshold, a different variable contributes, but that threshold’s not itself contributing to derivatives, but set by another element).

I suppose we could include one of the well-behaved (w.r.t. sampling) continuous versions as a built-in.

Hi, I’m trying to calculate quantiles in the generated quantities section so I built a function for it, but now I’m hitting the problem of the floor() function not returning integers (and as we all know there is no way to type cast in Stan for safety etc.).

Any suggestions of how I can get around this?

Thanks!

  vector quantile(vector unordered_x, vector quants) {
    int num_quants = num_elements(quants);
    int sample_size = num_elements(unordered_x);
    vector[sample_size] ordered_x = sort_asc(unordered_x);
    
    vector[num_quants] h = (sample_size - 1) * quants + 1;  
    vector[num_quants] h_floor = floor(h);
    
    return(ordered_x[h_floor] + (h - h_floor) * (ordered_x[h_floor + 1] - ordered_x[h_floor]));
  }

You can cast to integers, you just need to build your own foot-gun based on the fact that you can compare doubles and integers:

  int cast_to_int(real x, int magnitude) {
    real precision = sqrt(machine_precision());
    int y = 0;
    real x_copy = x;
    int x_sign = 1;

    if (x < 0.0) {
      x_sign = -1;
      x_copy = -x;
    }

    for (j in 0:magnitude) {
      if (j > x_copy)
        break;
      y = j;
      if ((x_copy - j) < precision)
        break;
      if (j == magnitude) reject("Cast to int failed.")
    }
    if (x_sign < 0) {
      return -y;
    } else {
      return y;
    }
  }

2 Likes

Thanks for providing this code!

Found another way to shoot my foot (not really fair considering I’m working with indices here). I have to specify the quantiles 0-100 and use the integer division truncation to do my work. Just wanted to share in case someone else needs something like this.

  vector quantile(vector unordered_x, int[] quants) {
    int num_quants = num_elements(quants);
    int sample_size = num_elements(unordered_x);
    vector[sample_size] ordered_x = sort_asc(unordered_x);
    
    vector[num_quants] h = (sample_size - 1) * (to_vector(quants) / 100) + 1;  
    vector[num_quants] Q;
    
    for (quant_index in 1:num_quants) {
      int h_floor = (((sample_size - 1) * quants[quant_index]) / 100) + 1;  
      
      Q[quant_index] = ordered_x[h_floor] + (h[quant_index] - h_floor) * (ordered_x[h_floor + 1] - ordered_x[h_floor]);
    }
   
    return(Q); 
  }
1 Like

Yeah, would be cool if a few of these things were easier in Stan (it’s not hard code, somebody just needs to write it…) Thanks for sharing!

Took me a while before that one sunk in.

We should really just expose these casts and put big warnings on them. Any function doing this cast should at least be exponential in behavior.

Binary search version of to_int()

This one will work in \mathcal{O}(\log | x |) iterations to convert x to an integer.

functions {

  int ub(real x) {
    int ub = 1;
    while (ub < x) ub *= 2;
    return ub;
  }

  int closest(real x, int a, int b) {
    return fabs(x - a) < fabs(x - b) ? a : b;
  }

  // L <= x <= U
  int to_int_bsearch(real x, int L, int U);

  int to_int_bsearch(real x, int L, int U) {
    int mid = (L + U) / 2;
    if (L == U) return L;
    if (L + 1 == U) return closest(x, L, U);
    return x <= mid? to_int_bsearch(x, L, mid) : to_int_bsearch(x, mid, U);
  }

  int to_int(real x);

  int to_int(real x) {
    if (fabs(x) >= 2^31) reject("to_int arugment must be < 2^31, found x = ", x);
    if (x < 0) return -to_int(-x);
    return to_int_bsearch(x, 0, ub(x));
  }

}
transformed data {
  real cases[13] = { -13.0, -1.9, -1.3, -0.5, -1e-2, 0.0, 0.1, 0.5, 0.6, 1.0, 3.2, 3.9, 13765.23 };
  for (i in 1:size(cases))
    print("to_int(", cases[i], ") = ", to_int(cases[i]));
}
parameters {
  real<lower = 0, upper = 1> theta;
}

This prints:

to_int(-13) = -13
to_int(-1.9) = -2
to_int(-1.3) = -1
to_int(-0.5) = -1
to_int(-0.01) = 0
to_int(0) = 0
to_int(0.1) = 0
to_int(0.5) = 1
to_int(0.6) = 1
to_int(1) = 1
to_int(3.2) = 3
to_int(3.9) = 4
to_int(13765.2) = 13765

so I’m guessing it works through most of its range. A serious version would check:

  • max and min inputs
  • have error checks for out of range values

but I’ll leave those as an exercise for the reader.

2 Likes

Agreed. So I opened a math issue for quantile funtions. This one’s particularly easy because there are no gradients, just some sorting or even better, using Boost’s accumulators to do it online.

2 Likes

Yes. That. Please.

1 Like

Users so often want to do very wrong things (e.g. MAP) that maybe we sometimes go too far in protecting Stan from them. I guarantee you people will use this to code discrete models and then complain NUTS doesn’t work as well as Gibbs or go give an ML talk about how RandomWhatever is clearly better… sigh

We’re already supplying so many things that are dangerous and already getting so much of this stuff that a little more probably won’t hurt.

1 Like