Is there any way to calculate a vector index from data?

I am trying to write a custom PDF that works like a histogram. Here is the code I have so far:


functions {
  real mydist_lpdf(real x, vector[] w, real x_lower, real x_upper) {
    int M = num_elements(w)-1;
    int idx = round((x - x_lower)/(x_upper - x_lower) * M);
    return w[idx];
  }
}

data {
  int<lower=1> N;          // number of data points
  real x_lower;            // lower data boundary   
  real<lower=x_lower> x_upper;            // upper data boundary   
  real<lower=x_lower, upper=x_upper> x[N];               // observations
  int<lower=2> Ngrid;      // number of items in the grid
}
parameters {
  simplex[Ngrid] w;              // fixed points of the interpolated distribution estimate
}

model {
  x ~ mydist(x | w);
}

The code does not compile because round returns a real, not an int. (Since I cannot test it, there may be other, unrelated problems in my code.)

The manual says:

The rounding functions cannot be used as indices to arrays because they return real values. Stan may introduce integer-valued versions of these in the future, but as of now, there is no good workaround.

Is there any workaround for my problem, even a mediocre one? The lack of such a basic function is a real issue for me.

See here for why the language does not permit conversion from real to int.

But if your data truly are binned and you're seeking to account for that, see here. Note that the trick I propose there should work for any location-scale distribution, but for other distributions you'll have to use the more manual method.


You can just precompute the indices of each element of x and pass those as integer data.
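As a minimal sketch of that precomputation, assuming Python and the grid convention implied by the question (Ngrid equally spaced points from x_lower to x_upper, each observation snapped to the nearest one), the hypothetical helper `grid_indices` below returns 1-based indices that could be passed to Stan as integer data, e.g. via a declaration like `int<lower=1, upper=Ngrid> idx[N];` (the name `idx` is also an assumption). It uses `math.floor(v + 0.5)` instead of Python's built-in `round`, because `round` sends halfway cases to the even neighbour, while Stan's `round` rounds them away from zero (i.e. up, for the non-negative values here).

```python
import math


def grid_indices(x, x_lower, x_upper, n_grid):
    """Map each observation to the 1-based index of its nearest grid point.

    Hypothetical helper: assumes n_grid equally spaced points spanning
    [x_lower, x_upper], matching the formula inside the question's mydist_lpdf.
    """
    if n_grid < 2:
        raise ValueError("n_grid must be at least 2")
    if not x_upper > x_lower:
        raise ValueError("x_upper must exceed x_lower")
    m = n_grid - 1  # number of intervals between grid points
    indices = []
    for value in x:
        if not x_lower <= value <= x_upper:
            raise ValueError(f"observation {value} outside [x_lower, x_upper]")
        scaled = (value - x_lower) / (x_upper - x_lower) * m  # lies in [0, m]
        # floor(v + 0.5) rounds halfway cases up; Python's round() would round them to even.
        # The + 1 converts the 0-based grid position to Stan's 1-based indexing.
        indices.append(math.floor(scaled + 0.5) + 1)
    return indices


# Usage: the resulting list is passed to Stan as integer data alongside x.
print(grid_indices([0.0, 0.2, 0.5, 0.74, 1.0], 0.0, 1.0, 5))  # -> [1, 2, 3, 4, 5]
```

Doing the discretization outside Stan keeps the Stan program free of any real-to-int conversion: the model only ever sees `idx` as plain integer data.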

Edit: Alternatively, if you really want to, you can write a function that, for each element x[n], increments an integer i until (i + 0.5) > x[n] and then returns i. Such a function should never be used in the model block, but you only need it in transformed data, where it would be fine. But I think the more important point is that it's much more natural to write the discretizing function in R/python/whatever and then pass the output as data, rather than shoehorning this awkward implementation into the Stan language.
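The counting loop described in the edit can be sketched as follows. It is written in Python here only so the logic is easy to check; in the thread's setting the same `while` loop would sit inside a user-defined Stan function returning `int` and be called from `transformed data`. The name `round_by_counting` is hypothetical, and the sketch assumes a non-negative input, such as the scaled position (x[n] - x_lower) / (x_upper - x_lower) * M from the question.

```python
def round_by_counting(y):
    """Round a non-negative real to the nearest integer using only integer counting.

    Hypothetical helper mirroring the loop from the answer: increment i until
    (i + 0.5) > y, then return i. Halfway cases (y = k + 0.5) round up.
    """
    if y < 0:
        raise ValueError("this counting approach assumes y >= 0")
    i = 0
    while (i + 0.5) <= y:  # stop at the first i with i + 0.5 > y
        i += 1
    return i


print([round_by_counting(v) for v in [0.0, 0.49, 0.5, 2.96, 4.0]])  # -> [0, 0, 1, 3, 4]
```

The loop takes time proportional to y, which is harmless for a modest grid, and in `transformed data` it runs only once on the data.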
