Linear interpolation and searchsorted in stan

Hey everyone new stan user here,

I am trying to implement a linear interpolation function into stan, but i am having some trouble finding an equivalent to the function “seachsorted” from the python package numpy (Link). Does anyone know about a similar implementation in stan? Or is there an other way to do linear interpolation specific to stan?

Here is my attempt at the interpolation function.

vector interpolate(vector x0,vector y0,vector x){
    int nx = rows(x);
    vector[nx] idx = numpy.searchsorted(x0, x);
    vector[nx] dl = x - x0[idx - 1];
    vector[nx] dr = x0[idx] - x;
    vector[nx] d = dl + dr;
    vector[nx] wl = dr ./ d;
    vector[nx] interp = wl .* y0[idx - 1] + (1 - wl) * y0[idx];
    return interp 
}

Thank you for your time.

2 Likes

Hi, here’s a rough sketch I’ve been using for my own purposes. Comes with no warranty. @bbbales2 and @Bob_Carpenter can probably help shape this into usable code.


int which_min(real [] y ){
    int ans = sort_indices_asc(y)[1];
    return(ans);
  }
real linear_interpolation(real x, real[] x_pred, real[] y_pred){
    int K = size(x_pred);
    real deltas [K];
    real ans;
    int i;
    if(size(y_pred) != K) reject("x_pred and y_pred aren't of the same size");
    for(k in 1:K) deltas[k] = fabs(x_pred[k] - x);
    i = which_min(deltas);
    if(i != 1){
      real x1 = x_pred[i];
      real x2 = x_pred[i + 1];
      real y1 = y_pred[i];
      real y2 = y_pred[i + 1];
      ans = y1 + (y2-y1) * (x-x1)/(x2-x1);
    }else{
      ans = y_pred[i];
    }
    return(ans);
  }
8 Likes

Hey @maxbiostat Thanks for the answer!

I have tried to modify the code to work with a vector input instead

    int which_min(vector y ){
    int ans = sort_indices_asc(y)[1];
    return(ans);
    }
    vector interpolate(vector x_pred, vector y_pred,vector x){
    int K = rows(x_pred);
    vector[K] deltas;
    int Kx = rows(x);
    vector[Kx] ans;
    int i;
    if(rows(y_pred) != K) reject("x_pred and y_pred aren't of the same size");
    for(k in 1:K){
    deltas[k] = fabs(x_pred[k] - x[k]);
    }
    i = which_min(deltas);
    if(i != 1){
      real x1 = x_pred[i];
      real x2 = x_pred[i + 1];
      real y1 = y_pred[i];
      real y2 = y_pred[i + 1];
      ans = y1 + (y2-y1) * (x-x1)/(x2-x1);
    }else{
      ans[i] = y_pred[i];
    }
    return(ans);
  }

it seems to run without an error, but i am not entirely sure if i formulated it correctly. Does it look like it makes sense?

EDIT: Just learned about to_array_1d() i think this does what i want.

2 Likes

Since this a statistical model, why not just define it as a spline or something?

@saudiwin i have been looking into different ways of doing 1-d interpolation, so i am open to using splines. I am using the interpolation to obtain values inbetween an evenly spaced array i define and do operations on and the interpolated results are then used as input for my likelihood(s).

Ok. Here’s slightly refactored vector-based function inspired by the PR#18 in stan-docs repo. It implements 3 different weight functions, as suggested by @nhuurre . This is linear interpolation, but I also have quadratic envelope method, which is even smoother, but it sacrifices the two outer bins.

functions{
real linear_interpolation_v(real x, vector x_pred, vector y_pred){
    int K = rows(x_pred);
    vector[K] deltas = x - x_pred;
    real ans;
    real t;
    real w;
    int i;
    if(x<x_pred[1] || x>x_pred[K]) reject("x is outside of the x_pred grid!");
    if(rows(y_pred) != K) reject("x_pred and y_pred aren't of the same size");
    //this is which.min()
    i = sort_indices_asc(fabs(deltas))[1];
    if(deltas[i]<=0) i -= 1;
    ans = y_pred[i];
    real x1 = x_pred[i];
    real x2 = x_pred[i + 1];
    real y1 = y_pred[i];
    real y2 = y_pred[i + 1];
    ans = y1 + (y2-y1) * (x-x1)/(x2-x1);
    t = (x-x1)/(x2-x1);
    w = 1-t;
    //w = 1/(1 + exp(1/(1-t) - 1/t));
    //w = 1 - 3*pow(t,2) + 2*pow(t,3);
    ans = w*y1 + (1-w)*y2;
    return ans;
  }

} // end functions block

Test it with the following R code

library(tidyverse)
set.seed(42) #set the correct seed
d <- tibble(
  xs = 0:5,
  ys=sample(10:15, 6, replace = TRUE),
  type="known")

tibble(
      xs=seq(0,5, by=0.001),
      ys=sapply(xs, linear_interpolation_v, d$xs, d$ys),
      type="predicted"
    ) %>% 
  bind_rows(d) %>% 
  ggplot()+
  geom_point(aes(xs, ys, color=type))+
  labs(title="w = ...")

2 Likes