Vectors/arrays, interpolation, other things

I was writing a Stan program and some issues came up:

  1. Linear interpolation. I wanted to create a function that interpolates, something like this:
    vector[N] x0;
    vector[N] y0;
    The function f(y) interpolates:
    If x < x0[1], then y = x0[1];
    If x is between x0[n] and x0[n+1], then y interpolates between y0[n] and y0[n+1];
    If x > x0[N], then y = y0[n];

A few years ago we discussed this. I couldn’t find it in the Stan manual. It would be super-useful. Currently I’m using an ugly hack.

  1. Concatenating vectors or arrays. I wanted to do something like this: a = c(b1, b2, b3), where b1, b2, b3 are vectors or 1-d arrays of reals. Is there a Stan function that does this? Again, the way I had to do this was kind of ugly. This issue comes up when stringing together different pieces of data for a Stan function call.

  2. Vectors and arrays. This keeps coming up, that for some purposes we want a vector, for other purposes an array of reals, and I have to keep going back and forth between them using to_array_1d. I recall there being some discussion about making Stan more user-friendly? This comes up a lot!

Thanks.

1 Like

Maybe we can make a variadic version of this?

The lack of a variadic version of vector concatenation has come up very recently, although the interest there was for use in the generated quantities.

This one is really simple actually. For the simpler case with vectors of the same type I threw together a prototype that reduces the copies and a simple test. In terms of the math library this can be done pretty easy.

The tougher problems is how do you specify a variadic function in function signatures file. The add() function for signatures is not variadic either. We could add signatures for up to N parameters, or something. If someone has an idea on how to solve this on the Stan side, I would be happy to help on the Stan Math side. Variadic functions are a fun challenge to me :)

It would be great to have concatenation. The big thing, though, would be if we could treat vectors and 1-D real arrays as equivalent, and treat matrices and 2-D real arrays as equivalent. I spend lots and lots of time when coding, going back and forth between real theta[N]; and vector[N] theta;. Sometimes I need to code something as a vector (so as to do some matrix algebra), other times I need to code something as an array (so as to feed something into a function, or because I’m doing logistic regression so my data are an integer array rather than a (nonexistent) integer vector. It would be great for these to be identical so I wouldn’t keep running into errors and needing awkward hacks to go back and forth.

So basically we should make overloads of all (or majority) functions that you can do on vectors to also support real arrays and vice-versa?

I suspect the efforts of @Stevo15025 to make functions more general should help us a lot here. Can you list a few that dont work on both?

Within some limits you can code up something with Stan already now which does nice concatenation:

// takes as input an array of the form {a,b,c,d} where a,b,c,d are
// real 1d arrays. These are then returned as concatenated 1d array
real[] concatenate_real_array(real[,] elems);
real[] concatenate_real_array(real[,] elems) {
  int num_elems = size(elems);

  if (num_elems == 1)
    return(elems[1]);

  if (num_elems == 2)
    return(to_array_1d(append_row(to_vector(elems[1]), to_vector(elems[2]))));
  return(to_array_1d(append_row(to_vector(elems[1]), to_vector( concatenate_real_array(elems[2:num_elems]) ))));
}

This allows you to write

concatenate_real_array({a,b,c,d});

and it will glue things together. It’s a neat application of recursion.

Wasn’t the plan for Stan 3 to remove the distiction between reals and vectors? I thought I read something like that perhaps on the wiki.

Mcol:

That would be great, to remove the distinction between reals and vectors. Would we also remove the distinction between 2-d arrays and matrices?

Hi, here’s a very simple example of why this improvement will make a big difference in the real world.

First version of model:

data {
int<lower=0> J;
real y[J];
real<lower=0> sigma[J];
}
parameters {
real mu;
real<lower=0> tau;
real eta[J];
}
model {
eta ~ normal(0, 1);
y ~ normal(mu + tau*eta, sigma);
}

Doesn’t compile because I can’t multiply real tau by array eta.

I changed eta to a vector[J] and it worked.

At this point you might say: Fine, I should’ve just used a vector in the first place, so why am I complaining?

My response to this (hypothetical) question is the following: Often I’ve tried to make everything into vectors, but then in other places (for example, function calls), I need reals. So I’m always going back and forth, guessing which I need to use.

Also I spend a lot of time teaching this stuff, and when I’m not teaching I’m often writing code for articles and books, and I want this code to follow good practice. But I’m not sure what “good practice” is! Define everything as an array and then convert to vectors and matrices? Define everything as a vector or matrix and then convert to arrays? Etc. This is turning into a big stumbling block. So I look forward to the day when users (including me) don’t have to worry about it.

P.S. I feel the same way about separation of declarations from distributions. When working with people, I’m always having to go back and forth between declarations and prior distributions. I’m looking forward to blockless so we can do:
real theta[J] ~ normal(0, 1);
etc.

1 Like