Help with vectorizing Stan program

Hi everyone, I’ve been working through the Bayesian Methods for Hackers book by Cameron Davidson-Pilon but instead of PyMC I’m writing everything in R and Stan.

Things have gone rather smoothly so far but I came across an interesting example that I’m having trouble vectorizing. I believe I have the model specified correctly without vectorization but here’s the info so far:


The original model in PyMC2 is below:

def euclidean_distance(x, y):
    return np.sqrt(((x - y) ** 2).sum(axis=1))

def f_distance(gxy_pos, halo_pos, c):
    # foo_position should be a 2-d numpy array
    return np.maximum(euclidean_distance(gxy_pos, halo_pos), c)[:, None]

def tangential_distance(glxy_position, halo_position):
    # foo_position should be a 2-d numpy array
    delta = glxy_position - halo_position
    t = (2 * np.arctan(delta[:, 1] / delta[:, 0]))[:, None]
    return np.concatenate([-np.cos(t), -np.sin(t)], axis=1)

import pymc as pm

# set the size of the halo's mass
mass_large = pm.Uniform("mass_large", 40, 180, trace=False)

# set the initial prior position of the halos, it's a 2-d Uniform dist.
halo_position = pm.Uniform("halo_position", 0, 4200, size=(1, 2))

@pm.deterministic
def mean(mass=mass_large, h_pos=halo_position, glx_pos=data[:, :2]):
    return mass / f_distance(glx_pos, h_pos, 240) *\
        tangential_distance(glx_pos, h_pos)

ellpty = pm.Normal("ellipcity", mean, 1. / 0.05, observed=True,
                   value=data[:, 2:])

mcmc = pm.MCMC([ellpty, mean, halo_position, mass_large])
map_ = pm.MAP([ellpty, mean, halo_position, mass_large])
map_.fit()
mcmc.sample(200000, 140000, 3)

Here is my un-vectorized version of the model written in Stan:

functions {
real f_distance(row_vector gxy_pos, vector halo_pos, real c) {
  return fmax(distance(gxy_pos, halo_pos), c);
}

vector tangential_distance(row_vector glxy_position, vector halo_position) {
  real xdiff = glxy_position[1] - halo_position[1];
  real ydiff = glxy_position[2] - halo_position[2];
  real t = (2 * atan(ydiff / xdiff));
  vector[2] tdist;
  tdist[1] = -cos(t);
  tdist[2] = -sin(t);
  return tdist;
}
}

data {
  int<lower=0> n;
  matrix[n, 2] cart_pos; // x,y coordinates of galaxy position
  matrix[n, 2] ellip_pos; // a measure of ellipticity
}

parameters {
  real<lower=40.0,upper=180.0> exp_mass_large;
  vector<lower=0,upper=4200.0>[2] halo_position;
}

transformed parameters {
  real mass_large = log(exp_mass_large); // one large halo
}

model {
  vector[2] mu; 
  
  for (i in 1:n) {
    mu = mass_large / f_distance(cart_pos[i,:], halo_position, 240.0) * 
      tangential_distance(cart_pos[i,:], halo_position); 
    
    ellip_pos[i,1] ~ normal(mu[1], 0.05); // x-coordinate
    ellip_pos[i,2] ~ normal(mu[2], 0.05); // y-coordinate
  }
}

The way I wrote the model above appears to sample okay. There is definitely some bi-modal behavior with the marginal distributions that I didn’t see in the book but overall the results look similar. However I have attempted (and failed) to write the above program in a vectorized way similar to the book. I’ve been stumbling with using vector data types or arrays of vectors and when to use element-wise multiplication/division (.*, ./) or even if functions like fmax can be applied element-wise.

Any pointers would be appreciated!

1 Like

The vectorization stuff gets updated quite a bit. Which interface are you using? cmdstanpy (cmdstanpy · PyPI) should have the latest stuff.

Was there a particular statement here you wanted to vectorize that was being annoying?

Thanks for the response. I’m realizing my ask was probably way too general. I’m using cmdstanr for everything. I guess I’m just curious if there were rules of thumb for vectorization. In Julia including the . before any function will perform broadcasting so when I saw the .* and ./ stuff I thought there may be something similar there. So if that were the case if I wanted to vectorize fmax I would use something like fmax.(). But I’m thinking that’s likely not the case here.

So I guess to just consolidate this down to a few questions:

  1. When should I be using those vectorized operators? Is that dependent on what data types I’m using?
  2. Is it recommended to use vectors over arrays or vice versa?
  3. Do I need to do anything special to vectorize fmax? Or how would you tell if something has a vectorized implementation?

Thanks again

Good questions

This is the bit in the manual on vectorization: 3.1 Vectorization of Real-Valued Functions | Stan Functions Reference

I think it’s any vector/array type, and if it’s binary functions any combination of those with scalars. I think the way to know a function supports vectorization is the signature has placeholder types. For instance, here the definition of log is:

R log(T x)

And I think that means T can be any vector type or array or arrays of vectors (not sure if there’s a limit, but the vectorization doc above should say if there is). I’m surprised the return type isn’t also T here though. Maybe the vec docs have something on that.

For pow it is:

R pow(T1 x, T2 y)

I think that means T1 doesn’t need to necessarily match T2.

As of now, you might not notice much of a difference. We are working on some stuff that will make vectors much faster than arrays though, so favor vectors because in the near future those will be faster in a lot of different places.

I think look for those templated signatures in the docs for the function you want. The implementations are done manually, so not everything supports vectorization.

Was there a specific thing you tried with fmax that broke?

Vectorization docs are kinda tricky, cause we can either try to list all supported signatures for every function (which gets painfully verbose cause there might be a crazy large number). The alternate is we do docs with placeholder types, and that is confusing cause it’s not immediately clear that the function you want is in the list (and you gotta go find and read the other page about vectorization – which includes a lot of information you probably don’t care about).

But if there’s some particular problem you hit then maybe that’ll point to a modification to do the docs to make this easier to answer.

Thanks @bbbales2! This is really helpful. I appreciate the info and I see that I need to spend some more time in the docs. It looks like there is a vectorized implementation of fmax actually. I’ll take this info and see what I can do.

I dont’ have time to run the model but the below compiles. The only major thing I changed is that Stan is column major order, so you want to iterate taking slices of columns and not rows.

I really wish we did the dot notation like julia I think that’s very intuitive to users

functions {
real f_distance(vector gxy_pos, vector halo_pos, real c) {
  return fmax(distance(gxy_pos, halo_pos), c);
}

vector tangential_distance(vector glxy_position, vector halo_position) {
  vector[2] delta = glxy_position - halo_position;
  real t = (2 * atan(delta[1] / delta[2]));
  return to_vector({-cos(t), -sin(t)});
}
}

data {
  int<lower=0> n;
  matrix[2, n] cart_pos; // x,y coordinates of galaxy position
  matrix[2, n] ellip_pos; // a measure of ellipticity
}

parameters {
  real<lower=40.0,upper=180.0> exp_mass_large;
  vector<lower=0,upper=4200.0>[2] halo_position;
}

transformed parameters {
  real mass_large = log(exp_mass_large); // one large halo
}

model {
  vector[2] mu; 
  
  for (i in 1:n) {
    mu = mass_large / f_distance(cart_pos[:, i], halo_position, 240.0) * 
      tangential_distance(cart_pos[:, i], halo_position); 
    ellip_pos[, i] ~ normal(mu, 0.05); // x-y coordinates
  }
}

2 Likes

Thanks Steve! I had to change this line

real t = (2 * atan(delta[1] / delta[2]));

to

real t = (2 * atan(delta[2] / delta[1]));

But then it works really well:

Given that you vectorized the user defined functions should it now be possible to remove the for loop within the model block?

Thanks

2 Likes

Np and nice!

I’m not sure but idt so. tmk Stan would need the concept of a rowwise() so we could so something like

matrix tangential_distance(matrix glxy_position, vector halo_position) {
  int N_glxy = rows(glxy_position);
  matrix[N_glxy, 2] delta = rowwise(subtraction, glxy_position, halo_position);
  vector[N_glxy] t = (2 * atan(delta[:, 1] / delta[:, 2]));
  return to_matrix(to_vector({-cos(t), -sin(t)}), N_glxy, 2);
}

but maybe I’m overthinking it

1 Like

Row/Col-wise operations in Stan is definitely something I’ve wanted before, as a rough blueprint could do something like this: https://godbolt.org/z/q1aqW5?

3 Likes

(@joshualeond sorry to jack the thread for a sec)

Yeah @andrjohns this is rad maybe we should write up a design doc for row/col wise() (and maybe the dot operator as well? That makes a lot of sense to me)

2 Likes