How to vectorize when dealing with vectors of different lengths while partially pooling?

I’ve really been enjoying Stan since making the jump from JAGS. The R packages are now really mature, and the Python ones are catching up quickly (thanks, ArviZ team!).

One thing that is still giving me issues is vectorizing code, particularly when I have vectors of different lengths. For example, in the model below I am trying to perform a simple univariate regression where I have data for x and y from two groups. So I have x, y, and x_group vectors that are as long as the number of samples in the model (call that N), and there are J possible values for x_group (2 in this example, when the groups are 1 and 2). If I want my alpha and beta parameters to differ by group, and I also want to partially pool those parameters, I might write something in loop form like this:

    part_code = """
    data {
        int<lower=1> J; // number of groups, 2 in this example
        int<lower=1> N; // number of samples, 20 in this example
        vector[N] y; // response variable
        vector[N] x; // predictor variable
        int x_group[N]; // the group ids for each sample
    }
    parameters {
        vector[J] beta;
        vector[J] alpha;
        real<lower=0> sigma;
        real mu_alpha;
        real mu_beta;
        real<lower=0> sigma_alpha;
        real<lower=0> sigma_beta;
    }
    transformed parameters {
        //placeholder but blank for now
    }
    model {
        vector[N] mu;
        
        //prior on sigma (error)
        sigma ~ exponential(1);
        
        //priors on alpha and beta
        mu_alpha ~ normal(0, 10);
        sigma_alpha ~ exponential(1);
        
        mu_beta ~ normal(0, 10);
        sigma_beta ~ exponential(1);
        
        //group level priors for alpha and beta
        for (j in 1:J) {
            alpha[j] ~ normal(mu_alpha, sigma_alpha);
            beta[j] ~ normal(mu_beta, sigma_beta);
        }
        
        //likelihood
        for (i in 1:N) {
            mu[i] = beta[x_group[i]] * x[i] + alpha[x_group[i]];
            y[i] ~ normal(mu[i], sigma);
        }
    }
    generated quantities {
        vector[N] log_lik;
        vector[N] y_hat;
        
        for (i in 1:N) {
            log_lik[i] = normal_lpdf(y[i] | beta[x_group[i]] * x[i] + alpha[x_group[i]], sigma);
            y_hat[i] = normal_rng(beta[x_group[i]] * x[i] + alpha[x_group[i]], sigma);
        }

    }
    """

Now I know it must be possible to pull the sampling of alpha and beta out of that loop over J, and also to pull the likelihood out of its loop over N. However, when I try to do that, particularly for the likelihood, I get errors saying that beta[x_group] * x won’t work because I can’t use the operator '*' between 'ill-defined' and x. I am fairly sure this is because beta is length J while x is length N. How would I go about vectorizing this? Note: I was able to vectorize when fitting only a group-level intercept, so I suspect the '+' operator is less fussy about this issue.
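For reference, the vectorized likelihood I tried looked something like this:

mu = beta[x_group] * x + alpha[x_group]; // the compiler rejects this '*'
y ~ normal(mu, sigma);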

This is a bit of a contrived example, but I built it to better help me understand going from loops in JAGS to vectorized code in Stan. If anyone can help me I’d really appreciate it.


The elementwise operators are probably what you want. Check out .* and ./: 6.5 Arithmetic and Matrix Operations on Expressions | Stan Reference Manual
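Note that beta[x_group] is itself a length-N vector (indexing by an array of integers expands over that array), so an elementwise product with x lines up. A minimal sketch of how .* behaves on equal-length vectors:

vector[3] a = [1, 2, 3]';
vector[3] b = [4, 5, 6]';
vector[3] c = a .* b; // elementwise product: c == [4, 10, 18]'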

That’s really helpful, thank you. I noticed the docs say:

These provide a shorthand to replace loops, but are not intrinsically more efficient than a version programmed with elementwise calculations and assignments in a loop.

Is there a better solution to make use of vectorization here or is this optimal?

For random effects, what you’re doing is good.
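You can also drop the loop over J entirely, since sampling statements vectorize over a vector on the left-hand side; a minimal equivalent for the group-level priors:

alpha ~ normal(mu_alpha, sigma_alpha);
beta ~ normal(mu_beta, sigma_beta);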

For the likelihood, either this:

for (i in 1:N) {
  mu[i] = beta[x_group[i]] * x[i] + alpha[x_group[i]];
}
y ~ normal(mu, sigma);

or this:

mu = beta[x_group] .* x + alpha[x_group];
y ~ normal(mu, sigma);

should be about the same for now. We’re working on some things that should make the second one faster, but it’s a work in progress.

Ok great. Appreciate the help and all the work you folks are doing on development!

Another thing: if you have a linear (fixed-effects) term in your model, you should probably use normal_id_glm: https://mc-stan.org/docs/2_24/functions-reference/normal-id-glm.html

So if you have something like:

matrix[N, K] X;
vector[K] gamma;
...
mu = X * gamma + beta[x_group] .* x + alpha[x_group];
y ~ normal(mu, sigma);

Do:

mu = beta[x_group] .* x + alpha[x_group];
y ~ normal_id_glm(X, mu, gamma, sigma);

instead; it’s faster.

Interesting. So say I had a random-intercept-only model, would it go something like this:

mu = x + alpha[x_group];
y ~ normal_id_glm(X, mu, beta, sigma);

I guess what I’m asking is: should the result of all the ‘random’ parts of the model go in the mu argument of normal_id_glm?

In the documentation’s notation that argument is called alpha, but yeah, the second argument is for the random effects.

But the tricks here are in the matrix * vector part of the model. If you don’t have that part, I don’t think the GLM thing will be faster.

Of course, the second argument is alpha, sorry.

So the strategy is to pop all the fixed parts of the model into that third argument to make use of the vectorization?

I’m learning a lot, thank you!

Yeah, to take advantage of the GLM function you have to reduce the linear part of your model to matrix * vector + vector. The second vector holds the random effects.
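So a random-intercept-only version would be something like this (a sketch, assuming X holds the fixed-effect predictors, with x as one of its columns, and gamma their coefficients):

mu = alpha[x_group]; // the 'random' part: group-level intercepts
y ~ normal_id_glm(X, mu, gamma, sigma);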