Hi Stan gang!
In Stan User’s Guide 22.8 Vectorization, at the bottom of the Vectorization through matrix operations section, they give the following “most efficient” code:
data {
matrix[N, K] x;
vector[N] y;
}
parameters {
vector[K] beta;
}
model {
y ~ normal(x * beta, 1);
}
Where they have a single column vector of betas, being multiplied by a matrix x, to create the row vector y (and some noise).
Suppose instead that I had a column vector of length N called “species” (converted to some numerical representation), that ranged from 1 to n_species. In this case, each row i of the matrix X is associated with a particular species (species[i]). Suppose that I want to generate different sets of betas for each species. I would create an array of beta column vectors, i.e.:
data {
int<lower = 1> n_species;
matrix[N, K] x;
vector[N] y;
vector[N] species;
}
parameters {
vector[K] beta[n_species];
}
I’m stuck on how to keep this encapsulated as a single matrix/vector multiplication. My initial thought was:
model {
y ~ normal(x * beta[,species], 1);
}
Such that the beta matrix is subset by the corresponding set of betas for species for observation i. However, this ends up just creating a new matrix with dimensions K x N, so my result is an N x N matrix, rather than an N x 1 column vector.
I could just create an N x K matrix of betas, with each row being the proper set of betas for that given species, and just do rowwise dotproduct, however this seems super inefficient. What would be the best practice to do here?
Thanks in advanced!
EDIT: I should mention too, it just so happens that my X matrix is ordered based on the species factor. That is, the first, say, 100 rows are ALL species 1, then the next 200 rows are ALL species 2, then the next 143 rows are ALL species 3, and so on. So, one other way I thought about doing this was when generating betas for species 1 for example, use the segment() function to apply this set of betas to the segment of X that are associated with species 1, and then move on to the next segment for species 2, in a similar fashion to dealing with ragged data structures. But, I get a little sketched out if I’m removing some generality like this, as it assumes that the data is going to be ordered by species. Mind you, I can’t see that assumption changing, but still…