I am building a model, in which I use Gaussian processes to control for phylogenetic and spatial autocorrelation among species and sites, respectively. There are multiple observations per both species and site. Let say my data look something like this:
> dat
y_var x_var spec_id site_id
1 12.36641 12.712033 3 2
2 24.04981 10.194881 1 1
3 23.45062 13.711540 4 4
4 24.75315 9.677793 2 2
5 26.77541 13.200210 1 3
6 18.57922 11.704592 3 3
7 18.49895 13.516832 3 1
8 15.51658 10.905999 4 1
9 32.79624 13.046312 2 4
10 19.54373 12.702731 1 2
In addition, my data contain two matrices: P_mat
is N_{species} \times N_{species} matrix with phylogenetic distances between species and D_mat
is N_{sites} \times N_{sites} matrix with spatial distances between sampling sites (quadrats).
My question is how Stan matches the species in spec_id
and the sites in site_id
to the corresponding rows and columns in the respective distance matrices.
The relevant parts of my model currently look as follows:
matrix[N_spp,N_spp] SIGMA_phylo = cov_GP_phylo( P_mat , eta_phylo , rho_phylo );
matrix[N_spp,N_spp] L_SIGMA_phylo = cholesky_decompose( SIGMA_phylo );
vector[N_spp] f_phylo = L_SIGMA_phylo * z_phylo;
matrix[N_sites,N_sites] SIGMA_space = cov_GP_space( D_mat , eta_space , rho_space );
matrix[N_sites,N_sites] L_SIGMA_space = cholesky_decompose( SIGMA_space );
vector[N_sites] f_space = L_SIGMA_space * z_space;
mu = b0 + b1 * x_var + f_phylo[spec_id] + f_space[site_id]
y_var ~ normal( mu , sigma_n )
Will Stan match f_phylo
and z_phylo
to rows and columns in P_mat
in ascending order (and the same for sites)? In other words, if I order P_mat
so that species 1
is in the first row and and first column, species 2
is in the second row and second column and so on (and the same for D_mat
), will everything match, irrespective of the order in the data set with individual observations shown above?