Hi,
I just started learning Bayesian Statistics and STAN, therefore it is quite possible the following question is silly. My apologies in advance.
I am working with vegetation surveys that were conducted in several river networks. I am interested in analyzing how the number of species in common between sample sites changes as function of network distance across different river networks.
I calculated 1) the number of species in common between all pairs of points and 2) network distances between pairs of points. As an interface for STAN I am using the “rethinking” package from Richard McElreath’s book “Statistical Rethinking”.
The code below fits a linear multilevel model with varying intercepts and slopes for each river basin/network:
m1<-map2stan(alist(
species~dnorm(mu,sigma),
mu<-a_basin[basin_id]+b_basin[basin_id]*net_distance,
c(a,b)[basin_id]~dmvnorm2(c(a,b),sigma_basin,Rho),
a~dnorm(0,1),
b~dnorm(0,1),
sigma_basin~dcauchy(0,1),
Rho ~ dlkjcorr(2),
sigma~dcauchy(0,1)
),data=d,warmup=1000,iter=5000,chains=3,cores=5
)
The code works but I am not sure the model is correct. I calculated metrics for each pair of points, which means the same point shows up in several pairwise comparisons. Does this create some kind of lack of independence that has to be taken into account in the model? If so, how should I include it?
Below is the corresponding STAN code
Thank you.
data{
int<lower=1> N;
int<lower=1> N_basin_id;
real sorenson[N];
real net_dist_s[N];
int basin_id[N];
}
parameters{
vector[N_basin_id] b_dist_net;
vector[N_basin_id] a_basin;
real a;
real b;
vector<lower=0>[2] sigma_basin;
corr_matrix[2] Rho;
real<lower=0> sigma;
}
transformed parameters{
vector[2] v_a_basinb_dist_net[N_basin_id];
vector[2] Mu_ab;
cov_matrix[2] SRS_sigma_basinRho;
for ( j in 1:N_basin_id ) {
v_a_basinb_dist_net[j,1] = a_basin[j];
v_a_basinb_dist_net[j,2] = b_dist_net[j];
}
for ( j in 1:2 ) {
Mu_ab[1] = a;
Mu_ab[2] = b;
}
SRS_sigma_basinRho = quad_form_diag(Rho,sigma_basin);
}
model{
vector[N] mu;
sigma ~ cauchy( 0 , 1 );
Rho ~ lkj_corr( 2 );
sigma_basin ~ cauchy( 0 , 1 );
b ~ normal( 0 , 1 );
a ~ normal( 0 , 10000 );
v_a_basinb_dist_net ~ multi_normal( Mu_ab , SRS_sigma_basinRho );
for ( i in 1:N ) {
mu[i] = a_basin[basin_id[i]] + b_dist_net[basin_id[i]] * net_dist_s[i];
}
sorenson ~ normal( mu , sigma );
}
generated quantities{
vector[N] mu;
for ( i in 1:N ) {
mu[i] = a_basin[basin_id[i]] + b_dist_net[basin_id[i]] * net_dist_s[i];
}
}