I have a Stan program that attempts to nonparametrically estimate extreme tail probabilities using a binomial model with conjugate beta(1, 1) priors. My current implementation is fast for K=2, but haven’t tested for higher K. Here, I only display the data
and transformed parameters
block since these are all that are needed to understand my question.
data {
int<lower = 1> K; // number of species in genus
int<lower = 1> N[K]; // number of intraspecific (within-species) genetic distances for each species
int<lower = 1> M; // number of interspecific (among-species) genetic distances for all species
int<lower = 1> C[K]; // number of combined interspecific distances for a target species and its nearest neighbour species
vector<lower = 0, upper = 1>[sum(N)] intra; // intraspecific genetic distances for each species
vector<lower = 0, upper = 1>[M] inter; // interspecific genetic distances for all species
vector<lower = 0, upper = 1>[sum(C)] comb; // interspecific genetic distances for a target species and its nearest neighbour species
}
transformed data {
int start_n[K + 1];
int start_c[K + 1];
start_n[1] = 1;
start_c[1] = 1;
for (k in 2:(K + 1)) {
start_n[k] = start_n[k - 1] + N[k - 1];
start_c[k] = start_c[k - 1] + C[k - 1];
}
real<lower = 0, upper = 1> min_inter = min(inter); // minimum interspecific genetic distance for all species
vector<lower = 0, upper = 1>[K] max_intra; // maximum intraspecific genetic distance for each species
vector<lower = 0, upper = 1>[K] min_comb; // minimum combined interspecific genetic distance for a target species and its nearest neighbour species
for (k in 1:K) {
max_intra[k] = max(segment(intra, start_n[k], N[k]));
min_comb[k] = min(segment(comb, start_c[k], C[k]));
}
int<lower = 0, upper = max(N)> y_lwr[K] = rep_array(0, K); // count of intraspecific genetic distances for each species equalling or exceeding min_inter for all species
int<lower = 0, upper = max(N)> y_lwr_prime[K] = rep_array(0, K); // count of intraspecific genetic distances for each species equalling or exceeding the minimum combined interspecific distance for a target species and its nearest neighbour species
int<lower = 0, upper = M> y_upr[K] = rep_array(0, K); // count of interspecific genetic distances for all species less than or equal to max_intra for each species
int<lower = 0, upper = max(C)> y_upr_prime[K] = rep_array(0, K); // count of combined interspecific genetic distances for a target species and its nearest neighbour species less than or equal to max_intra for each species
for (k in 1:K) {
for (n in 1:N[k]) {
y_lwr[k] += (intra[start_n[k] + n - 1] >= min_inter);
y_lwr_prime[k] += (intra[start_n[k] + n - 1] >= min_comb[k]);
}
for (m in 1:M) {
y_upr[k] += (inter[m] <= max_intra[k]);
}
for (c in 1:C[k]) {
y_upr_prime[k] += (comb[start_c[k] + c - 1] <= max_intra[k]);
}
}
}
I’ve worked through the Stan User Guide but wonder if my implementation is the most efficient in terms of speed.