Hi,
I am relatively new to Stan. I have written a Stan program to be called from R. The program works and does what expected. However, it is very slow and I am looking for ways to improve it. An obvious one is to vectorize two loops, which I believe are the main bottleneck.
The main parts of the program is to fit a bell-shaped curve (response
), calculate a quantity known as the loglam
, which is then used to calculate the probability of observing or not a species. There are two sets of parameters, one for response (demographic model) and one for the logistic regression (observation model). I need to allow the bell-shaped curve to be asymmetric, which I achieve by introducing two parameters (sigl
and sigr
) to control the shape of the curve to the left and to the right of 0
. To allow this, however, I had to introduce an if/else statement to use the correct sigl
or sigr
for that specific data point (see // HOW TO HANDLE THIS?
in the Stan code below).
data {
int N; // observations, i.e. number of locations
int M; // length of time series
array[N] int<lower=0,upper=1> occ; // observed = 1 or not-observed = 0
array[N, M] real ts; // time series
}
parameters {
// demographic model
real mu;
real<lower=0> sigl;
real<lower=0> sigr;
// observation model
real c;
real<lower=0,upper=1> pd;
}
model{
// auxiliary variables
vector[M] response;
vector[N] loglam;
real u;
real v;
// priors - all weakly informative
mu ~ normal(0, 10);
sigl ~ exponential(1);
sigr ~ exponential(1);
c ~ normal(0, 10);
pd ~ uniform(0, 1);
// TWO INEFFICIENT NESTED LOOPS
// likelihood
for(i in 1:N){ // loop over locations
for(j in 1:M){ // loop over time
u = ts[i, j] - mu;
if (u < 0) { // HOW TO HANDLE THIS?
v = ( u / sigl )^2;
} else {
v = ( u / sigr )^2;
}
response[j] = -0.5 * v;
}
loglam[i] = mean(response);
}
occ ~ bernoulli(pd * inv_logit(loglam - c));
}
// I want to use `loo()` for model selection
generated quantities{
vector[N] log_lik;{
// auxiliary variables
vector[M] response;
vector[N] loglam;
real u;
real v;
// likelihood
for(i in 1:N){ // loop over locations
for(j in 1:M){ // loop over time
u = ts[i, j] - mu;
if (u < 0) {
v = ( u / sigl )^2;
} else {
v = ( u / sigr )^2;
}
response[j] = -0.5 * v;
}
loglam[i] = mean(response);
log_lik[i] = bernoulli_lpmf(occ[i] | pd * inv_logit(loglam[i] - c));
}
}
}
I need some help understanding how to achieve vectorization of the above code. In particular, how can I vectorize this code with an if/else in the innermost loop? Note that I also generate the log_lik
(generated quantities
) because I want to perform model selection using loo()
, making the inefficient loops appear twice. From my understanding, this is how to do it; is there a faster, better way?