Please share your Stan program and accompanying data if possible.
Dear stan community,
I am working on a hierarchical phenological model. The idea is that a plant reaches a new stage in its development once a sufficient number of “development units” (DU) is accumulated (in this example around 700 DUs are needed). In the simplest form a DU is just the mean daily temperature above a temperature threshold. I have a number of stations with phenological observations (the ‘day of year’ when the new phase of plant development was reached). For each station I have a varying number of years with observations. For these stations I also have daily cumulative DUs for each each year and station. The following model runs, but very slowly.
data {
int<lower=1> J; // Number of stations
int<lower=1> K[J]; // Number of years per station
int<lower=0> N[J, max(K)]; // Number of days observed per year per station
real DU[366, J, max(K)]; // Daily cumulative development units for each station-year
int event_day[J, max(K)]; // Observed event day for each station-year
}
parameters {
real mu_DU;
real<lower=0> sigma_DU;
real<lower=0> sigma;
vector[J] zeta_DU;
}
transformed parameters {
vector[J] DU_threshold;
for (j in 1:J) {
DU_threshold[j] = mu_DU + sigma_DU * zeta_DU[j];
}
}
model {
// Priors
mu_DU ~ normal(700, 40);
sigma_DU ~ lognormal(0, 1);
zeta_DU ~ normal(0, 1);
sigma ~ lognormal(0, 1);
// loop for each station
for (j in 1:J) {
// loop for each year at the current station
for (k in 1:K[J]) {
if (N[j, k] > 0) {
real predicted_event_day = 0;
for (i in 1:N[j, k]) {
if (DU[i,j,k] >= DU_threshold[j] && predicted_event_day == 0) {
predicted_event_day = i;
}
}
if (predicted_event_day > 0 && event_day[j, k] != -999) {
event_day[j, k] ~ normal(predicted_event_day, sigma);
}
}
}
}
}
I assume the for loops are a problem here? But I am not sure how to avoid them. A second issue I have is with
event_day[j, k] ~ normal(predicted_event_day, sigma)
Is it okay to use a normal distribution here even though event_day contains only integer values (a histogram of all event_days looks approx normal)? I am currently only using 10 stations but I have data for ~3000 stations. Running the model for 1000 iterations takes around 1h. n_eff are extremely low (between 2 and 28) and Rhat are very high.
I’d be grateful for any advice or comment, also on how to formulate a better question in case anything is unclear.