Slow sampling for phenological model

Dear stan community,
I am working on a hierarchical phenological model. The idea is that a plant reaches a new stage in its development once a sufficient number of “development units” (DU) is accumulated (in this example around 700 DUs are needed). In the simplest form a DU is just the mean daily temperature above a temperature threshold. I have a number of stations with phenological observations (the ‘day of year’ when the new phase of plant development was reached). For each station I have a varying number of years with observations. For these stations I also have daily cumulative DUs for each each year and station. The following model runs, but very slowly.

``````data {
int<lower=1> J;                       // Number of stations
int<lower=1> K[J];                    // Number of years per station
int<lower=0> N[J, max(K)];            // Number of days observed per year per station
real DU[366, J, max(K)];              // Daily cumulative development units for each station-year
int event_day[J, max(K)];             // Observed event day for each station-year
}

parameters {
real mu_DU;
real<lower=0> sigma_DU;
real<lower=0> sigma;
vector[J] zeta_DU;
}

transformed parameters {
vector[J] DU_threshold;
for (j in 1:J) {
DU_threshold[j] = mu_DU + sigma_DU * zeta_DU[j];
}
}

model {
// Priors
mu_DU ~ normal(700, 40);
sigma_DU ~ lognormal(0, 1);
zeta_DU ~ normal(0, 1);
sigma ~ lognormal(0, 1);

// loop for each station
for (j in 1:J) {
// loop for each year at the current station
for (k in 1:K[J]) {
if (N[j, k] > 0) {
real predicted_event_day = 0;
for (i in 1:N[j, k]) {
if (DU[i,j,k] >= DU_threshold[j] && predicted_event_day == 0) {
predicted_event_day = i;
}
}
if (predicted_event_day > 0 && event_day[j, k] != -999) {
event_day[j, k] ~ normal(predicted_event_day, sigma);
}
}
}
}
}
``````

I assume the for loops are a problem here? But I am not sure how to avoid them. A second issue I have is with

``````event_day[j, k] ~ normal(predicted_event_day, sigma)
``````

Is it okay to use a normal distribution here even though event_day contains only integer values (a histogram of all event_days looks approx normal)? I am currently only using 10 stations but I have data for ~3000 stations. Running the model for 1000 iterations takes around 1h. n_eff are extremely low (between 2 and 28) and Rhat are very high.
I’d be grateful for any advice or comment, also on how to formulate a better question in case anything is unclear.

I think the posterior density is not continuous due to how `predicted_event_day` is constructed, so that when continuous `DU_threshold` changes, the `predicted_event_day` changes in steps. Stan’s HMC algorithm does not work well with discontinuous posterior densities and the warmup adaptation is now very likely making the step size small making the computation slow, and even then different chains get stuck. You would need to come up with continuous valued `predicted_event_day`, or marginalize the discrete `predicted_event_day`, or use some other software which would have non-gradient based MCMC algorithm.

1 Like

Hi Aki, thank you for taking the time to look at my problem! I changed the calculation of predicted_event_day and made it a continuous variable and fiddled around with the priors a little bit. The model now runs as fast as I expected and gives good looking parameter estimates as well as good convergence diagnostics with just 500 iterations :)

1 Like