Slow sampling for phenological model

Please share your Stan program and accompanying data if possible.


Dear stan community,
I am working on a hierarchical phenological model. The idea is that a plant reaches a new stage in its development once a sufficient number of “development units” (DU) is accumulated (in this example around 700 DUs are needed). In the simplest form a DU is just the mean daily temperature above a temperature threshold. I have a number of stations with phenological observations (the ‘day of year’ when the new phase of plant development was reached). For each station I have a varying number of years with observations. For these stations I also have daily cumulative DUs for each each year and station. The following model runs, but very slowly.

data {
  int<lower=1> J;                       // Number of stations
  int<lower=1> K[J];                    // Number of years per station
  int<lower=0> N[J, max(K)];            // Number of days observed per year per station
  real DU[366, J, max(K)];              // Daily cumulative development units for each station-year
  int event_day[J, max(K)];             // Observed event day for each station-year
}

parameters {
  real mu_DU;                          
  real<lower=0> sigma_DU;               
  real<lower=0> sigma;                  
  vector[J] zeta_DU; 			
}

transformed parameters {
  vector[J] DU_threshold;
  for (j in 1:J) {
    DU_threshold[j] = mu_DU + sigma_DU * zeta_DU[j];	
  }
}

model { 
  // Priors 
  mu_DU ~ normal(700, 40);        
  sigma_DU ~ lognormal(0, 1);      
  zeta_DU ~ normal(0, 1); 	
  sigma ~ lognormal(0, 1);    

  // loop for each station
  for (j in 1:J) {
    // loop for each year at the current station
    for (k in 1:K[J]) { 
      if (N[j, k] > 0) {
        real predicted_event_day = 0;
        for (i in 1:N[j, k]) {
          if (DU[i,j,k] >= DU_threshold[j] && predicted_event_day == 0) {
            predicted_event_day = i;
          }
        }
        if (predicted_event_day > 0 && event_day[j, k] != -999) {
          event_day[j, k] ~ normal(predicted_event_day, sigma);
        }
      }
    }
  }
}

I assume the for loops are a problem here? But I am not sure how to avoid them. A second issue I have is with

event_day[j, k] ~ normal(predicted_event_day, sigma)

Is it okay to use a normal distribution here even though event_day contains only integer values (a histogram of all event_days looks approx normal)? I am currently only using 10 stations but I have data for ~3000 stations. Running the model for 1000 iterations takes around 1h. n_eff are extremely low (between 2 and 28) and Rhat are very high.
I’d be grateful for any advice or comment, also on how to formulate a better question in case anything is unclear.

I think the posterior density is not continuous due to how predicted_event_day is constructed, so that when continuous DU_threshold changes, the predicted_event_day changes in steps. Stan’s HMC algorithm does not work well with discontinuous posterior densities and the warmup adaptation is now very likely making the step size small making the computation slow, and even then different chains get stuck. You would need to come up with continuous valued predicted_event_day, or marginalize the discrete predicted_event_day, or use some other software which would have non-gradient based MCMC algorithm.

1 Like

Hi Aki, thank you for taking the time to look at my problem! I changed the calculation of predicted_event_day and made it a continuous variable and fiddled around with the priors a little bit. The model now runs as fast as I expected and gives good looking parameter estimates as well as good convergence diagnostics with just 500 iterations :)

1 Like