Thanks @cmcd. I’m going to try moving some of the transformed parameters in the model block. The discussion on the link you shared suggested that writing to disc may vary between the blocks and there are many data involved here so perhaps those disc-write limits are important. I also noticed that post suggested using the std_normal()
distribution instead of the normal(0,1)
. I’ll try that too.
The soft sum to zero constraints is pretty new for me too. I discovered it when working with an iCAR model in Stan. There’s some discussion of different approaches to these constraints here:
and the modelling work where I first discovered them is here:
https://mc-stan.org/users/documentation/case-studies/icar_stan.html
I have tried versions of this same model without the sum to zero constraints (as you suggested, they’re easy to comment-out). There’s little difference in run-time and the convergence is generally a bit worse (particularly for the main slope and intercept hyperparameters).
Update: I’ve tried these two suggestions
and I’ve used the std_normal()
distribution instead of the normal(0,1)
.
It seems to have improved things a bit:
The warmup seems to be going faster (mostly)…
get_elapsed_time(slope_stanfit)/3600 ## in hours
warmup sample
chain:1 5.227583 2.122542
chain:2 7.342417 2.112506
chain:3 5.522083 2.103778
Here’s the same result from the original parameterization:
warmup sample
chain:1 7.798528 2.105014
chain:2 8.093528 2.224678
chain:3 7.706306 2.151717
It’s not the kind of speed I’m hoping for yet, so I’m going to keep trying things. I’m wondering if it’s worth digging deep in to the error distribution on the counts. The sdnoise
parameter is one of the least well sampled parameters. Perhaps that’s causing some miss-specification.
So, here’s the model now:
// This is a Stan implementation of the bbsBayes slope model
data {
int<lower=1> nstrata;
int<lower=1> ncounts;
int<lower=1> nyears;
int<lower=0> count[ncounts]; // count observations
int<lower=1> strat[ncounts]; // strata indicators
int<lower=1> year[ncounts]; // year index
int<lower=1> fixedyear; // centering value for years
int<lower=0> firstyr[ncounts]; // first year index
int<lower=1> obser[ncounts]; // observer indicators
int<lower=1> nobservers;
}
parameters {
vector[ncounts] noise_raw; //non centered observation-level random effect to model over-dispersion
real lambda[ncounts]; // Poisson means
vector[nstrata] beta_p; //non centered random effect slopes
real BETA; //Slope hyperparameter
vector[nstrata] strata_p; //non centered random effect intercepts
real STRATA; //intercept hypterparameter
real eta; //first-year observer intercept (observer startup effect)
matrix[nstrata,nyears] yeareffect_raw; //random year-effects modeling departures from slope-based trajectory
vector[nobservers] obs_raw; // sd of year effects
real<lower=0> sdnoise; // sd of over-dispersion
//real<lower=1> nu; //optional heavy-tail df for t-distribution
real<lower=0> sdobs; // sd of observer effects
real<lower=0> sdbeta; // sd of slopes
real<lower=0> sdstrata; // sd of intercepts
real<lower=0> sdyear[nstrata]; // sd of year effects
}
model {
vector[ncounts] E; // log_scale additive likelihood
vector[ncounts] noise; // extra-Poisson log-normal variance
vector[nstrata] beta;
vector[nstrata] strata;
vector[nobservers] obs; //observer effects
matrix[nstrata,nyears] yeareffect;
sdnoise ~ std_normal(); //prior on scale of extra Poisson log-normal variance
noise_raw ~ std_normal(); //non centered prior normal tailed extra Poisson log-normal variance
sdobs ~ std_normal(); //prior on sd of observer-route effects
sdyear ~ std_normal(); // prior on sd of yeareffects - stratum specific
obs_raw ~ std_normal(); //non centered prior on observer effects
//nu ~ gamma(2,0.1); // alternate prior on df for t-distribution of heavy tailed
for(s in 1:nstrata){
yeareffect_raw[s,] ~ std_normal();
sum(yeareffect_raw[s,]) ~ normal(0,0.001*nyears);// soft sum to zero constraint
}
BETA ~ normal(0,0.1);// prior on fixed effect mean slope - hyperparameter
STRATA ~ std_normal();// prior on fixed effect mean intercept - hyperparameter
eta ~ std_normal();// prior on first-year observer effect
sdstrata ~ std_normal(); //prior on sd of intercept variation
sdbeta ~ normal(0,0.1); //prior on sd of slope variation
beta_p ~ std_normal(); //non centered prior on stratum-level slopes
strata_p ~ std_normal(); //non centered prior on stratum-level slopes
//sum to zero constraints
sum(strata_p) ~ normal(0,0.001*nstrata);
sum(beta_p) ~ normal(0,0.001*nstrata);
obs = sdobs*obs_raw;
noise = sdnoise*noise_raw;
// covariate effect on intercepts and slopes
beta = (sdbeta*beta_p) + BETA;
strata = (sdstrata*strata_p) + STRATA;
for(s in 1:nstrata){
yeareffect[s,] = sdyear[s]*yeareffect_raw[s,];
}
for(i in 1:ncounts){
E[i] = beta[strat[i]] * (year[i]-fixedyear) + strata[strat[i]] + yeareffect[strat[i],year[i]] + obs[obser[i]] + eta*firstyr[i] + noise[i];
}
count ~ poisson_log(E); //vectorized count likelihood with log-transformation
}