Hi there,
I’m a rookie, and my background is in biology, not statistics. I’ve gotten a great deal of help from a friend writing a mark-recapture model that I’m running with the rstan interface. The response variable is probability of survival. I’m starting to add covariates, and experimenting with how various transformations improve or fail to improve model fit. I think the most relevant components of the model to my question are:
data{
int N;
int nFish; //n fish ID
int nStn;
int fish[N]; //fish ID
int stn[N];
int seen[N];
int last[N];
int n_release; // number of releases
int n_releaseCovar; //number of release covariates
int releaseID[nFish];
matrix[n_release, n_releaseCovar] releaseCovar; // release-level covariate matrix, each row corresponds to different release, each column is a release-level covariate
}
parameters{
vector<lower = 0, upper = 1>[nStn] detect;
vector[nStn] base_survival; // survival is for the reach preceding stn i
vector[n_releaseCovar] beta_release;
}
transformed parameters{
vector[nStn] full_survival[nFish]; // full Survival is indexed by fish
vector[nStn] chi[nFish];
for(i in 1:nFish){
full_survival[i] = inv_logit(base_survival + releaseCovar[releaseID[i]]*beta_release);
chi[i] = prob_uncaptured(nStn, detect, full_survival[i,2:]);
}
}
model{
// Generic priors
// target += normal_lpdf(beta_reach | 0, 1);
// target += normal_lpdf(beta_fish | 0 , 0.5); // broadened prior from 0, 0.2
target += normal_lpdf(beta_release | 0 , 10); // broadened prior from 0, 0.2
target += normal_lpdf(base_survival | 0, 0.75);
target += normal_lpdf(detect | 0.85, 0.35); // added prior for detect
// model
for (n in 1:N){
if(stn[n] != 1){
// at stn #1 p detection and survival = 1
target += bernoulli_lpmf(seen[n] | detect[stn[n]]);
target += bernoulli_lpmf(1 | full_survival[fish[n], stn[n]]);
}
if (last[n])
target += bernoulli_lpmf(1| chi[fish[n], stn[n]]);
}
The three covariates I’m including are water discharge (range = 1700-10,000), temperature (range = 10-21), and turbidity (range = 13 - 68). For the latest run of the model, I centered all three variables on 0. The prior on all three betas is target += normal_lpdf(beta_release | 0 , 10);
.
The model now samples very, very slugglishly, and I get a couple of these messages before sampling begins:
"Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity. Stan can't start sampling from this initial value."
I got even more of those messages when the prior was normal(0, 1) - when I broadened it to (0, 10), I only get one or two, but the model is still prohibitively inefficient to sample.
I’m sure there’s a well-known statistical reason that centering the predictor variables led to this, but I don’t know what it is - can someone help me out? I’m happy to post the full model code if it would help, but I was thinking it’s probably a conceptual issue and not a code issue.