I have recurrent-events data to which I want to fit a non-homogeneous Poisson process (NHPP). I followed the ragged data structures section of the Stan User's Guide. Here is a reproducible example:
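For reference, the log-likelihood I am trying to implement is that of a power-law NHPP with intensity $\lambda(t) = (\beta/\theta)(t/\theta)^{\beta-1}$, observed on $(0, \tau]$; written out, this is what the nhpp_log function below is meant to compute:

$$\log L(\beta,\theta) \;=\; \sum_{i=1}^{n}\Big[\log\beta - \beta\log\theta + (\beta-1)\log t_i\Big] \;-\; \Big(\frac{\tau}{\theta}\Big)^{\beta}.$$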
functions{
  // log-likelihood of one group's event times under a power-law NHPP
  // with shape beta, scale theta, observed up to truncation time tau
  real nhpp_log(vector t, real beta, real theta, real tau){
    vector[num_elements(t)] loglik_part;
    real loglikelihood;
    for (i in 1:num_elements(t)){
      loglik_part[i] = log(beta) - beta*log(theta) + (beta - 1)*log(t[i]);
    }
    loglikelihood = sum(loglik_part) - (tau/theta)^beta;
    return loglikelihood;
  }
}
data {
  int<lower=0> N;                // total number of events across all groups
  int<lower=0> K;                // total number of groups
  vector<lower=0>[K] tau;        // truncation (observation end) time of each group
  vector<lower=0>[N] event_time; // event times, all groups concatenated
  int s[K];                      // number of events in each group (ragged sizes)
}
parameters{
  real<lower=0> beta;
  real<lower=0> theta;
}
model{
  int position;
  position = 1;
  // ragged structure: group k's events are the length-s[k] segment of
  // event_time starting at position
  for (k in 1:K){
    segment(event_time, position, s[k]) ~ nhpp(beta, theta, tau[k]);
    position = position + s[k];
  }
  // priors
  beta ~ gamma(1, 1);
  theta ~ gamma(1, 0.01);
}
When every time interval contains at least one event, this code works perfectly fine. For example:
df0 = list(N = 38L, K = 10L,
tau = c(21.269, 18.109, 19.468, 19.89, 18.247, 19.048, 19.957, 21.006,
17.524, 19.475),
event_time = c(5.045, 14.921, 18.566, 7.265, 10.51, 12.155, 16.262, 17.738,
17.763, 16.059, 18.371, 10.393, 11.787, 5.088, 10.144, 11.646,
13.274, 15.233, 16.345, 17.583, 15.266, 15.391, 16.355, 17.79,
7.729, 13.906, 14.287, 12.012, 18.662, 5.654, 5.727, 8.144, 11.608,
14.756, 14.933, 16.088, 16.45, 18.876),
s = c(3L, 6L, 2L, 2L, 7L, 4L, 3L, 2L, 6L, 3L)
)
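The ragged bookkeeping can be checked in R before sampling (not part of the model, just a sanity check on the data list):

# group sizes add up to the total number of events, one entry per event
stopifnot(sum(df0$s) == df0$N)
stopifnot(length(df0$event_time) == df0$N)
stopifnot(length(df0$tau) == df0$K)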
library(rstan)
# plptau1 is the Stan program above, stored as a character string
fitplp <- stan(
  model_code = plptau1, model_name = "NHPP", data = df0,
  iter = 1000, warmup = 500, chains = 1, seed = 123, refresh = 0
)
> fitplp
Inference for Stan model: NHPP2.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta 2.16 0.03 0.37 1.53 1.91 2.14 2.37 2.98 115 1.00
theta 10.53 0.12 1.40 7.72 9.59 10.58 11.51 13.16 130 1.00
lp__ -90.13 0.09 1.15 -93.08 -90.57 -89.78 -89.30 -88.98 171 1.03
Samples were drawn using NUTS(diag_e) at Mon Jul 8 10:40:19 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The true parameters for beta and theta are 2 and 10, which are very close to the estimates.
However, when some of the time intervals contain no events at all, which is realistically very common, the Stan code does not work. For example:
df1 = list(
N = 8L, K = 10L,
tau = c(9.9785, 7.3146, 10.0518, 10.2853, 10.2621, 8.4175, 10.7142, 12.0679, 10.6844, 8.2966),
event_time = c(6.7346, 8.1608, 4.4621, 6.5118, 7.9538, 11.2332, 11.6206, 11.9121),
s = c(0L, 0L, 2L, 1L, 2L, 0L, 0L, 3L, 0L, 0L)
)
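The bookkeeping is still consistent; the only difference is that several groups now contain zero events:

sum(df1$s) == df1$N   # TRUE: 8 events in total
which(df1$s == 0)     # groups 1, 2, 6, 7, 9, 10 have no events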
fitplp <- stan(
  model_code = plptau1, model_name = "NHPP", data = df1,
  iter = 1000, warmup = 500, chains = 1, seed = 123, refresh = 0
)
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done
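Writing the likelihood out by hand, a group with zero events should still contribute the exposure term $-(\tau/\theta)^{\beta}$ (the log-probability of observing no events by time $\tau$), so I would expect the model block to reduce to something like the sketch below; the if/else loop is just my own rough rework, and I have not verified that it runs or that it avoids the initialization error:

model{
  int position;
  position = 1;
  for (k in 1:K){
    if (s[k] > 0){
      // groups with events: same contribution as before
      segment(event_time, position, s[k]) ~ nhpp(beta, theta, tau[k]);
      position = position + s[k];
    } else {
      // groups with no events: only the exposure term remains
      target += -(tau[k]/theta)^beta;
    }
  }
  // priors
  beta ~ gamma(1, 1);
  theta ~ gamma(1, 0.01);
}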
I believe the problem is caused by the groups with zero events in s. Is there any way to get around this issue? Thank you.