I get persistent:
Log probability evaluates to log(0), i.e. negative infinity
breakdowns which I cannot for the life of me (or at least the last three days of me) debug.
A simple version of the question is: If I print out the log likelihoods for all of the sampling statements in my model block, and they are all real, how can the target
be -infinity?
In more detail, there are three normal sampling statements in my hierarchical model, that look like:
mu1[i] ~ multi_normal(mu1_mean, mu1_Sigma);
x_res[i] ~ normal(phi_ar^dday[i] * x_res[i-1],
sigma_ar * sqrt(getphilag(phi_ar, dday[i])));
y_res[i] ~ normal(phi_ar^dday[i] * y_res[i-1],
sigma_ar * sqrt(getphilag(phi_ar, dday[i])));
The mu1
is the hierarchical means of locations for k
individuals, the x_res
and y_res
are individual “residuals” at a lower hierarchical level. All the location/scale parameters are nice suitable real numbers, which I know because I print everything in debugging, including the log-probabilities, i.e.:
print("multi_normal lpdf: ", multi_normal_lpdf(mu1 | mu1_mean, mu1_Sigma))
print("normal x lpdf: ", normal_lpdf(x_res[i] | phi_ar^dday[i] * x_res[i-1], sigma_ar * sqrt(getphilag(phi_ar, dday[i]))));
print("normal y lpdf: ", normal_lpdf(y_res[i] | phi_ar^dday[i] * y_res[i-1], sigma_ar * sqrt(getphilag(phi_ar, dday[i]))));
And only ever get nice numbers. But when I print the target, it is (nearly always) -inf
.
What the heck!? What else can I do!?
Also I get this: Initialization between (-2, 2) failed after 100 attempts
, even though ALL of my parameters are constrained and initialized correctly - as far as I can tell.
Any insights and help appreciated!
Complete code
The complete stan code and R code to replicate is below (this is - in fact - a much simplified version of the model that I am actually interested in). Here, k individuals are tracked for j_k locations (each tracked for different amounts of time) move with individual Ornstein-Uhlenbeck (continuous-time autocorrelated Gaussian) processes around mean locations \{X_k, Y_k\} \sim {\cal BivarN}(\mu, \Sigma). In the simulated example, there are 6 individuals, 5 locations each, 0 auto-correlation and some slight ellipticity to the range of centroids. Occasionally (i.e. for some resimulations), it works just fine. But mostly it doesn’t.
Stan code
functions {
// this function computes the adjustment to the variance
// as a dependence of the intrinsic time scale and the
// interval between locations
real getphilag(real phi, real lag) {
real philag;
int i;
philag = 0;
i = 0;
while (i <= lag) {
philag = philag + pow(phi, lag * 2.0);
i = i + 1;
}
return philag;
}
}
data {
int<lower=0> n; // number of total observations
int k; // n. of individuals
real y[n]; // x-coordinate
real x[n]; // y-coordinate
real yday[n]; // day of year
real dday[n]; // yday[i] - yday[i-1] for each individual
int<lower=1,upper=k> id[n]; // ID
int<lower=0,upper=1> printme;
}
parameters {
//vector<lower = min(x), upper = max(x)>[2] mux_mean;
//vector<lower = min(y), upper = max(y)>[2] muy_mean;
real<lower = min(x), upper = max(x)> mux_mean;
real<lower = min(x), upper = max(x)> muy_mean;
real<lower = min(x), upper = max(x)> mux1[k];
real<lower = min(y), upper = max(y)> muy1[k];
real<lower = 0> mux1_sigma;
real<lower = 0> muy1_sigma;
real<lower=-1, upper=1> rho1;
real<lower = 0, upper = 100> sigma_ar;
real<lower = 0, upper = 100> tau;
}
transformed parameters {
cov_matrix[2] mu1_Sigma;
real phi_ar;
real sigma2;
real A;
mu1_Sigma[1,1] = mux1_sigma^2;
mu1_Sigma[2,2] = muy1_sigma^2;
mu1_Sigma[1,2] = mux1_sigma * muy1_sigma * rho1;
mu1_Sigma[2,1] = mux1_sigma * muy1_sigma * rho1;
phi_ar = exp(-1.0/tau);
sigma2 = 2.0 * sigma_ar^2 / (tau * (1 - exp(-2.0/tau))^2);
A = sigma2 * (-2.0*log(0.05)*pi());
}
model {
// ancillary variables
vector[n] x_hat;
vector[n] y_hat;
vector[n] x_res;
vector[n] y_res;
vector[2] mu1_mean;
vector[2] mu1[k]; // mean locations of each individual
//sigma_ar ~ uniform(0.0, 20.0); //exponential(1.0/(max(x) - min(x)));
//tau ~ lognormal(7.0, 1);
if(printme == 1){
print("sigma_ar: ", sigma_ar);
print("phi_ar: ", phi_ar);
}
mu1_mean[1] = mux_mean;
mu1_mean[2] = muy_mean;
for (i in 1:k){
mu1[i][1] = mux1[i];
mu1[i][2] = muy1[i];
mu1[i] ~ multi_normal(mu1_mean, mu1_Sigma);
}
// hierarchical parameters
if(printme == 1){
print("mu1: ", mu1);
print("mu1_Sigma: ", mu1_Sigma);
print("multi_normal lpdf: ", multi_normal_lpdf(mu1 | mu1_mean, mu1_Sigma));
}
x_res[1] = 0;
y_res[1] = 0;
for(i in 2:n){
x_hat[i] = mu1[id[i]][1];
y_hat[i] = mu1[id[i]][2];
if(id[i] == id[i-1]){
x_res[i] = x[i] - x_hat[i];
y_res[i] = y[i] - y_hat[i];
x_res[i] ~ normal(phi_ar^dday[i] * x_res[i-1],
sigma_ar * sqrt(getphilag(phi_ar, dday[i])));
y_res[i] ~ normal(phi_ar^dday[i] * y_res[i-1],
sigma_ar * sqrt(getphilag(phi_ar, dday[i])));
if(printme == 1){
print("normal x lpdf: ", normal_lpdf(x_res[i] | phi_ar^dday[i] * x_res[i-1], sigma_ar * sqrt(getphilag(phi_ar, dday[i]))));
print("normal y lpdf: ", normal_lpdf(y_res[i] | phi_ar^dday[i] * y_res[i-1], sigma_ar * sqrt(getphilag(phi_ar, dday[i]))));
}
} else {
x_res[i] = 0;
y_res[i] = 0;
}
}
if(printme == 1){
print("target:", target());
}
}
R code:
makeinits <- function(data){
with(data, list(list(
mux_mean = mean(x),
muy_mean = mean(y),
mux1 = x[dday == 0],
muy1 = y[dday == 0],
mux1_sigma = sd(x),
muy1_sigma = sd(y),
rho1 = 0,
sigma_ar = mean(tapply(x,id,sd)),
tau = 1)))
}
system.time(model_simple <- stan_model(file = "buildfromscratch_v2.stan"))
k <- 6; j <- 5; n <- k * j;
x.means <- c(-3,-2,-1,1,2,0)
y.means <- c(-2,-3,0,0,1,2)
sigma <- 1
set.seed(10)
data_simple <- {list(n = n,
k = k,
x = rep(x.means, each = j) + rnorm(n, sd = sigma),
y = rep(y.means, each = j) + rnorm(n, sd = sigma),
yday = rep(1:j, k),
id = rep(1:k, each = j),
dday = rep(c(0,rep(1,j-1)), k))
}
with(data_simple, plot(x,y, col = id, asp = 1, pch = 19))
data_simple$printme <- 1
sink("test.out")
fit_test <- sampling(model_simple, data = data_simple,
init = makeinits(data_simple),
chains = 1,
iter = 3)
sink(NULL)