When I run this model, all my transitions are divergent. I’m trying to figure out what a “non-centered” model would be here.
This model is a Poisson process, in which we have multiple Poisson counts with rate \int_l^r f(x)\,dx = F(r) - F(l), for many different intervals [l, r]. These intervals are also unknown: we only know that they partition the unit interval and that their lengths can be assumed to follow a Dirichlet(1, …, 1) prior.
I’m using a positive_ordered to model the cumulative rate function F. Would it be better to instead use real<lower=0> f[] to represent the rate directly? Is one of these parametrizations more likely to cause problems with divergent transitions?
functions {
int real_to_int(real x) {
int i = 0;
while(i < x) {
i += 1;
}
return i;
}
// Should we vectorize this and have it done all at once, i.e. real[] x?
real interpolate(vector F, real x, int T) {
// i and i+1
int i;
real res;
real flo = fmin(floor(x*T), T - 1); // possible values: {0, ..., T-1}; clamped so x == 1 (the last bucket) stays in range
real lambda = x*T - flo;
i = real_to_int(flo) + 1;
res = (1 - lambda) * F[i] + lambda * F[i+1];
return(res);
}
}
data {
int<lower=0> N; // number of X obs, yielding N+1 buckets
int<lower=0> T; // number of line segments (granularity of Gaussian Process)
int<lower=0> Y[N+1]; // Number of Y samples in bucket k
}
transformed data{
int N_Y = sum(Y);
real empty_r[0];
int empty_i[0];
}
parameters {
simplex[N+1] intervals; //dirichlet
positive_ordered[T+1] F; // Cumulative density ratio.
real<lower=0> sigma; //how quickly the log-density ratio can vary
}
transformed parameters {
real<lower=0> dF[T]; // density ratio
real ddF[T-1]; // 1st deriv of density ratio
for (t in 1:T)
dF[t] = F[t+1] - F[t];
for (t in 1:(T-1))
ddF[t] = dF[t+1] - dF[t];
}
// ToDo: change this to a non-centered implementation?
model {
print("BEFORE: target = ", target());
{
real l = 0; // left end of interval
real r; // right end of interval
real rate; // Poisson rate
real F_l = 0; // F(0) = 0
real F_r;
print("rate = ", rate);
intervals ~ dirichlet(rep_vector(1.0, N+1));
for (k in 1:(N+1)) { // for each bucket
r = l + intervals[k];
F_r = interpolate(F, r, T); //interpolate
rate = F_r - F_l; // the Poisson rate is the integral
Y[k] ~ poisson(N_Y * rate);
l = r;
F_l = F_r;
}
/////////////////////
// Smoothing prior
for (t in 1:(T-1)) { // for each section of the function approximation
dF[t+1] ~ lognormal(dF[t], sigma);
}
// Set F(0) = 0.1. Doing this here avoids having to use 'min' above.
F[1] ~ normal(0.1, 1e-9);
}
}
There’s no difference: under the hood, positive_ordered is implemented essentially as an array of lower-bounded reals (positive increments that get summed).
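For instance, these blocks (a sketch with names I made up, reusing T from your data block) give essentially the same object as positive_ordered[T+1] F:

data {
  int<lower=1> T;
}
parameters {
  real<lower=0> F1;            // F[1]
  vector<lower=0>[T] dF;       // positive increments F[t+1] - F[t]
}
transformed parameters {
  vector[T+1] F;
  F[1] = F1;
  F[2:(T+1)] = F1 + cumulative_sum(dF);   // nondecreasing by construction
}

Either way HMC works on log-transformed positive increments, so I wouldn't expect one form to be noticeably better for divergences than the other.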
That F[1] ~ normal(0.1, 1e-9) is rather extreme and could be causing numerical problems. Maybe try 1e-3 instead.
You don’t have a prior on sigma so I don’t think the “smoothing prior” does much smoothing. And since you don’t know how large the intervals are I don’t think F is really identifiable from the data.
Just to make sure: in lognormal(dF[t], sigma), the first argument is the meanlog, i.e. the mean of log(dF[t+1]), not the mean of dF[t+1] itself. So I don't think that line does what you intended; to center dF[t+1] around dF[t] you would need something like lognormal(log(dF[t]), sigma).
PS: I would do the smoothing on the log scale and then exponentiate, rather than going through the lognormal distribution. You can then use a non-centered parametrization, as sketched below.
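Something like this, as a rough sketch: a random walk on log_dF driven by standard-normal innovations, reusing T from your data block (z and log_dF1 are names I made up).

parameters {
  real log_dF1;                 // log density ratio of the first segment
  vector[T-1] z;                // standard-normal innovations
  real<lower=0> sigma;          // smoothness scale
}
transformed parameters {
  vector[T] log_dF;
  vector[T] dF;
  log_dF[1] = log_dF1;
  for (t in 2:T)
    log_dF[t] = log_dF[t-1] + sigma * z[t-1];   // random walk on the log scale
  dF = exp(log_dF);                             // density ratio, positive by construction
}
model {
  z ~ normal(0, 1);             // non-centered: implies log_dF[t] ~ normal(log_dF[t-1], sigma)
  sigma ~ exponential(1);
}

F can then be rebuilt as a cumulative sum of dF (with whatever anchor you like for F[1]), and because the sampling statements are on the raw parameters, no Jacobian adjustment is needed.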
The important part of the message is outside of the declaration block: the assignments to transformed parameters must happen inside the transformed parameters block.
DIAGNOSTIC(S) FROM PARSER:
Info:
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
log_dF_increment ~ normal(...)
It sounds like the autodiff doesn’t compute the Jacobian for me…
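For reference, the pattern the parser is asking for is a manual target += with the log absolute Jacobian determinant. A minimal, self-contained sketch for the simplest case, a prior stated directly on log(dF) (not the full log_dF_increment transform, where the adjustment would be less straightforward):

data {
  int<lower=1> T;
}
parameters {
  vector<lower=0>[T] dF;
}
model {
  // prior stated on a non-linear transform of a parameter...
  log(dF) ~ normal(0, 1);
  // ...so the log absolute Jacobian determinant is added by hand:
  // d log(dF[t]) / d dF[t] = 1 / dF[t], hence log|J| = -sum(log(dF))
  target += -sum(log(dF));
}

A non-centered version, with sampling statements only on raw parameters, sidesteps the need for any adjustment.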
My transitions are still divergent. Here’s my updated code:
functions {
int real_to_int(real x) {
int i = 0;
while(i < x) {
i += 1;
}
return i;
}
// Should we vectorize this and have it done all at once, i.e. real[] x?
real interpolate(vector F, real x, int T) {
// i and i+1
int i;
real res;
real flo = fmin(floor(x*T), T - 1); // possible values: {0, ..., T-1}; clamped so x == 1 (the last bucket) stays in range
real lambda = x*T - flo;
i = real_to_int(flo) + 1;
res = (1 - lambda) * F[i] + lambda * F[i+1];
print("res = ", res);
return(res);
}
}
data {
int<lower=0> N; // number of X obs, yielding N+1 buckets
int<lower=0> T; // number of line segments (granularity of Gaussian Process)
int<lower=0> Y[N+1]; // Number of Y samples in bucket k
}
transformed data{
int N_Y = sum(Y);
}
parameters {
simplex[N+1] intervals; //dirichlet
positive_ordered[T+1] F; // Cumulative density ratio.
real<lower=0> sigma; //how quickly the log-density ratio can vary
}
transformed parameters {
real<lower=0> dF[T]; // density ratio
real log_dF[T]; // log of density ratio
real log_dF_increment[T-1]; // 1st deriv of log density ratio
real ddF[T-1]; // 1st deriv of density ratio
for (t in 1:T) {
dF[t] = F[t+1] - F[t];
log_dF[t] = log(dF[t]);
}
for (t in 1:(T-1)) {
ddF[t] = dF[t+1] - dF[t];
log_dF_increment[t] = log_dF[t+1] - log_dF[t];
}
}
// ToDo: change this to a non-centered implementation?
model {
print("BEFORE: target = ", target());
{
real l = 0; // left end of interval
real r; // right end of interval
real rate; // Poisson rate
real F_l = 0; // F(0) = 0
real F_r;
print("rate = ", rate);
intervals ~ dirichlet(rep_vector(1.0, N+1));
for (k in 1:(N+1)) { // for each bucket
r = l + intervals[k];
F_r = interpolate(F, r, T); //interpolate
rate = F_r - F_l; // the Poisson rate is the integral
print("rate = ", rate);
Y[k] ~ poisson(N_Y * rate);
l = r;
F_l = F_r;
}
/////////////////////
// Smoothness prior
// The adjacent differences of logs are defined as log_dF_increment in 'transformed parameters'
for (t in 1:(T-1)) { // for each section of the function approximation
log_dF_increment[t] ~ normal(0, sigma);
}
sigma ~ exponential(1);
// Softly anchor F[1]. Doing this here avoids having to use 'min' above.
F[1] ~ normal(1, 1e-3);
}
}
Apparently the most suspicious part of my model is the piecewise-linear approximation of F. This might be particularly bad since the model refers to (a discrete version of) its 2nd derivative, namely log_dF_increment.
I think it would be better to use piecewise quadratic, piecewise cubic, or splines. The tricky part is enforcing a monotonicity constraint so that my integral still works efficiently (one way to get the monotonicity is sketched below).
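One possible way to get that monotonicity (a sketch with my own names, and with the basis matrix precomputed on a fixed grid just to keep it self-contained): make the spline coefficients positive_ordered, since a B-spline whose coefficients are nondecreasing is itself nondecreasing.

data {
  int<lower=1> G;                // grid points, in increasing order of x
  int<lower=1> n_basis;          // number of basis functions
  matrix[G, n_basis] basis;      // B-spline basis evaluated at the grid points
}
parameters {
  positive_ordered[n_basis] c;   // positive, nondecreasing spline coefficients
}
transformed parameters {
  // The derivative of a B-spline is a nonnegative combination of the
  // coefficient differences, so nondecreasing c gives a nondecreasing F
  // and hence F(r) - F(l) >= 0.
  vector[G] F_grid = basis * c;
}
model {
  // priors on c (e.g. a random walk on its log increments) and the
  // Poisson likelihood would go here
}

When the evaluation points are parameters, as in my problem, the basis row has to be computed inside the model instead of passed as data.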
The B-splines code for Stan isn’t general enough for my problem: it assumes that the set of points X for which the spline is calculated is part of the data. In my problem those points are parameters, so the spline needs to be rebuilt at every iteration… I hope this won’t be hopelessly slow.
There is an obvious inefficiency in the recursive implementation whenever the order k > 2: each lower-order basis function gets recomputed several times. A bottom-up algorithm would be better… or memoization.
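A sketch of what a bottom-up evaluation could look like, for a single point x (which may be a parameter). The function name, the "order K / extended knot vector of length n + K" convention, and the half-open-interval convention (so x equal to the last knot needs special handling) are my own choices:

functions {
  // All n B-spline basis functions of order K at x, built bottom-up so each
  // lower-order value is computed exactly once (no recursion).
  // t is the extended knot vector of length n + K.
  vector bspline_basis(real x, vector t, int K) {
    int n = num_elements(t) - K;                       // number of basis functions
    vector[n + K - 1] B = rep_vector(0, n + K - 1);
    for (i in 1:(n + K - 1))
      B[i] = (t[i] <= x && x < t[i + 1]) ? 1 : 0;      // order-1 (indicator) basis
    for (k in 2:K) {
      for (i in 1:(n + K - k)) {
        real a = (t[i + k - 1] > t[i]) ? (x - t[i]) / (t[i + k - 1] - t[i]) : 0;
        real b = (t[i + k] > t[i + 1]) ? (t[i + k] - x) / (t[i + k] - t[i + 1]) : 0;
        B[i] = a * B[i] + b * B[i + 1];                // B[i + 1] still holds the order k-1 value
      }
    }
    return B[1:n];
  }
}

Since every lower-order value is computed once per call, memoization has nothing left to save; for cubics (K = 4) this is only a few dozen operations per evaluation, so rebuilding the basis at parameter-dependent points every iteration shouldn't be hopeless.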