I have a dataset of points (x, y) that are distributed within a disk, and I would like to estimate the radius (and eventually the center) of the disk using a Stan model. Here is an example with some simulated data in R
set.seed(42)
n_obs <- 1000
true_radius <- 2
# Generate random angles and distances from the center
angles <- runif(n_obs, 0, 2 * pi)
distances <- runif(n_obs, 0, true_radius)
# Calculate x and y coordinates
x <- distances * cos(angles)
y <- distances * sin(angles)
# Combine x and y coordinates into a data frame
data <- data.frame(x = x, y = y)
The histogram of the distances of each point from the center looks like a right triangle with the likelihood at 0 being 0 and the likelihood at r being 1, it’s a line from 0 to 1.
So I defined a custom likelihood in stan and run this model
functions {
real triangular_lpdf(real x, real a, real b) {
if (x < a || x > b)
return negative_infinity();
else
return log(2 * (x - a) / square(b - a));
}
}
data {
int<lower=0> n_obs;
vector[n_obs] x;
vector[n_obs] y;
}
parameters {
real<lower=0> r;
}
model {
// Priors
r ~ normal(2, 1);
// Likelihood
for (i in 1:n_obs) {
real dist = sqrt(square(x[i]) + square(y[i]));
target += triangular_lpdf(dist | 0, r);
}
}
Although the model runs relatively fast, and does give a good estimate of r, it results in hundreds of divergent transitions (of 2000). I have tried adjusting the sampler parameters (adapt_delta and stepsize), but these just seem to slow the down the model with no improvements.
Trace plots for r don’t look good, but estimates are mostly right, and pairs plot shows that r is almost perfectly anti-correlated to lp__
This seems like a simple problem, and I feel like I am missing something basic. Any thoughts?