It’s widely known in the developer circle that the unit-vector type is a bad apple. It doesn’t play nicely with HMC and there are plenty of cases where divergences or treedepth issues occur due to how the constraint is formed.
There’s a particularly informative thread on the issues at Divergence /Treedepth issues with unit_vector. Unfortunately, I don’t believe the jacobian adjustment suggested in that thread is correct. The derivative wrt to the unconstrained parameters results in a vector - not a square matrix - and one needs to introduce auxiliary parameters to push through into a square matrix transform. Fortunately, that transform is just the spherical coordinate transform. Then there’s no need to fiddle with a prior on r
(the radius of the sphere) because we can set it equal to 1. You’ll notice that this part bares similarity to the Marsaglia transform that Stan currently uses.
The potential issue with spherical coordinates (and one I believe @betanalpha has pointed out) is that the restriction of the angles between 0 and 2\pi is -\infty to \infty in the unconstrained space but we want 2\pi to be neighbors to 0. Also, @bgoodri pointed out that a spherical transform was used in Stan at some point. I do not know if it was written this way or what issues were discovered.
I understand that concern, however, in my tests where the space is restricted to between 0 and \pi for 2d and only introducing one 0 to 2\pi parameter for > 2d I don’t encounter any issues. In fact, I am able to resolve the current Marsaglia parameterization issues completely.
For example, in the post below (@mike-lawrence may be interested to see)
I can use the 2d circle parameterization and fit the model without divergences (it’s also twice as fast)
// saved as periodic_sphere.stan
functions {
// case when N == 2
vector unit_vector_transform_lp(vector theta, int N) {
vector[N] unit_vec;
unit_vec[1] = sin(theta[1]);
unit_vec[2] = cos(theta[1]);
return unit_vec;
}
}
data{
int n;
vector[n] y;
vector[n] x;
real<lower=0> rate ;
}
parameters{
// unit_vector[2] phase_helper;
vector<lower=0, upper=2 * pi()>[1] theta;
real<lower=0> amplitude;
real<lower=0> noise;
}
transformed parameters{
vector[2] phase_helper = unit_vector_transform_lp(theta, 2);
real phase = atan2(phase_helper[1],phase_helper[2]) ;
}
model{
amplitude ~ weibull(2,2) ;
noise ~ normal(0,1) ;
y ~ normal( amplitude * sin( x*rate*2*pi()+phase ) , noise ) ;
}
The R code is
n = 1e3
x = seq(0,10,length.out = n)
rate = .5
noise = .1
phase = pi/2
amplitude = 1
f = amplitude*sin(x*rate*2*pi+phase)
y = f + rnorm(n,0,noise)
plot(x,y,type='l')
lines(x,f,col='green',lwd=2)
stan_dat <- list(
n = length(y)
, x = x
, y = y
, rate = rate
)
periodic_sphere_mod <- cmdstan_model("periodic_sphere.stan")
mod_sphere_out <- periodic_sphere_mod$sample(
data = stan_dat,
parallel_chains = 4,
adapt_delta = 0.8
)
In the case of 3 or more dimensions the Stan code is
// Copyright (c) 2022, Sean Pinkney
// BSD 3-Clause License
functions {
// case when N > 2
vector unit_vector_transform_lp(vector theta1_, vector theta, int N) {
vector[N] unit_vec;
real theta1 = theta1_[1];
unit_vec[1] = sin(theta1) * prod(sin(theta));
unit_vec[2] = cos(theta1) * prod(sin(theta));
for (i in 1:N - 3)
unit_vec[i + 2] = cos(theta[i]) * prod(sin(theta[i + 1:N - 2]));
unit_vec[N] = cos(theta[N - 2]);
for (i in 2:N - 1)
target += (i - 1) * log(sin(theta[i - 1]));
return unit_vec;
}
// case when N == 2
vector unit_vector_transform_lp(vector theta, int N) {
vector[N] unit_vec;
unit_vec[1] = sin(theta[1]);
unit_vec[2] = cos(theta[1]);
return unit_vec;
}
}
data {
int uv;
}
parameters {
vector<lower=0, upper=2 * pi()>[1] theta1;
vector<lower=0, upper=pi()>[uv == 2 ? 0 : uv - 2] theta;
// unit_vector[uv] unit_stan;
}
transformed parameters {
vector[uv] unit_vec = uv == 2 ? unit_vector_transform_lp(theta1, uv) : unit_vector_transform_lp(theta1, theta, uv);
}
model {
}
generated quantities {
real r_out = sqrt(sum(pow(unit_vec, 2)));
}
edit: fixed the issue in N = 2. @Raoul-Kima pointed out that the bound needs to be 0 - 2pi. Also, no jacobian adjustment is needed in the 2d case because it’s the radius which = 1, a constant.