Inverse function in modeling

Hello,

Recently I’m trying to implement the models which include inverse function transformation. The target equation is (2), (3), and (5) in https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1541-0420.2011.01651.x
In summary, this probability function is

\begin{align*} f(\theta|\mu, \kappa, \nu) & = f(t^{-1}_{1, \nu}(\theta)|\mu, \kappa) \\ t_{1, \nu}(\theta) & = \theta - \nu(1+\cos(\theta)) \end{align*}

for circular distribution f(\theta|\mu, \kappa) like von Mises distribution.

They said there are no closed-form for t^{-1}_{1, \nu}(\theta), so I may have to calculate it by numerically. Although I checked the Stan manual, but I cannot find out a description about inverse function. Does anyone know the implementation technique?

You can use the algebraic solver.

If you know t(\theta),

Solve

0 = -t(\theta) + \theta - \nu (1 + cos(\theta))

For \theta

2 Likes

Thanks for your kindly guidance. It is what I need.

And could I ask two additional questions?

  1. Referencing Stan’s Manual chapter 20, I wrote model code in below. But when I kicked simulated data to the model, I got pretty far parameter. Do you have any advice?

  2. Even if MAP estimation, but the model works slow. Maybe it attributes to inverse transformation, but is there speedup techniques?

Note: I formulated Jacobian adjustment for variable change as

p(\theta) = vonmises(t^{-1}_{1, \nu}(\theta)|\mu, \kappa)|\frac{d}{d\theta} t^{-1}_{1, \nu}(\theta)|

and

|\frac{d}{d\theta} t^{-1}_{1, \nu}(\theta)| = \frac{1}{\frac{dt_{1, \nu}(\theta)}{d\theta}} =\frac{1}{1+\nu\sin(\theta)}

so

\log(p(\theta)) = vonmises\_lpdf(t^{-1}_{1, \nu}(\theta)|\mu, \kappa) - \log(1+\nu\sin(\theta))

Stan

code

functions{
    // system of inverse Batschelet transformation
    vector invBatschelet(vector y_init, vector t, real[] x_r, int[] x_i){
        // Get length of array
        int I = x_i[1];
        // value of interest
        vector[I] y ;
        // parameters (take it as fixed value would be understandable)
        vector[I] theta = t[:I] ;
        real nu = t[I+1] ;
        
        // construct equation to be zero
        y = -theta + y_init - nu * (1 + cos(y_init)) ;
        return y ;
    }
}

data {
    int I ;
    vector<lower=-pi(), upper=pi()>[I] RADIAN ;
}

transformed data {
    real x_r[0] ;
    int x_i[1] ;
    vector<lower=-pi(), upper=pi()>[I] THETA_INIT ;
    
    // set integer parameter to pass solver
    x_i[1] = I ;
    // set initial value of inverse transformed value
    for (i in 1:I){
        THETA_INIT[i] = 0 ;
    }
}

parameters {
    unit_vector[2] O ;
    real<lower=0.0> kappa ;
    real<lower=-1.0, upper=1.0> nu ;
}

transformed parameters{
    real mu ;
    vector<lower=-pi(), upper=pi()>[I] THETA ;
    vector[I+1] t ;
    
    mu = atan2(O[1], O[2]) ;
    for (i in 1:I){
        // transform angle by location parameter
        t[i] = RADIAN[i] - mu ;
        if (t[i] > pi()){
            t[i] = t[i] - 2*pi() ;
        }else if (t[i] <= -pi()){
            t[i] = t[i] + 2*pi() ;
        }
    }
    t[I+1] = nu ;
    THETA = algebra_solver(invBatschelet, THETA_INIT, t, x_r, x_i) ;   
}

model {
    THETA ~ von_mises(mu, kappa) ;
    // Jacobian adjustment by
    target += -log(1 + nu*sin(RADIAN)) ;
}

Python

code

from pystan import StanModel
from scipy.stats import vonmises
import numpy as np

model_code = ... # Paste above code
model = StanModel(model_code=model)

# Generate simulation data
I = 1000
RADIAN = vonmises.rvs(size=I, loc=0, kappa=0.3)
stan_data = {
    "I": I,
    "RADIAN": RADIAN
}

# Kick into the model
fit = model.optimizing(stan_data)
fit

Result

# Ideally, mu=0, kappa=0.3, nu=0
OrderedDict([('O', array([0.86992951, 0.49317608])),
             ('kappa', array(1.15716819)),
             ('nu', array(1.)),
             ('mu', array(1.05505937)),
             ...

I had a look at the paper

It says (eq. 1, and substituting in the von_mises stuff):

p(\theta | \mu, \kappa, \nu) \propto \text{von_mises_pdf}(t_{1, \nu}(\theta) | \mu, \kappa)

Isn’t that it (also is it inverse t or just t)? If that’s the lpdf, then that’s the lpdf. No transformation of variables necessary, I think.

They say after eq. 1 “Note that this is not the same as transforming the underlying circular random variable, at least not by using the transformation t.” I could easily be missing something though…

Ah, this may be fault. I mistakenly formulated the adjustment. As you pointed out, Jacobian adjustment is not necessary for this inverse transformation method. After removing adjustment line, this worked, although it is still slow…

Anyway, thanks a lot!


Note: I tried re-formulate this distribution by change of variable. Let the original function be von Mises distribution, what I’d like to compute is

\begin{equation} f_{von Mises}(t_\nu(\theta)) \\ \text{where}~\begin{cases} t_\nu(\theta) = t^{-1}_{1,\nu}(\theta) \\ t_{1, \nu}(\xi) =\xi - \nu(1+\cos(\xi)) \end{cases} \end{equation}

This variable could be changed to

\begin{eqnarray} f_{von Mises}(t_\nu(\theta)) &= & f_{von Mises}(t^{-1}_{1, \nu}(\theta)) \\ & = & f_{von Mises}(t_{1, \nu}(\xi))|\frac{dt_{1, \nu}(\xi)}{d\xi}| \\ & = & f_{von Mises}(\xi - \nu(1+\cos(\xi)))(1+\nu\sin(\xi)) \end{eqnarray}

But I’m interested in the distribution whose variable is \theta not \xi, and I think reformulation 1+\nu\sin(\xi) by \theta is not easy. So this function requires numerical computation.

Are you sure you need to calculate the inverse of the function though? (Discourse is showing errors on the eqs. you included)

Are you sure you need to calculate the inverse of the function though?

Yes. As the authors said, “The first major advantage of this approach is that there is no change to the normalizing constant from that of f”. I can check these phenomena below code. Indeed both direct and inverse transformation performed skewing, but the sum of the probability density function is only preserved in inverse function.
After fitting data to the model, I would like to compare the result with non-skewing models by information criterion such as AIC, BIC or WAIC. Thus changing the scale of the probability function is a problem for me because it affects the log likelihood.
As you concern, the inverse transformation may not necessary just to fit the parameters.

Discourse is showing errors on the eqs. you included

It work in my environment…(Safari and Chrome). I’ll paste the formulation.

Python

Code

from scipy.optimize import root
from scipy.stats import von_mises
import numpy as np
import matplotlib.pyplot as plt

def inverse_Batschelet(x, theta, nu):
    return -theta + x - nu * (1 + np.cos(x))
def invBvonmises_pdf(theta, kappa, nu, loc):
    theta = theta - loc
    theta[theta <= -np.pi] = theta[theta <= -np.pi] + 2 * np.pi
    theta[theta > np.pi] = theta[theta > np.pi] - 2 * np.pi
    init_value = np.zeros(theta.size)
    
    inv_theta = root(inverse_Batschelet, init_value, args=(theta, nu)).x
    p = vonmises.pdf(inv_theta, loc=0, kappa=kappa)
    return p

nu = 0.3
kappa = 0.3
mu = 0
theta = np.linspace(-np.pi, np.pi, 100)

theta_mu = theta - mu
theta_mu[theta_mu <= -np.pi] = theta_mu[theta_mu <= -np.pi] + 2 * np.pi
theta_mu[theta_mu > np.pi] = theta_mu[theta_mu > np.pi] - 2 * np.pi
xi = theta - nu * (1+np.cos(theta_mu))

p0 = vonmises.pdf(theta_mu, kappa=kappa, loc=0) # Normal
p1 = vonmises.pdf(theta_mu - nu * (1 + np.cos(theta_mu)), kappa=kappa, loc=0) # Direct transformation
p2 = vonmises.pdf(root(inverse_Batschelet, init_value, args=(theta_mu, nu)).x, loc=0, kappa=kappa) # Inverse transformation

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(theta, p0, "o", label="Symmetric von Mises")
ax.plot(theta, p1, "o", label="Direct transformation")
ax.plot(theta, p2, "o", label="Inverse transformation")
ax.set_xlim((-np.pi, np.pi))
ax.set_xticks([-np.pi, 0, np.pi])
ax.legend()
print(p0.sum())
print(p1.sum())
print(p2.sum())

Result

15.87163545711061
15.65435678752913
15.871635456923737

invBresult

Oh yeah sorry you’re right, I didn’t read to eq. 5. You just invert the function and plug it into the Von Mises.

If the inverse keeps being slow, you could manually code up a Newton’s method since you have your Jacobian.

Something like:

functions {
  real t1v_inv(real y, real nu) {
    real x = 0.0;
    for(i in 1:10) {
      x = x - t1v(x) / t1v_jacobian(x);
    }
    return x;
  }
}

You’d want to make sure and monitor the convergence of this. Probly with generated quantities:

generated quantities {
  real approx_error = t1v(x) - y; // This assumes you were solving t1v(x) = y for x
}

Or something. Hope that makes sense. This sorta thing can be finicky though to do manually.

1 Like

That’s what the algebra solver is doing internally. But maybe since your system is so simple you can get a small edge doing it manually? It’s a toss up and can end up making your model a bit fragile.

1 Like

Great! I implemented models by build-in Powell’s hybrid algorithm and Newton’s method to check the speed, . The latter model is extremely fast!! This solves my problem completely. I attach the scripts for someone who are interested in.

I can’t thank you enough!


Stan

Build-in function

model = """
functions{
    // system of inverse Batschelet transformation
    vector invBatschelet(vector y_init, vector t, real[] x_r, int[] x_i){
        // Get length of array
        int I = x_i[1];
        // value of interest
        vector[I] y ;
        // parameters (take it as fixed value would be understandable)
        vector[I] theta = t[:I] ;
        real nu = t[I+1] ;
        
        // construct equation to be zero
        y = -theta + y_init - nu * (1 + cos(y_init)) ;
        return y ;
    }
}

data {
    int I ;
    vector<lower=-pi(), upper=pi()>[I] RADIAN ;
    int K ;
}

transformed data {
    real x_r[0] ;
    int x_i[1] ;
    vector<lower=-pi(), upper=pi()>[I] THETA_INIT ;
    vector<lower=0.0>[K] A ;
    
    // set integer parameter to pass solver
    x_i[1] = I ;
    // set initial value of inverse transformed value
    for (i in 1:I){
        THETA_INIT[i] = 0 ;
    }
    for (k in 1:K){
        A[k] = 50 / K ;
    }
}

parameters {
    simplex[K] alpha ;
    unit_vector[2] O ;
    real<lower=0.0> kappa[K] ;
    real<lower=-1.0, upper=1.0> nu[K] ;
}

transformed parameters{
    real mu ;
    
    mu = atan2(O[1], O[2]) ;
}

model {
    alpha ~ dirichlet(A) ;
    for(k in 1:K){
        vector[I] THETA ;
        vector[I+1] t ;
        for (i in 1:I){
            // transform angle by location parameter
            t[i] = RADIAN[i] - mu ;
            if (t[i] > pi()){
                t[i] = t[i] - 2*pi() ;
            }else if (t[i] <= -pi()){
                t[i] = t[i] + 2*pi() ;
            }
        }
        t[I+1] = nu[k] ;
        THETA = algebra_solver(invBatschelet, THETA_INIT, t, x_r, x_i) ;   
        THETA ~ von_mises(0, kappa[k]) ;
    }
}
"""

model = StanModel(model_code=model)

Newton’s method

model = """
functions{
    vector inv_transformation_Batschelet(vector theta, real mu, real nu, int I){
        vector[I] t;
        t = theta ;
        // Number of iteration was evaluated with scipy's simulation
        for (i in 1:8){
            t = t - ((t - nu*(1+cos(t-mu)) - theta) ./ (1 + nu * (sin(t-mu)))) ;
        }
        return t;
    }
}

data {
    int I ;
    vector<lower=-pi(), upper=pi()>[I] RADIAN ;
    int K ;
}

transformed data {
    vector<lower=0.0>[K] A ;

    for (k in 1:K){
        A[k] = 50 / K ;
    }
}

parameters {
    simplex[K] alpha ;
    unit_vector[2] O ;
    real<lower=0.0> kappa[K] ;
    real<lower=-1.0, upper=1.0> nu[K] ;
}

transformed parameters{
    real mu ;
    
    mu = atan2(O[1], O[2]) ;
}

model {
    alpha ~ dirichlet(A) ;
    for(k in 1:K){
        vector[I] THETA ;
        THETA = inv_transformation_Batschelet(RADIAN, mu, nu[k], I) ;
        THETA ~ von_mises(mu, kappa[k]) ;
    }
}
"""

model = StanModel(model_code=model)

Python

Code

# Change sample size with your demand
I = 3000
RADIAN = vonmises.rvs(size=I, loc=0, kappa=0.3)
stan_data = {
    "I": I,
    "RADIAN": RADIAN,
    "K": 1
}

%timeit fit = model.optimizing(stan_data, seed=1234, init_alpha=1e-10)

Result

time_methods