Data distribution defined via change of variable

Hello all, I have the following non-standard model that involves change of variables. Any advice/suggested references would be greatly appreciated.

I have a K by 1 real vectors y_i defined as

y_i = f(theta, e_i), i=1,…,n

where

  • theta is a real vector of parameters for which a prior is given
  • e_i is a K by 1 real vector of unobservable whose distribution is known: e_i ~ p(e) e.g. independent normals
  • the known function f is one-to-one between e_i and y_i for all values of theta

Via the change of variable, the density of y_i is implied for a fixed theta. The goal is to simulate from the posterior of theta given y_1,…,y_n.

Q1: What would be the most natural way to code the set-up in Stan?
Q2: Would it be necessary to manually code the Jacobian that shows up in the change of variable formula?

Thanks!

Hi, @kshimizu1 and welcome to the Stan forums. Sorry this has taken so long to answer.

It’s not clear to me what you want to do statistically, but it sounds like you’re trying to model a “noise” or “error” distribution with the e[i]. That will determine how you need to code the model. For instance, if you’re trying to write a regression where y_n = \alpha + \beta \cdot x_n + \epsilon_n, where \epsilon_n \sim \text{normal}(0, \sigma), then you don’t use a change of variables at all for y in Stan, you just write

y[n] ~ normal(alpha + beta * x[n], sigma);

The nice part about formulating things this way is that it makes the generative model clear and easily generalizes to constrained cases, like a lognormal regression for positive-constrained y[n], where you can’t just randomly sample noise and add it to an expectation because of the non-linearity.

If you have a parameter X in Stan and a smooth monotonic function f, then you work out the distribution of Y = f(X) using a Jacobian adjustment. You will need to do that if your model involves a distribution over Y, such as Y ~ lognormal(mu, sigma);. We do this for parameters in Stan implicitly.

In contrast, if you only use Y on the right-hand side of distributions, you don’t need the change of variables as you are getting the distribution over Y implicitly through the function f.

1 Like

Hi @Bob_Carpenter,

Happy new year! Thank you so much for your reply and my apologies for the late response.

In my problem, the function f is implied by some economic theory and is a complicated non-linear function of both parameters \theta and a K by 1 unobservable e_i. The K by 1 vector Y_i is observed:

Y_i=f(e_i; \theta),

where e_i follows a known distribution such as independent standard normal.

We only know the following about the function f:

  1. It is invertible for each \theta, i.e. f^{-1}(Y_i; \theta) exists.
  2. There is no closed form for f^{-1}(Y_i; \theta).

The statistical goal is to learn about the posterior distribution of the parameter \theta given some prior. (I need to impose the functional relationship above when learning about \theta, so I cannot directly specify a distribution over Y_i.)

By change of variable, the density for the observed Y_i is

p(Y_i; \theta)=p_e\left(f^{-1}(Y_i; \theta)\right) \bigg|det \left( \frac{d f^{-1}(Y_i;\theta)}{d Y_i}\right) \bigg|,

where p_e is the density of e_i (e.g. the independent standard normal density). The data consists of n iid observations \{Y_1,...,Y_n\}. Given the prior density \pi(\theta), the posterior density is proportional to

\pi(\theta) \cdot \prod_{i=1}^n p(Y_i;\theta),

where the second term is the likelihood function. The prior is standard e.g. normal.

Evaluating the likelihood involves, for a given \theta, (1) computing the inverse f^{-1}(Y_i; \theta) and (2) computing the Jacobian term \bigg|det \left( \frac{d f^{-1}(Y_i;\theta)}{d Y_i}\right) \bigg|.

Q: It seems that we would have to manually code the likelihood function in Stan, including coding a function that would numerically invert f (perhaps, using Algebraic Equation Solver) and a function that would compute the Jacobian term using the inverse function theorem. Before trying to do that we wanted to check if there is an easier/automatic way to handle this type of model in Stan. Also, do you think this estimation problem could be feasible at all in STAN?

Thank you for your help!

That’s right. Stan requires a definition of the target log density.

Not that I know of. We’ve had to use our algebraic solvers and differential algebraic solvers for equilibrium models in econ and there are other Stan devs and users who are much more familiar with these applications than me.

We have fit models involving these things before, but it can be tricky and be sensitive to initialization and to priors. Without knowing what you’re trying to fit, it’s hard to say (and even if I did see it, I’d probably just suggest trying it). @charlesm93 may have more advice as he was involved in designing those modules.

I hope you don’t mind some nitpicky P.S.s.

P.S. If the argument is a matrix A, then |A| is usually taken to be the absolute determinant.

P.P.S. In Bayes, it’s p(Y \mid \theta) because it’s a true conditional probability with \theta being treated as random.

P.P.P.S. It’s “Stan” because it’s not an acronym (it’s named after Stanislaw Ulam, the “inventor” of Monte Carlo methods).

2 Likes