Computing or sampling a posterior with samples observed through a dimension-reducing transformation

Let \boldsymbol \theta be a vector of parameters, with a known prior \pi(\boldsymbol \theta).
Let \boldsymbol x_1,...,\boldsymbol x_n be i.i.d. samples whose conditional distribution \pi(\boldsymbol x|\boldsymbol \theta) is known.
The problem is that we do not observe \boldsymbol x_i, but only \boldsymbol g(\boldsymbol x_i), where \boldsymbol g is a dimension-reducing transformation. How can we numerically compute or sample the posterior \pi(\boldsymbol \theta | \boldsymbol g(\boldsymbol x_1),...,\boldsymbol g(\boldsymbol x_n)) efficiently?

Here is a simple example. For a rectangle, the width w\sim {\rm Uniform}(0, a) and the length l\sim {\rm Uniform}(0,b), and (a,b) follows a known prior.
But we cannot observe (w,l); instead, we observe the area s = w\cdot l.
How can we numerically compute or sample the posterior \pi(a,b|s)?

Related problem: Estimating a posterior for a parameter which is only observed through (a part of a) linear transformation


Do you assume that you then observe \boldsymbol{g}(\boldsymbol{x}_i) without any additional noise?

If there is noise in observing \boldsymbol{g}(\boldsymbol{x}_i), then - at least in principle - you can treat \boldsymbol x_1,...,\boldsymbol x_n as explicit parameters in your model, include the dimensionality reduction directly, and compute the observation distribution \pi(\boldsymbol{g}(\boldsymbol{x}_i) | \boldsymbol{x}_i).
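To make that concrete, here is a minimal sketch in Python using PyMC for the rectangle example with a noisy area measurement (the priors on (a, b), the noise scale, and the data are all illustrative assumptions, not part of the original problem):

```python
import numpy as np
import pymc as pm

s_data = np.array([2.1, 0.7, 1.5])  # hypothetical noisy area observations

with pm.Model() as model:
    # assumed priors on (a, b); the question only says the prior is known
    a = pm.HalfNormal("a", sigma=10)
    b = pm.HalfNormal("b", sigma=10)
    # latent widths and lengths are explicit parameters, one per rectangle
    w = pm.Uniform("w", lower=0, upper=a, shape=len(s_data))
    l = pm.Uniform("l", lower=0, upper=b, shape=len(s_data))
    # the dimensionality reduction g(w, l) = w * l enters via the likelihood
    pm.Normal("s", mu=w * l, sigma=0.1, observed=s_data)  # assumed noise scale
    idata = pm.sample()
```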

If there is no noise, then you need to somehow invert the dimension-reducing transformation, i.e. get a description of the set \{(\boldsymbol x_1,...,\boldsymbol x_n) \,|\, \boldsymbol{g} (\boldsymbol x_1) = \boldsymbol y_1,...,\boldsymbol{g}(\boldsymbol x_n) = \boldsymbol y_n\} - this might be possible for simple methods like using the first few PCA components, but will be pretty annoying.
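For the linear case that the PCA mention suggests, the preimage set at least has a clean description: it is an affine subspace. A small sketch, assuming \boldsymbol g(\boldsymbol x) = W^T \boldsymbol x (the dimensions and matrices below are made up for illustration):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))   # 5-dim x reduced to a 2-dim observation y
y = rng.normal(size=2)

# The preimage {x : W.T @ x = y} is x_p + span(N):
x_p, *_ = np.linalg.lstsq(W.T, y, rcond=None)  # one particular preimage
N = null_space(W.T)                            # basis of the free directions

assert np.allclose(W.T @ x_p, y)
assert np.allclose(W.T @ N, 0)
```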

E.g. in the area example you would treat w as an extra unknown parameter and compute l = \frac{s}{w}. You can now directly express your model for \pi(w,l | \theta) while adding a Jacobian adjustment for the transform.
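To sketch what that looks like in code, here is a minimal Python version of the resulting unnormalized log posterior for n observed areas, treating the widths as extra parameters (the function name and the log_prior callable are my own placeholders):

```python
import numpy as np

def log_post(a, b, w, s, log_prior):
    """Unnormalized log posterior for the rectangle example.

    a, b      : candidate upper bounds of the uniform distributions
    w         : array of latent widths, one extra parameter per observation
    s         : array of observed areas; lengths are recovered as l = s / w
    log_prior : callable returning the log prior density of (a, b)
    """
    if a <= 0 or b <= 0 or np.any(w <= 0):
        return -np.inf
    l = s / w
    # support of the uniform model: 0 < w < a and 0 < l < b
    if np.any(w >= a) or np.any(l >= b):
        return -np.inf
    # p(w, l | a, b) = 1 / (a * b); changing variables (w, l) -> (w, s)
    # multiplies the density by the Jacobian |dl/ds| = 1 / w
    n = len(s)
    return log_prior(a, b) - n * (np.log(a) + np.log(b)) - np.sum(np.log(w))
```

You can then hand this function to any generic MCMC sampler over (a, b, w_1, ..., w_n).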

An approximate approach might be possible by doing something similar to multiple imputation (since your problem can be understood as a missing data problem). I.e. you don't insist on finding all \boldsymbol x that produce the observed reduced data, but just get a bunch of different plausible values, fit the model \pi(\boldsymbol x | \boldsymbol \theta) for each of those as if that \boldsymbol x had been observed directly, and then pool the posterior samples from all of those fits.

You could likely obtain suitable samples of \boldsymbol x_1,...,\boldsymbol x_n by starting with random values \boldsymbol{x}^*_1,...,\boldsymbol{x}^*_n and then running an optimization algorithm to minimize \sum_{i=1}^n \left(\boldsymbol{g}(\boldsymbol x_i) - \boldsymbol{g}(\boldsymbol{x}^*_i)\right)^2 or similar.
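A rough sketch of that optimization step in Python, assuming \boldsymbol g is differentiable so a generic solver can make progress (the function name, restart scheme, and starting distribution are all illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

def impute_latent(y_obs, g, dim_x, n_restarts=10, rng=None):
    """Find several candidate latent vectors x* with g(x*) close to y_obs.

    y_obs      : observed reduced data for a single sample
    g          : the dimension-reducing map, g(x) -> y
    dim_x      : dimension of the latent x
    n_restarts : number of random starting points (one candidate each)
    """
    rng = np.random.default_rng(rng)
    candidates = []
    for _ in range(n_restarts):
        x0 = rng.normal(size=dim_x)                    # random starting point
        loss = lambda x: np.sum((g(x) - y_obs) ** 2)   # squared reconstruction error
        res = minimize(loss, x0)
        candidates.append(res.x)
    return candidates
```

Different restarts that converge to different preimages then give you the "bunch of different plausible values" for the imputation scheme above.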

In all cases I think the problem is likely to be somewhat computationally ill-posed - there might be many different regions of \boldsymbol \theta that produce similar-looking distributions over \boldsymbol g(\boldsymbol x), and thus you will have multimodality.

Hope that clarifies more than confuses and best of luck with your model!
