Sampling uniformly on output for a non-linear input-to-output map


I have a many-to-one map of the form (lambda_1, lambda_2) -> Q, where Q is a scalar (the 'output'). I want to choose samples (lambda_1i, lambda_2i) (the 'inputs') such that the output distribution is uniform (imagine that the output is bounded).

I want to use MCMC to do this. My question is how? For example, suppose Q = lambda_1^2 + lambda_2^2; it does not matter whether the function is linear or non-linear.

Basically it seems that I need to correct for the fact that different contours Q(lambda_1,lambda_2) = const have different lengths in lambda space, and I need to adjust the sampling routine for this. However, knowing the lengths of contours (a global property) is not possible a priori, so I wonder whether there is a stochastic way to do this.

Does anyone have an idea how this can be done? (The reason for doing this is that it would solve an inverse ODE problem.)



To state the problem a slightly different way: suppose that we have a univariate probability distribution Q ~ G(Q); a normal distribution, for example.

Further suppose that Q = f(lambda_1,lambda_2), where f(.) is a deterministic function; for example Q = lambda_1^2 + lambda_2^2. My issue is the following:

  1. I can use MCMC to sample Q_i ~ G(Q) – this produces a distribution over Q that is (unsurprisingly) distributed as G().
  2. If instead I sample (lambda_1,lambda_2) ~ G(f(lambda_1,lambda_2)), then I obtain some stationary distribution in lambda space. However, if we take those samples and evaluate Q for each of them, then the sampling distribution H(Q_i) is not G(Q_i).

This is most obvious when G() is a uniform distribution: the samples (lambda_1i, lambda_2i) are then uniform in lambda space, which does not in general lead to a uniform distribution over Q_i.
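(A quick numerical illustration of this, taking f(lambda_1, lambda_2) = sqrt(lambda_1^2 + lambda_2^2) and G uniform on [1, 3] purely as an example — these choices are mine, not part of the original problem:)

```python
import numpy as np

rng = np.random.default_rng(0)
lower, upper = 1.0, 3.0

# Rejection-sample lambdas uniformly on the annulus lower <= f <= upper
pts = rng.uniform(-upper, upper, size=(2_000_000, 2))
q = np.hypot(pts[:, 0], pts[:, 1])
q = q[(q >= lower) & (q <= upper)]

# The density of Q on [lower, upper] is proportional to q here,
# so E[Q] = 13/6 ~ 2.17, not the 2.0 that a uniform Q would give.
print(q.mean())
```

So uniform sampling in lambda space piles extra probability onto large-Q contours (they are longer), which is exactly the mismatch described above.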

My questions are:

  • What is the stationary distribution over lambda space that I obtain in 2? I think it is related to the length of the contours Q(lambda_1,lambda_2) = const in lambda space.
  • Is there a way to modify the sampling of lambdas such that H(Q_i) = G(Q_i), as desired?



The reason for doing this is that it would solve an inverse ODE problem.

I’m not sure I follow the motivation for the output-sampling stuff; it may be worth just posting a question with the inverse problem itself and starting there. Not saying the motivation doesn’t exist – it just might be easier for me to think about this with an application attached :P.

But! To the questions in your second post: if you know the joint density of lambda_1 and lambda_2, and you’re interested in the density of Q where Q = f(lambda_1, lambda_2), the Stan model would be:

parameters {
  real lambda_1;
  real lambda_2;
}
transformed parameters {
  real Q = f(lambda_1, lambda_2);  // f defined in the functions block
}
model {
  lambda_1 ~ normal(0, 1);
  lambda_2 ~ normal(0, 1);
}

In the case that f(x, y) = sqrt(x^2 + y^2), if you work through the math, Q is Rayleigh distributed. The trick to getting this is recognizing that sqrt(lambda_1^2 + lambda_2^2) is just the radius in polar coordinates, so P(Q = q) is just a matter of figuring out how much probability density lives on the ring x^2 + y^2 = q^2.
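A quick simulation confirms this (standard normal lambdas, so the result should match a Rayleigh with sigma = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
lam1 = rng.standard_normal(1_000_000)
lam2 = rng.standard_normal(1_000_000)
q = np.sqrt(lam1**2 + lam2**2)

# Rayleigh(sigma = 1): mean = sqrt(pi/2) ~ 1.2533, median = sqrt(2 ln 2) ~ 1.1774
print(q.mean(), np.median(q))
```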

Does this make sense? Just try it with the lambdas normally distributed.

Is there a way to modify the sampling of lambdas such that H(Q_i) = G(Q_i), as desired?

This is the bit that I’m confused on. If you know Q should have some distribution, then why not just sample from that distribution?

Sorry, I am not sure I am expressing myself well. The sort of inverse problem here is an inverse sensitivity problem. I think I have worked out the issue now, but I repeat it because it still leaves open some practical questions…

So suppose that I have some deterministic function: Q = f(lambda_1,lambda_2). From experimental data I have a probability distribution over the possible output values: Q ~ G(Q). What I would like to do is determine the corresponding distribution over input values (lambda_1,lambda_2) that would produce that output.

Note this is not a Bayesian problem as there is no likelihood here: p(Q | lambda_1,lambda_2) = 0, unless Q = f(lambda_1,lambda_2).

However I nonetheless want to compute a conditional probability distribution over the inputs given the output distribution G(.), so I would like to compute p(lambda_1,lambda_2|G(.)). The issue is that there are many values of the inputs that correspond to a given output. In other words, there are contours of constant Q in lambda space.

I (naively) thought that using MCMC to sample: (lambda_1i,lambda_2i) ~ G( f (lambda_1,lambda_2)) would produce the desired input distribution on (lambda_1i, lambda_2i). By ‘desired’ here I mean one such that if I were to feed the samples (lambda_1i,lambda_2i) into f(.) to obtain Q_i, then I would obtain G(Q_i) – the observed output distribution.

However this is not the case. I have worked out that I essentially need to correct the sampling for the length of contours corresponding to a particular Q value. So (using Random Walk Metropolis) the accept/reject ratio actually becomes,

r = [G( f (lambda_1’,lambda_2’)) / G( f (lambda_1,lambda_2))] * [L( f (lambda_1,lambda_2)) / L( f (lambda_1’,lambda_2’))],

where L(Q) is the length of the contour in (lambda_1,lambda_2) space on which f(lambda_1,lambda_2) = Q.
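As a sanity check of this correction (a sketch, not the actual ODE problem — I pick f(lambda_1,lambda_2) = sqrt(lambda_1^2 + lambda_2^2), for which the contour f = Q is a circle of length L(Q) = 2*pi*Q, and take G uniform on [1, 3]): a random-walk Metropolis sampler targeting G(f(lambda)) / L(f(lambda)) does recover a uniform output distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
lower, upper = 1.0, 3.0

def f(lam):
    return np.sqrt(lam[0]**2 + lam[1]**2)

def log_target(lam):
    q = f(lam)
    if q < lower or q > upper:
        return -np.inf
    # G is uniform, so only the contour-length correction 1 / L(q) = 1 / (2*pi*q) remains
    return -np.log(2.0 * np.pi * q)

lam = np.array([2.0, 0.0])
qs = []
for _ in range(100_000):
    prop = lam + 0.5 * rng.standard_normal(2)
    # Standard Metropolis accept/reject with the L(Q)-corrected target
    if np.log(rng.random()) < log_target(prop) - log_target(lam):
        lam = prop
    qs.append(f(lam))

qs = np.array(qs[5_000:])  # drop burn-in
print(qs.mean())           # should be close to (lower + upper) / 2 = 2
```

Note this works cleanly here because |grad f| = 1 everywhere for this particular f; the example is only meant to illustrate the accept/reject ratio above.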

If you’re still with me (sorry for the long-windedness), this leaves some important practical research questions. In particular, how might we determine the function L(Q) in N-dimensional space to allow us to do MCMC? Obviously this would need to be an approximate method (probably a Monte Carlo one, to avoid the curse of dimensionality).

Does anyone have experience with such a problem? Or alternatively have an idea of how to sidestep it?



Note this is not a Bayesian problem as there is no likelihood here: p(Q | lambda_1,lambda_2) = 0, unless Q = f(lambda_1,lambda_2)

Well, that is a likelihood, I think. It would just be reaallly reallly hard to sample for a fixed Q. Is there a particular reason to avoid Bayesian inference here? Wouldn’t there at least be some sort of inference to fit a distribution to your experimental data?

Just curious, I’m not really that experienced with this stuff so I like hearing the motivations :D.

Eerm, I suppose it’s a likelihood in the same way that any deterministic function can be one, but I don’t think this is useful here.

The uncertainty here stems from the difficulty of inverting the map from the input to the output. If we knew how many input values mapped to each output, that would give us a way to partition probability in input space.

The type of problem here is one where you are trying to judge the sensitivity of a system to its input parameters, assuming some output distribution.

Because Q reduces dimensionality, you have too many degrees of freedom without imposing some kind of constraint on the distribution of at least one of L1 and L2. You can see that in your example.
These questions are like stats crossword puzzles!

Start from the desired outcome,

Q ~ uniform(lower, upper)

with the constraint lower > 0. Then you want to find some p_{L1,L2} such that if

L1, L2 ~ p_{L1, L2}

then

L1^2 + L2^2 ~ uniform(lower, upper)

For simplicity, let’s assume L1, L2 > 0 to avoid multiple solutions.

To see that there’s no unique solution without further assumptions on the distribution of L1 and L2, note that if you take L1^2 ~ uniform(lower, upper) and then set L2 = 0, you get the desired result (you just need to compute the Jacobian adjustment).
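That degenerate solution is trivial to check numerically (a sketch with lower = 1, upper = 3 chosen arbitrarily): draw U ~ uniform(lower, upper), set L1 = sqrt(U) and L2 = 0, and then Q = L1^2 + L2^2 recovers U exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
lower, upper = 1.0, 3.0

u = rng.uniform(lower, upper, 1_000_000)
l1 = np.sqrt(u)         # L1 = sqrt(U); the implied density of L1 is prop. to l1 (the Jacobian factor)
l2 = np.zeros_like(l1)  # L2 pinned at zero
q = l1**2 + l2**2       # recovers U, so Q ~ uniform(lower, upper) by construction

print(q.mean())         # close to (lower + upper) / 2 = 2
```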

If that’s not the solution you’re after, you’ll have to further constrain the problem.