'Inverse' regression from a brms model

I’m working on a non-linear model for analytical method calibration in brms. The model is a univariable regression of assay system response as a function of a known standard concentration. After successfully fitting the model, I’d like to reverse the model in order to obtain posterior distributions for new predictors X, given observed new outcomes Y.

Is this achievable from the fitted model object using the existing prediction functions in brms?

  • Operating System: Windows 10
  • brms Version: 2.12.0

Check out posterior_predict and posterior_linpred; those should do the trick for new predictors!
If you also need the posterior given new observed outcomes, you need to refit the model.
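
For example, a minimal sketch (the model formula, data frame, and column names here are just placeholders for your own setup):

```r
library(brms)

# Hypothetical calibration fit (placeholder formula and data)
fit <- brm(response ~ concentration, data = calibration_data)

# Posterior predictive draws of the response at new predictor values
new_x <- data.frame(concentration = c(0.5, 1, 2))
pp <- posterior_predict(fit, newdata = new_x)   # matrix: draws x new rows

# Draws of the expected response (without observation noise)
lp <- posterior_linpred(fit, newdata = new_x)
```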

Best of luck with your model.

@martinmodrak, many thanks for your response. Apologies for the delay; unfortunately this was put on hold for a while under COVID-19.

I think I wasn’t clear with my question; I’m aware of the capability for posterior predictions for the response variable from a brms model. Instead, what I’d like to do is to ‘invert’ the completed regression model, so that instead of y~x, I can obtain x~y. In other words, given a new observation of the response y, determine a conditional distribution for the value of the predictor x. Say for example that an experiment can be conducted to obtain y conditional on x, but in practice y is to be used as a surrogate variable for x (perhaps given a prior for x).

Hi @AWoodward. I was doing some assay standard plots just yesterday, so I know what you are trying to do here. However, I’ve never done this using Stan; I use the R package drc (https://katatrepsis.com/2018/09/24/elisa-analysis-in-r/).

To do what you want here I think you might need to break out pen and paper to transform your model equation to solve for x. So for example, if your model was y = mx + c, you transform that into x = (y - c)/m. However, this gets confusing if you use x squared or x cubed in your model, because you would then get non-unique roots when solving for x. If you manage to solve for x, you can then plug your model parameter estimates into that equation along with your new values for y to calculate the x values for your new data. I’m not aware of any special Stan tricks to avoid this manual equation solving, however perhaps @martinmodrak or other experts know some helpful tricks here.
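
For instance, a rough sketch of that plug-in step for a simple linear brms fit (untested; it assumes the fitted object is called fit and the predictor is literally named x, so the posterior draws contain columns b_Intercept and b_x):

```r
# Posterior draws of the intercept (c) and slope (m)
draws <- as.data.frame(fit)

y_new <- 42  # new observed response to back-calculate

# Invert y = m*x + c for each posterior draw: x = (y - c)/m
x_new <- (y_new - draws$b_Intercept) / draws$b_x

quantile(x_new, probs = c(0.025, 0.5, 0.975))  # posterior summary for the new x
```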

1 Like

Agree with @jroon that you’d probably need to do this manually. Also, I think that “inverting” is generally a complex process, and the equation solving as described would only make sense for a Gaussian likelihood. I am not even sure whether the method would be exactly the same as fitting the inverse regression even for a Gaussian likelihood, as priors can interact with the simple picture you have in a frequentist/maximum-likelihood setting (but this is not really my expertise, and I’d be happy to be proven wrong).

The general spirit however stays: if you know how to do this “inverse” with a single estimate of model parameters (say from a frequentist model), you can do it with a Bayesian model: just apply the method separately to each posterior sample. The results are then individual samples of the Bayesian prediction for the quantity you care about.

Does that make sense?

Thanks @jroon, I’ve run into similar problems with frequentist methods before, but the current problems are nonlinear and hierarchical, so I think the Stan approach is a much better fit. I don’t think there will be algebraic solutions to the transformations (and I’m after a general solution), but I can try it numerically.
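
For example, a rough numeric sketch of that per-draw inversion (untested; mu_fun and its parameter names are purely hypothetical stand-ins for the real non-linear mean function, and it assumes the function is monotone in x over the search interval):

```r
# Posterior draws of the model parameters (columns depend on the actual model)
draws <- as.data.frame(fit)

# Hypothetical mean function mu(x; theta); replace with the real non-linear form
mu_fun <- function(x, theta) theta["b_asym"] * x / (theta["b_k"] + x)

y_new <- 42  # new observed response

x_new <- apply(draws, 1, function(theta) {
  # solve mu(x; theta) = y_new numerically within a plausible range for x
  uniroot(function(x) mu_fun(x, theta) - y_new, interval = c(0, 1000))$root
})

quantile(x_new, probs = c(0.025, 0.5, 0.975))
```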

@martinmodrak cheers, the brms implementation has been so smooth for the other parts of this problem that I was hoping this would be handled by one of the built-in functions. Alas, I will pursue a manual approach from the posteriors.

Thanks for your comments.

1 Like

So at the risk of being slightly rebellious and quite possibly wrong… we model y ~ x because that is the convention in laboratory science: the thing you control goes on the x axis and the thing you measure goes on the y axis. Could we instead model x ~ y? But… the issues in this Twitter thread also seem relevant: https://twitter.com/NoahHaber/status/1304783168592060416?s=20 => because OLS minimises distance over y, y ~ x is not the same as x ~ y.

Something I don’t know, now that I think about it, @martinmodrak: does Stan “prefer” one axis over the other the way OLS does? Because if not, perhaps you could skip the usual convention of modelling y ~ x, reversing the equation, and predicting x from y, and instead just model x ~ y directly?
(Major caveat here is that I only occasionally deal with stuff like this and am definitely no expert! Merely thinking out loud here!)

1 Like

Just want to say that I totally agree: if we care about x ~ y, the easiest way is to actually model it this way. So in the context of a calibration, where we know a true value x and measure y on a device, it makes the most sense to me to write a model for x ~ y directly and avoid the “inverse”. I however assumed @AWoodward has a reason why this is not possible…

I further think that the difference between OLS (ordinary least squares) and TLS (total least squares) from the Twitter thread by Noah is a different concern. If I understand things correctly, OLS for x ~ y is often the same as for y ~ x. Whether it actually is the same depends on which of the two variables is assumed to be observed with error, and on whether we choose (control) the values of one of the variables or whether it is a random sample from the population. I don’t want to be very precise here, because I don’t understand this stuff well and don’t want to say something that’s misleading. The classic reference (which I admit to not understanding fully) for this is AFAIK Berkson: Are There Two Regressions? https://www.jstor.org/stable/2280676?seq=1

In the Stan/Bayes literature the same concern is AFAIK handled with “measurement error” models; brms has the me term for this. Though you sometimes see “Berkson error” as well.
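
For example (untested, just to show the syntax; x_obs and x_sd are placeholder column names for the noisy measurement and its known standard error):

```r
# Measurement-error model: the predictor is only observed with known noise x_sd
fit_me <- brm(y ~ me(x_obs, x_sd), data = dat)
```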

Hope that clarifies more than confuses.

1 Like

Indeed, in my cases of interest x is fixed by experimental design, and so is not a random variable. Consider the case where x can be experimentally set, but cannot be directly observed, whereas y can be observed but not experimentally set. I think analytical method calibration is a good example. I can make calibration standards of known x, and observe y. But in an observational sample, I can only infer x via the observed value of y. It wouldn’t make sense to set x as the response during calibration because it is fixed by design.

I appreciate the discussion! I’ll navigate the approach directly from the posterior and try and post an update at some point.

1 Like

So this problem has been bugging me on a run. I don’t think I have my thoughts completely sorted out, but I need to move on to actual work, so I’ll share them to have peace of mind and hopefully help you think more clearly.

Let us imagine we have performed a calibration with chosen true values X = x_1, \dots, x_n which led to observations Y = y_1, \dots, y_n. For simplicity, let us assume that the true relationship is a simple linear function with Gaussian noise. Further, we’ve observed a new data point \hat{y} and we want to make inferences about the true value \hat{x}, which is unobserved. A full Bayesian treatment of this problem leads to a joint model:

y_i \sim N(\mu_i, \sigma) \\ \mu_i = \alpha + \beta x_i \\ \hat{y} \sim N(\hat\mu, \sigma)\\ \hat{\mu} = \alpha + \beta \hat{x}\\ \alpha \sim \pi_\alpha, \beta \sim \pi_\beta, \sigma \sim \pi_\sigma, \hat{x} \sim \pi_{\hat{x}}

where \alpha, \beta, \sigma, \hat{x} are parameters and \pi_\alpha, \pi_\beta, \pi_\sigma, \pi_{\hat{x}} are their associated priors. So, if I am not wrong about this, then if you want to propagate all the uncertainty correctly, anything you do needs to be either proven equivalent to inference over this full model, or thought of as an approximation to it.

I actually think that this full model can be expressed in brms by treating \hat{x} as a missing predictor, i.e. something like (not tested) bf(y ~ mi(x)) + bf(x | mi() ~ 1). However, being forced to rerun the Stan model for each data point you want to infer seems bad.
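
To make that a bit more concrete, a rough (untested) sketch of the joint model, with conc and resp as placeholder column names and the unknown \hat{x} encoded as an NA in the data:

```r
library(brms)

# Toy example: calibration rows with known conc, plus one row where conc is
# unknown (NA) and only the new response \hat{y} is observed
dat <- rbind(
  data.frame(conc = c(0, 1, 2, 4, 8), resp = c(0.1, 0.9, 2.1, 4.2, 7.8)),
  data.frame(conc = NA, resp = 3.0)
)

# Joint model: resp regressed on the (possibly missing) conc, plus an
# intercept-only model for conc so that brms can impute the missing value
joint_formula <- bf(resp ~ mi(conc)) +
  bf(conc | mi() ~ 1) +
  set_rescor(FALSE)

fit_joint <- brm(joint_formula, data = dat)

# The posterior draws of the imputed conc (a parameter named something like
# "Ymi_conc[6]") are then the inference for \hat{x}
```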

Now this is where my math/stats fail me. I find it quite likely (but can’t prove) that the full posterior distribution p(\alpha, \beta, \sigma, \hat{x} | X, Y, \hat{y}) can be broken down, so that:

p(\alpha, \beta, \sigma, \hat{x}| X, Y, \hat{y}) = p(\alpha, \beta, \sigma | X, Y) p(\hat{x} | \alpha, \beta, \sigma, \hat{y})

And, by Bayes theorem

p(\alpha, \beta, \sigma | X, Y) p(\hat{x} | \alpha, \beta, \sigma, \hat{y}) \propto p(\alpha, \beta, \sigma | X, Y) p(\hat{y} | \alpha, \beta, \sigma, \hat{x}) \pi_{\hat{x}}(\hat{x})

i.e. that if you do the inference for \alpha, \beta, \sigma separately, giving you (samples from) a joint posterior distribution p(\alpha, \beta, \sigma), then you can make inferences for \hat{x} separately. But maybe even that is just an approximation… At least if the prior on \hat{x} is not improper uniform, one can imagine that if “inverting” the calibration data gives you values inconsistent with the prior, you would need to update your belief about the calibration data… But maybe this is a good thing, maybe we don’t want to update our beliefs about the calibration just because it would lead to absurd inferences for new values? Sorry, can’t wrap my head around this completely right now. But I think that if the calibration experiment is informative and our priors on \hat{x} are not too tight, \hat{y} should have negligible effect on everything except \hat{x} and the separation is likely justified for practical use.

In any case, IMHO you at least need to handle the uncertainty introduced between \hat\mu and \hat{y}. I think (once again, not so sure about the math, double check me) that if \pi_{\hat{x}} = N(m, \tau) then, treating \alpha, \beta, \sigma as fixed, you can say that a priori \hat\mu \sim N(\alpha + \beta m, \tau \beta). Following Wikipedia, we get that the analytical form of the posterior for \hat\mu is (with slight abuse of notation):

p(\hat\mu | \alpha,\beta,\sigma, \hat{y}) = N\left(\frac{\hat{y}/\sigma^2 + (\alpha + \beta m)/(\tau^2 \beta^2)}{1/\sigma^2 + 1/(\tau^2 \beta^2)}, \frac{1}{1/\sigma^2 + 1/(\tau^2 \beta^2)}\right)

If you instead assume an improper flat prior for \hat{x} (and hence \hat\mu), that means \tau \rightarrow \infty and 1/\tau^2 \rightarrow 0.

So the conditional posterior for \hat\mu would simplify to:

p(\hat\mu | \alpha,\beta,\sigma, \hat{y}) = N\left(\hat{y}, \sigma^2 \right)

This gives us the following process to infer \hat{x} given \hat{y} and posterior samples for \alpha, \beta, \sigma (a rough code sketch follows the list):

  • For each posterior sample from the calibration model:
    • Compute the conditional posterior of \hat{\mu} given \hat{y}
    • Draw a single sample of \hat{\mu}
    • Transform the sample of \hat{\mu} to a sample of \hat{x} by “inverting the model”
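
In code, for the simple linear Gaussian case with a flat prior on \hat{x}, that could look roughly like this (untested; it assumes the calibration predictor is named x, so the draws contain b_Intercept, b_x and sigma):

```r
draws <- as.data.frame(fit)  # posterior draws from the calibration model
y_hat <- 3.0                 # new observation

x_hat <- sapply(seq_len(nrow(draws)), function(i) {
  alpha <- draws$b_Intercept[i]
  beta  <- draws$b_x[i]
  sigma <- draws$sigma[i]

  # conditional posterior of mu-hat given y-hat (flat-prior case): N(y_hat, sigma^2)
  mu_hat <- rnorm(1, mean = y_hat, sd = sigma)

  # invert the linear model to turn the mu-hat draw into an x-hat draw
  (mu_hat - alpha) / beta
})

quantile(x_hat, probs = c(0.025, 0.5, 0.975))
```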

Obviously, once we move beyond Gaussian distributions and linear relationships, things become way more complicated in practice (because we most likely lose the ability to have an analytical form for p(\hat\mu | \hat{y})).

Does that make at least some sense?

EDIT: I obviously can’t let this go :-). It occurred to me that thinking about the problem as a joint model also clarifies what it means when you have repeated measurements which you assume came from the same unobserved \hat{x} (all those measurements should jointly inform the posterior for \hat{\mu}). Similarly, if you have multiple measurements of different \hat{x}_1, \dots, \hat{x}_k, the inferences cannot be treated as independent: e.g. if there were substantial uncertainty in \alpha in the above model, you would expect either all of \hat{x}_1, \dots, \hat{x}_k to be “somewhat low” (if \alpha is actually low) or all “somewhat high” (if \alpha is high), creating a correlation in the posterior for \hat{x}_1, \dots, \hat{x}_k.

2 Likes

I have the same problem as the original poster. Has there been any progress on this issue? Thx

@Qoftheday the OP marked the problem as solved, so it is a bit hard to guess what exactly you are missing. If you have specific questions about details of the proposed solutions, feel free to ask. If you think you need something substantially different from what was offered in the discussion so far, it might be better to start a new discussion topic.

For the particular case here I settled on using a ‘missing predictor’ approach with the model implemented in Stan. But as the discussion highlighted, the reasonableness of any particular approach (and whether problems like this are even tractable) will depend pretty strongly on the model.