Linear Regression with QR decomposition AND error in predictors?

I’m working on modeling a calibration process for spectroscopic measurements of turbidity in microtiter plates. The details aren’t super relevant, but we end up varying both the concentration of a standard, C_{std}, and the well volume, V_w (the volume of the solution loaded into each well), and measuring the absorbance of each well. The absorbance, A, increases linearly with both variables under certain circumstances, but the relationship becomes non-linear at very high concentrations and for very long pathlengths (which are produced by very full wells, i.e. a high well volume). So I end up with a polynomial regression with two predictors but with significant correlation between them. Here’s an example.

A = \beta_0 +\beta_1 V_w + \beta_2 V_w C_{std} + \beta_3 V_w^2 C_{std} + \beta_4 V_w C_{std}^2
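For reference, here is how the predictor (design) matrix implied by that equation could be assembled, ignoring measurement error for the moment. This is just a sketch: N, v_w, and c_std are assumed to be declared in the data block as the number of wells and the vectors of fill volumes and standard concentrations.

transformed data {
  matrix[N, 4] x;   // columns correspond to beta_1 .. beta_4 above
  for (n in 1:N) {
    x[n, 1] = v_w[n];
    x[n, 2] = v_w[n] * c_std[n];
    x[n, 3] = square(v_w[n]) * c_std[n];
    x[n, 4] = v_w[n] * square(c_std[n]);
  }
}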

The standard concentrations are prepared by serial dilution, so both the well volume and the standard concentrations have error introduced by the imperfections of the liquid handling robot we use. I have a decent generative model for the dilution error developed for a different purpose.

Long story short, I want to deal with both correlated predictors and measurement error.

I’ve read this part of the user guide, this blog post, and this part of the user guide, but I’m uncertain of the correct way to combine both the QR reparameterization and a model for measurement error into a single, correct program.

Are there any examples of this?

I’m also open to suggestions for different approaches. Polynomial regression is typical in the literature for these types of measurements, but I don’t know if it’s really the best approach.

Isn’t what you want a combination of QR regression with error described by a covariance matrix? If not, it’s not quite clear what is missing.

It’s not completely clear to me what you mean by “measurement error”: is it just that assuming a Gaussian likelihood with fixed variance wouldn’t be a good model?

Is this a model for the correlation between errors? If so, you should be able to multiply that by a fixed variance and use it as a prior for an inferred covariance matrix for your measurement error.

I think what I’m missing must be basic enough to be really obvious to the more experienced. Here’s what I get when I try to naively combine the two examples in the user guide:

data {
  int<lower=0> N;   // number of data items
  int<lower=0> K;   // number of predictors
  matrix[N, K] x_meas;   // predictor matrix, but measured
  real<lower=0> tau;     // measurement noise
  vector[N] y;      // outcome vector
}
parameters {
  real alpha;           // intercept
  vector[K] theta;      // coefficients on Q_ast
  real<lower=0> sigma;  // error scale
  matrix[N, K] x;       // unknown true values for the predictor matrix
}
transformed parameters {
  // Since the unmeasured, true, x is now a parameter, I assume I have to 
  // compute the QR decomposition in the transformed parameters block
  matrix[N, K] Q_ast;
  matrix[K, K] R_ast;
  matrix[K, K] R_ast_inverse;
  // thin and scale the QR decomposition
  Q_ast = qr_thin_Q(x) * sqrt(N - 1);
  R_ast = qr_thin_R(x) / sqrt(N - 1);
  R_ast_inverse = inverse(R_ast);
}
model {
  to_vector(x_meas) ~ normal(to_vector(x), tau);    // measurement model (flattened so the vectorized normal applies)
  y ~ normal(Q_ast * theta + alpha, sigma);  // data model
}
generated quantities {
  vector[K] beta;
  beta = R_ast_inverse * theta; // coefficients on x
}

Is this correct? And is it a useful thing to do? If the decomposition is computed in the transformed parameters block, my understanding is that it will be recomputed at every evaluation of the log density and its gradient, so it may no longer be worth it. Is there something better to do?

I am specifically talking about the error between the actual value of an element in the predictor matrix and its “target” or measured value. The model for this variance is more complicated than Gaussian noise with a fixed scale, but the parameters of the model can be treated as fixed for this example.

To be more explicit, I have a robot that moves liquid samples around, and the error in the volume transferred has been measured. It looks like V_{actual} \sim N(V_{target}, g(V_{target})) where g is a linear function. The coefficient of variation basically ranges from 2-3% for very small volumes to 0.2-0.5% for larger volumes.

But the fill volume and the concentration of the standards, which are used to construct the predictor matrix, depend on multiple individual volume transfers. Since there is serial dilution, the most dilute standard concentration would be affected by the error in ~12 previous transfers.
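To make the compounding explicit (my notation, not the actual error model): in an idealized serial dilution, the k-th standard concentration is the stock concentration scaled by the product of the realized volume ratios,

C_k = C_0 \prod_{i=1}^{k} \frac{V_{t,i}}{V_{t,i} + V_{d,i}}

where V_{t,i} and V_{d,i} are the realized transfer and diluent volumes at step i, each drawn around its target according to the robot error model above. So an error in any upstream transfer shifts every standard downstream of it.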

Hopefully all that detail helps clear up what I’m talking about.

This is a bit beyond me. The model for error in the dilution series does result in errors that are correlated, but I don’t have an explicit correlation matrix.

What are you trying to accomplish with the fit of this model? Are you interested in predicting when this needs to be recalibrated or what covariate values indicate a recalibration?

If the parameters in that regression are not of particular scientific interest and you are mainly interested in when to recalibrate, then I suggest some robustified polynomial regression (e.g. using the median or a Student-t likelihood instead of a normal) or something more flexible like wavelets or Gaussian/Student-t processes. With those you can also use sparsifying priors and other techniques to capture the measurement error.

The thing of scientific interest is the concentration of an unknown sample.

The main goal is to convert measurements from very different circumstances to a compatible unit. We have multiple models of spectrophotometers with very different designs, some with a fixed pathlength, others that read plates with varying pathlengths, etc. etc. And, when people use microtiter plates, they use fill volumes and diluents that are specific to their experiment, and can’t necessarily be fixed.

So the plan was to run this calibration experiment for each unique combination of (spectrophotometer, diluent media, microtiter plate/cuvette), all with the same standard. Then I would use the posterior draws of the regression parameters to convert the instrument/experiment specific absorbance to a concentration (in the units of this standard) that is comparable across instruments and contexts. Here is the paper where I got the idea: Robust estimation of bacterial cell count from optical density | Communications Biology. There are more scientific details there, for the curious. I’m using the microsphere protocol that performed the best in that study.
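For what it’s worth, with the example polynomial at the top of the thread that per-draw conversion has a closed form: for a new absorbance A measured at a known fill volume V_w, each posterior draw of \beta gives a quadratic in C_{std},

\beta_4 V_w C_{std}^2 + (\beta_2 V_w + \beta_3 V_w^2) C_{std} + (\beta_0 + \beta_1 V_w - A) = 0

C_{std} = \frac{-(\beta_2 V_w + \beta_3 V_w^2) + \sqrt{(\beta_2 V_w + \beta_3 V_w^2)^2 - 4 \beta_4 V_w (\beta_0 + \beta_1 V_w - A)}}{2 \beta_4 V_w}

taking the root that reduces to the linear solution as \beta_4 \to 0 (written here with the + sign, assuming the coefficient on the linear C_{std} term is positive). Doing this for every posterior draw gives a posterior for the converted concentration. This is just the algebra for the example model; a different polynomial would need a numeric root-finder instead.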

So, first a bunch of calibration experiments. Then, many conversions thereafter to correct measurements from different contexts. I hadn’t thought of using the results to determine the recalibration interval, and it would be more of a secondary objective.

So, if you want to boil it down to the simplest example possible, it’s as if you have a model y = ax + b, but x cannot be measured exactly (and can it ever?) so you want to allow it to vary as a parameter, but informed by your knowledge on the error of those measurements. Is that about right?

I was under the impression that the error was random, and would have suggested something similar to what @spinkney did. But this says that you have a fairly straight (no pun intended) linear relationship between the target volume and the variance of the realized volume. That kind of looks like the inverse of the mean-to-variance relationship in a Poisson distribution, and for a Gaussian distribution you’d have to model it explicitly. (I don’t think I see that in your model; instead you model all the predictor values in x as parameters, which may be a bit too much.)

On the volume, it seems like you could introduce that relationship explicitly. Going back to the simple example, you would replace real<lower=0> sigma with something like

parameters {
  real<lower=0> slopetau;      // slope of the linear noise function g
  real<lower=0> intercepttau;  // intercept of the linear noise function g
  // ...
}
transformed parameters {
  // g is a simple linear function of the target volumes (v_target passed in as data)
  vector<lower=0>[N] sigma = intercepttau + slopetau * v_target;
}
model {
  // ...
}

I’m not entirely sure how to formulate the relationship between the number of transfers and the fill volume, but if it just increases the variance without really affecting the mean of the volume, maybe it will simply be captured by the error in the likelihood distribution.
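One simple way to formulate it, assuming the individual transfers that make up a well’s fill volume are independent and g above is the standard deviation of a single transfer: the variances just add,

\sigma_{V_w}^2 = \sum_{i=1}^{n} g(V_{target,i})^2

so for n roughly equal transfers the standard deviation of the fill volume grows like \sqrt{n} while the mean stays on target, which matches the “increases the variance but not the mean” intuition.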

I think using the QR decomposition is a fairly standard implementation, and it will indeed be computed pretty frequently, but many of these computations are negligible compared to the cost of computing the gradients (it is often easier to just run the model and see how it performs than to figure it out from first principles).
Still, I think the more involved part is implementing the dependency of the variance on transfer volume and number of transfers, so it may be worth doing that on a standard linear regression first, seeing how it performs, and, if needed or desired, implementing a QR-decomposition version.
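To make that suggestion concrete, here is a minimal sketch of the standard (non-QR) regression with the measurement model folded in. The per-entry noise scale tau is assumed to be precomputed from the volume error model and passed in as data, and the prior scales are placeholders, not recommendations:

data {
  int<lower=0> N;               // number of wells
  int<lower=0> K;               // number of polynomial predictors
  matrix[N, K] x_meas;          // predictor matrix built from target volumes/concentrations
  matrix<lower=0>[N, K] tau;    // per-entry measurement sd from the volume error model
  vector[N] y;                  // absorbances
}
parameters {
  real alpha;                   // intercept
  vector[K] beta;               // coefficients on the original scale
  real<lower=0> sigma;          // outcome error scale
  matrix[N, K] x;               // unknown true predictor values
}
model {
  to_vector(x) ~ normal(0, 5);                               // weak prior on the true predictors
  to_vector(x_meas) ~ normal(to_vector(x), to_vector(tau));  // measurement model
  y ~ normal(x * beta + alpha, sigma);                       // data model
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
}

Note this treats the entries of x as independent given x_meas, which ignores that they are built from shared volumes; that is part of what your dilution error model would handle. If this samples well, the QR version would differ only in computing Q_ast and R_ast from x in transformed parameters, as in your program above.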

Yes, exactly.

I was trying not to burden people with the gory details of the volume error model, since it ends up being a whole lot of code, but it’s already done and working for a different problem. The wells are represented by the vertices of a DAG, with transfers represented by the weights of the edges. Every volume transfer is a parameter, sampled from the distribution I mentioned with appropriate parameters, and then the predictor matrix is calculated from the DAG and the “actual” unmeasured volumes in the transformed parameters block.
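For anyone following along, here is a much-simplified sketch of that idea for a single linear dilution chain rather than the full DAG. All the names are illustrative, and the noise-model intercept a and slope b are treated as known data:

data {
  int<lower=1> S;                   // number of standards in the chain
  real<lower=0> c0;                 // stock concentration
  vector<lower=0>[S] v_t_target;    // target transfer volumes
  vector<lower=0>[S] v_d_target;    // target diluent volumes
  real<lower=0> a;                  // noise model intercept (sd units)
  real<lower=0> b;                  // noise model slope
}
parameters {
  vector<lower=0>[S] v_t;           // realized transfer volumes
  vector<lower=0>[S] v_d;           // realized diluent volumes
}
transformed parameters {
  vector<lower=0>[S] c_std;         // realized standard concentrations
  c_std[1] = c0 * v_t[1] / (v_t[1] + v_d[1]);
  for (s in 2:S)
    c_std[s] = c_std[s - 1] * v_t[s] / (v_t[s] + v_d[s]);
}
model {
  v_t ~ normal(v_t_target, a + b * v_t_target);
  v_d ~ normal(v_d_target, a + b * v_d_target);
}

The predictor matrix (and, if you keep it, its QR decomposition) would then be assembled from c_std and the realized fill volumes in the same transformed parameters block.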

Yes, that seems wise. I will report back after doing that, as long as I’m not fundamentally misunderstanding something in my combination of the two examples in the user guide.

Without all the details, I’d say the difference between what you say you need to do (implementing the linear-plus-number-of-transfers model of the dependency of the variance) and what you are actually doing (estimating all the predictors) is that the latter estimates a lot more parameters, N \times K of them (although it could be made more efficient by introducing the dependencies via priors).

I’d say the former should work better, but it’s both a matter of choice and of trying out what makes more sense to you and seeing if it works.