Parameter inequivalence in linear regression QR decomposition model

I am having a problem with the equivalence of parameters learned with my Stan model.

In the linear regression model shown below, I am using a B-spline design matrix together with the QR reparameterization for computational efficiency. In this setup, I expect the following relationships between the parameters to hold approximately:

w = R_inverse * eta
theta = R_inverse * zeta

or equivalently:

eta = R * w
zeta = R * theta
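
Here is a minimal numpy sketch of why these identities should hold with the scaled QR factors used in the model below (the random stand-in for the design matrix and the names rng and w are just for illustration). Since X = Q * R, we have X * w = Q * (R * w) = Q * eta; the small jitter added to the diagonal of R in the actual model makes the relations only approximately exact.

import numpy as np

rng = np.random.default_rng(0)
N, B = 200, 8
X = rng.normal(size=(N, B))                      # stand-in for the B-spline design matrix

# same scaling as in the transformed data block of the model
Q, R = np.linalg.qr(X, mode="reduced")
Q = Q * np.sqrt(N - 1)
R = R / np.sqrt(N - 1)

w = rng.normal(size=B)                           # an arbitrary coefficient vector on X
eta = R @ w                                      # the corresponding coefficients on Q

print(np.allclose(X @ w, Q @ eta))               # True: both give the same linear predictor
print(np.allclose(np.linalg.solve(R, eta), w))   # True: w = R_inverse * eta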

However, if I compare the learned parameters eta and zeta with the values of eta and zeta computed from the learned w and theta (and vice versa), they are not equivalent - see the attached figures. The learned w and theta values, and the versions of eta and zeta calculated from them, seem to be correct, as they produce the predictions I would expect via the beta distribution (I cannot show these here for privacy reasons). There is something wrong with the posterior eta and zeta values, however.

Any ideas what is going wrong here? I don’t understand how this can happen, since the parameters are explicitly linked to each other in the model. Perhaps something unexpected is going on in the QR decomposition.

[Figures: posterior vs. computed values of eta, theta, w, and zeta]

data { 
    int<lower=0> N; // number of data
    vector<lower=0,upper=1>[N] y; // output powers
    int<lower=0> num_t; // number of knots
    int<lower=0> order; // order
    int<lower=1> B; // design matrix width
    matrix[N, B] X; // design matrix
} 

transformed data {
    matrix[N, B] Q;
    matrix[B, B] R;
    matrix[B, B] R_inverse;

    // thin and scale the QR decomposition
    Q = qr_thin_Q(X) * sqrt(N - 1);
    R = qr_thin_R(X) / sqrt(N - 1);
    // add small number to diagonal to ensure R is invertible (non-singular with no zeros on the diagonal)
    R = add_diag(R, 1e-4);
    R_inverse = inverse(R);
}


parameters {
    vector[B] eta; // coefficients on Q (for mu)
    vector[B] zeta; // coefficients on Q (for phi)
} 

transformed parameters {
    vector<lower=0,upper=1>[N] mu; // Mean of the Beta distribution
    vector<lower=0>[N] phi; // dispersion parameter for the Beta distribution
    vector<lower=0>[N] a; // parameter for beta distribution
    vector<lower=0>[N] b; 

    mu = inv_logit(Q * eta); // -> ensures that this results in values between 0 and 1
    phi = exp(Q * zeta); // -> ensures positive phi / could multiply the precision here...

    for (i in 1:N) {
        a[i] = mu[i] * phi[i];
        b[i] = (1 - mu[i]) * phi[i];
    }
}

model {
//  priors
  a ~ exponential(1e-4);
  b ~ exponential(1e-4);

  for (n in 1:B){
    eta[n] ~ normal(0,3);
    zeta[n] ~ normal(0,3);
  }

  // likelihood
  y ~ beta(a, b);
}

generated quantities {
  vector[B] w;
  vector[B] theta;
  //vector[N] log_lik;

  w = R_inverse * eta; // coefficients for mean mu
  theta = R_inverse * zeta; // coefficients for dispersion phi
}

I figured out that I can output the Q and R matrices from Stan, and there are differences compared to what I think is the equivalent numpy method, i.e.

import numpy as np

Q, R = np.linalg.qr(dmat, mode="reduced")
Q = Q * np.sqrt(len(dmat) - 1)
R = R / np.sqrt(len(dmat) - 1)
np.fill_diagonal(R, np.diag(R) + 1e-4)

An example Stan R matrix for a given design matrix X looks like this:
[image: R matrix computed by Stan]
whilst with numpy it looks like this:
[image: R matrix computed by numpy]
I also see differences between the Q matrices, but the inverses of the R matrices look approximately equal…
[images: R_inverse from Stan and from numpy]
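
For reference, a quick way to check that both factorizations are still valid despite these differences would be to confirm that each reconstructs the design matrix (a sketch; Q_stan and R_stan are just placeholders for the matrices exported from the Stan program, e.g. via generated quantities):

import numpy as np

Q_np, R_np = np.linalg.qr(dmat, mode="reduced")
Q_np = Q_np * np.sqrt(len(dmat) - 1)
R_np = R_np / np.sqrt(len(dmat) - 1)

# Before the 1e-4 jitter, Q * R should reproduce the design matrix exactly
# (up to floating point), even if the individual factors differ between libraries.
print(np.allclose(Q_np @ R_np, dmat))
# print(np.allclose(Q_stan @ R_stan, dmat))  # same check with the matrices from Stan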


I have solved this problem now.

It turns out the QR decomposition in Stan is slightly different from the one in numpy/scipy, although both are valid: a QR factorization is only unique up to sign flips of the columns of Q (and the corresponding rows of R), so different libraries can legitimately return different factors for the same matrix.
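
In case it helps anyone else, my understanding is that the specific difference is the usual sign convention: Stan's qr_thin_Q / qr_thin_R appear to choose the signs so that the diagonal of R is non-negative, whereas numpy (LAPACK) can return negative diagonal entries. Assuming that is indeed the difference, a sketch of how to bring numpy's factors onto a non-negative-diagonal convention looks like this:

import numpy as np

Q, R = np.linalg.qr(dmat, mode="reduced")

# Flip the sign of every column of Q whose corresponding diagonal entry of R is
# negative, and flip the matching row of R, so that diag(R) >= 0. The product
# Q @ R is unchanged by these paired sign flips.
signs = np.sign(np.diag(R))
signs[signs == 0] = 1.0
Q = Q * signs             # flips the affected columns of Q
R = signs[:, None] * R    # flips the corresponding rows of R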
