Parameter inequivalence in linear regression QR decomposition model

I am having a problem with the equivalence of parameters learned with my Stan model.

In the linear regression model shown below, I am using a B-spline design matrix together with the QR reparameterization for computational efficiency. In this setup, I expect the following relationships between the parameters to hold approximately:

w = R_inverse * eta
theta = R_inverse * zeta

or equivalently:

eta = R * w
zeta = R * theta
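
Here is a minimal numpy sketch of why these identities should hold with the scaled QR factors used in the model below (the random stand-in for the design matrix and the names rng and w are just for illustration). Since X = Q * R, we have X * w = Q * (R * w) = Q * eta; the small jitter added to the diagonal of R in the actual model makes the relations only approximately exact.

import numpy as np

rng = np.random.default_rng(0)
N, B = 200, 8
X = rng.normal(size=(N, B))                      # stand-in for the B-spline design matrix

# same scaling as in the transformed data block of the model
Q, R = np.linalg.qr(X, mode="reduced")
Q = Q * np.sqrt(N - 1)
R = R / np.sqrt(N - 1)

w = rng.normal(size=B)                           # an arbitrary coefficient vector on X
eta = R @ w                                      # the corresponding coefficients on Q

print(np.allclose(X @ w, Q @ eta))               # True: both give the same linear predictor
print(np.allclose(np.linalg.solve(R, eta), w))   # True: w = R_inverse * eta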

However, if I compare the learned parameters eta and zeta with the values of eta and zeta computed from the learned w and theta (and vice versa), they are not equivalent - see the attached figures. The learned w and theta values, and the versions of eta and zeta calculated from them, seem to be correct, as they produce the predictions I would expect via the beta distribution (I cannot show these here for privacy reasons). There is something wrong with the posterior eta and zeta values, however.

Any ideas what is going wrong here? I don’t understand how this can happen, since the parameters are explicitly linked to each other in the model. Perhaps something unexpected is going on in the QR decomposition.

[Figures: posterior vs. computed values of eta, theta, w, and zeta]

data { 
    int<lower=0> N; // number of data
    vector<lower=0,upper=1>[N] y; // output powers
    int<lower=0> num_t; // number of knots
    int<lower=0> order; // order
    int<lower=1> B; // design matrix width
    matrix[N, B] X; // design matrix
} 

transformed data {
    matrix[N, B] Q;
    matrix[B, B] R;
    matrix[B, B] R_inverse;

    // thin and scale the QR decomposition
    Q = qr_thin_Q(X) * sqrt(N - 1);
    R = qr_thin_R(X) / sqrt(N - 1);
    // add small number to diagonal to ensure R is invertible (non-singular with no zeros on the diagonal)
    R = add_diag(R, 1e-4);
    R_inverse = inverse(R);
}


parameters {
    vector[B] eta; // coefficients on Q (for mu)
    vector[B] zeta; // coefficients on Q (for phi)
} 

transformed parameters {
    vector<lower=0,upper=1>[N] mu; // Mean of the Beta distribution
    vector<lower=0>[N] phi; // dispersion parameter for the Beta distribution
    vector<lower=0>[N] a; // parameter for beta distribution
    vector<lower=0>[N] b; 

    mu = inv_logit(Q * eta); // -> ensures that this results in values between 0 and 1
    phi = exp(Q * zeta); // -> ensures positive phi / could multiply the precision here...

    for (i in 1:N) {
        a[i] = mu[i] * phi[i];
        b[i] = (1 - mu[i]) * phi[i];
    }
}

model {
//  priors
  a ~ exponential(1e-4);
  b ~ exponential(1e-4);

  for (n in 1:B){
    eta[n] ~ normal(0,3);
    zeta[n] ~ normal(0,3);
  }

  // likelihood
  y ~ beta(a, b);
}

generated quantities {
  vector[B] w;
  vector[B] theta;
  //vector[N] log_lik;

  w = R_inverse * eta; // coefficients for mean mu
  theta = R_inverse * zeta; // coefficients for dispersion phi
}

I figured out that I can output the Q and R matrices from Stan, and there are differences compared to what I think is the equivalent numpy method, i.e.

import numpy as np

Q, R = np.linalg.qr(dmat, mode="reduced")
Q = Q * np.sqrt(len(dmat) - 1)
R = R / np.sqrt(len(dmat) - 1)
np.fill_diagonal(R, np.diag(R) + 1e-4)

An example Stan R matrix for a given design matrix X looks like this:
[image: R matrix computed by Stan]
whilst with numpy it looks like this:
[image: R matrix computed by numpy]
I also see differences between the Q matrices, but the inverses of the R matrices look approximately equal…
[images: R_inverse from Stan and from numpy]
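
For reference, a quick way to check that both factorizations are still valid despite these differences would be to confirm that each reconstructs the design matrix (a sketch; Q_stan and R_stan are just placeholders for the matrices exported from the Stan program, e.g. via generated quantities):

import numpy as np

Q_np, R_np = np.linalg.qr(dmat, mode="reduced")
Q_np = Q_np * np.sqrt(len(dmat) - 1)
R_np = R_np / np.sqrt(len(dmat) - 1)

# Before the 1e-4 jitter, Q * R should reproduce the design matrix exactly
# (up to floating point), even if the individual factors differ between libraries.
print(np.allclose(Q_np @ R_np, dmat))
# print(np.allclose(Q_stan @ R_stan, dmat))  # same check with the matrices from Stan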


I have solved this problem now.

It turns out the QR decomposition in Stan is slightly different from the one in numpy/scipy, although both are valid: a QR factorization is only unique up to sign flips of the columns of Q (and the corresponding rows of R), so different libraries can legitimately return different factors for the same matrix.
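
In case it helps anyone else, my understanding is that the specific difference is the usual sign convention: Stan's qr_thin_Q / qr_thin_R appear to choose the signs so that the diagonal of R is non-negative, whereas numpy (LAPACK) can return negative diagonal entries. Assuming that is indeed the difference, a sketch of how to bring numpy's factors onto a non-negative-diagonal convention looks like this:

import numpy as np

Q, R = np.linalg.qr(dmat, mode="reduced")

# Flip the sign of every column of Q whose corresponding diagonal entry of R is
# negative, and flip the matching row of R, so that diag(R) >= 0. The product
# Q @ R is unchanged by these paired sign flips.
signs = np.sign(np.diag(R))
signs[signs == 0] = 1.0
Q = Q * signs             # flips the affected columns of Q
R = signs[:, None] * R    # flips the corresponding rows of R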
