I am having a problem with the equivalence of parameters learned by my Stan model.
In the beta regression model shown below, I am using a B-spline design matrix together with the QR reparameterization for computational efficiency. In this setup, I expect the following relationships between the parameters to hold approximately:
w = R_inverse * eta
theta = R_inverse * zeta
or equivalently:
eta = R * w
zeta = R * theta
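(These follow from the thin decomposition X = Q * R: for any coefficient vector w, X * w = (Q * R) * w = Q * (R * w) = Q * eta, so eta = R * w reproduces the linear predictor X * w on the Q basis, and multiplying by R_inverse gives w = R_inverse * eta; the same argument applies to zeta and theta.)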
However, if I compare the learned parameters eta and zeta with the values of eta and zeta computed from the learned w and theta (and vice versa), they are not equivalent; see the attached figures. The learned w and theta, and the eta and zeta calculated from them, seem to be correct, as they produce the predictions I expect via the beta distribution (which I cannot show here for privacy reasons). Something is wrong with the posterior eta and zeta values, however.
Any ideas what is going wrong here? I don't understand how this can happen, since the parameters are explicitly linked to each other in the model. Perhaps something unexpected is going on in the QR decomposition.
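For concreteness, here is the kind of draw-by-draw check I am doing outside of Stan (a minimal numpy sketch; X, eta_draws, and w_draws are placeholder names for the design matrix and the extracted posterior draws of shape (num_draws, B), e.g. from fit.stan_variable("eta") in cmdstanpy):

import numpy as np

def scaled_thin_qr(X):
    # Mirror the transformed data block: thin QR scaled by sqrt(N - 1),
    # with signs flipped so diag(R) >= 0 (numpy's sign convention is
    # arbitrary; Stan's, as far as I know, keeps the diagonal non-negative).
    N = X.shape[0]
    Q, R = np.linalg.qr(X, mode="reduced")
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs * np.sqrt(N - 1), signs[:, None] * R / np.sqrt(N - 1)

Q, R = scaled_thin_qr(X)
R = R + 1e-4 * np.eye(R.shape[0])  # same diagonal jitter as in transformed data

# Within a single draw, eta and R * w should agree up to the accuracy of
# inverse(R), because w is computed from eta in generated quantities.
err = np.abs(eta_draws - w_draws @ R.T).max()
print(f"max |eta - R * w| over all draws: {err:.3e}")

The full model is below.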
data {
  int<lower=0> N;               // number of data points
  vector<lower=0,upper=1>[N] y; // output powers
  int<lower=0> num_t;           // number of knots
  int<lower=0> order;           // spline order
  int<lower=1> B;               // design matrix width
  matrix[N, B] X;               // design matrix
}
transformed data {
  matrix[N, B] Q;
  matrix[B, B] R;
  matrix[B, B] R_inverse;
  // thin and scale the QR decomposition
  Q = qr_thin_Q(X) * sqrt(N - 1);
  R = qr_thin_R(X) / sqrt(N - 1);
  // add a small number to the diagonal to ensure R is invertible
  // (non-singular, with no zeros on the diagonal)
  R = add_diag(R, 1e-4);
  R_inverse = inverse(R);
}
parameters {
  vector[B] eta;  // coefficients on Q (for mu)
  vector[B] zeta; // coefficients on Q (for phi)
}
transformed parameters {
  vector<lower=0,upper=1>[N] mu; // mean of the beta distribution
  vector<lower=0>[N] phi;        // dispersion parameter of the beta distribution
  vector<lower=0>[N] a;          // first shape parameter of the beta distribution
  vector<lower=0>[N] b;          // second shape parameter of the beta distribution
  mu = inv_logit(Q * eta); // inv_logit keeps mu between 0 and 1
  phi = exp(Q * zeta);     // exp keeps phi positive / could scale the precision here...
  for (i in 1:N) {
    a[i] = mu[i] * phi[i];
    b[i] = (1 - mu[i]) * phi[i];
  }
}
model {
  // priors
  a ~ exponential(1e-4);
  b ~ exponential(1e-4);
  for (n in 1:B) {
    eta[n] ~ normal(0, 3);
    zeta[n] ~ normal(0, 3);
  }
  // likelihood
  y ~ beta(a, b);
}
generated quantities {
  vector[B] w;
  vector[B] theta;
  //vector[N] log_lik;
  w = R_inverse * eta;      // coefficients on X for the mean mu
  theta = R_inverse * zeta; // coefficients on X for the dispersion phi
}
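To rule out problems in the transformed data computations themselves, the decomposition can also be reproduced outside of Stan to see how much the diagonal jitter perturbs the factorization (again a numpy sketch, with X standing in for the actual design matrix):

import numpy as np

# X is again a placeholder for the design matrix passed to the model.
N, B = X.shape
Q, R = np.linalg.qr(X, mode="reduced")
signs = np.sign(np.diag(R))
signs[signs == 0] = 1.0
Q = Q * signs * np.sqrt(N - 1)
R = signs[:, None] * R / np.sqrt(N - 1)
R_jittered = R + 1e-4 * np.eye(B)

# Without the jitter, Q @ R reproduces X up to floating-point error; with it,
# eta = R * w in the model refers to the perturbed R, not the exact QR factor.
print("||X - Q @ R||          =", np.linalg.norm(X - Q @ R))
print("||X - Q @ R_jittered|| =", np.linalg.norm(X - Q @ R_jittered))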