Hi all,
I’m obviously going isolation mad because I’m picking up all of my projects that I’ve been swearing I’d get around to when I’ve got uninterrupted time. So apologies for the sudden content surge!
I’ve been looking at the generated code for the offset_muliplier_constrain
and I’ve got a couple of questions about what is possible if we’re going to be able to extend this to multivariate settings. I found a dormant branch @Matthijs wrote that has the code for doing this (PR, branch). There’s no discussion on why it’s dormant, but i think I can see the problem, so I’ll outline it here.
Background
The offset_multiplier_constrain
funciton is the backbone of one of our fancy bijectors, which allows us to write non-centred code like this
// irrelevant code
parameters {
real mu;
real sigma;
vector<offset = mu, multiplier = sigma>[10] a;
// other code
}
model {
// irrelevant code
a ~ normal(mu, sigma);
}
This is obviously lovely because it just involves a “type change” to go between centred and non-centred parameterizations. So cool! (Weird thing: vector<multiplier = sigma, offset = mu>
throws an error, even though there’s no logical order for those named parameters)
Looking at the generated code, a (simplified) pseudocode version of what is generated is
// bunch of stuff including defs for mu and sigma
Matrix<scalar> a = a_unconstr; // a_unconstr is the unconstrained value
a = offset_multiplier_constrain(a, mu, sigma, lp__); // lp__ accumulates jacobian
// bunch of stuff
lp__ += normal_log<propto>(a, mu, sigma);
//more stuff
Mathematically, this is roughly the same as computing
lp__ += -size(a)*log(sigma) - 0.5*( (sigma*a_unconstr + mu) - mu)^2/sigma^2;
For the scalar case, the redundant calculation doesn’t really matter.
What would happen when we’re multivariate?
Well, nothing right now. But if we use the implementation on Matthijs’ branch, we would be able to write something like
// irrelevant code
parameters {
vector[10] mu;
matrix[10,10] sigma;
vector<offset = mu, chol = cholesky_decompose(sigma)>[10] a;
// other code
}
model {
// irrelevant code
a ~ multi_normal(mu, sigma);
}
Note the slightly cool thing that the parameters to the bijector can be functions!
Using the same type of code geneation would be bad for two reasons:
- We would need a second, unnecessary Cholesky decomposition inside
multi_normal
. Because of the way the blocks currently work, we can’t save the computed Cholesky for reuse inmulti_normal_cholesky
. - If we ignore
mu
, the generated code will be computing the equivalent of a^TL^T\Sigma^{-1}La, where \Sigma = LL^T. Although this is mathematically equivalent to a^Ta it will probably not be in floating point.
So we’ve got an expensive operation (a cholesky, a solve, and two matrix-vector mults) that is not floating point safe.
Questions
Is there a way to generate the code so it doesn’t do this? Basically, it would recognize that
-
a
uses an offset-multiplier bijector -
a
is the “target” (lvalue? I’m bad on lingo) of amulti_normal(m, sigma)
ormulti_student_t(nu, m, sigma)
distribution with the samesigma
(m
doesnt’ need to equalmu
).
In this case the generated code for a slightly different case (to take into account that the offset may not equal the man of the multivarite normal!). There’s also slightly different syntax in the bijector to explicitly give the covariance matrix rather than the cholesky. I think this is fairly clean syntax for the multivariate non-centring.
// irrelevant code (possibly defining data mu)
parameters {
// irrelevant code (possibly defining parameter mu)
vector[10] m;
matrix[10,10] sigma;
vector<offset = mu, covariance = sigma>[10] a;
// other code
}
model {
// irrelevant code
a ~ multi_normal(m, sigma);
}
would be more like
// bunch of stuff including defs for mu and sigma, m
Matrix<scalar> a = a_unconstr; // a_unconstr is the unconstrained value
lp__ += normal_log<propto>(a, m - mu, 1.0); //
// bunch of stuff
a = offset_multiplier_constrain(a, mu, sigma, lp__); // lp__ accumulates jacobian
//more stuff
Note that I had to re-sort the expression graph! This may not be possible! The generated code with the transformed parameters
-based non-centring actually stores the unconstrained values separately, so it can keep the original order.
If something like this would be possible, it’d be great because it would mean we could use the same code for centring and non-centring GPs and multi_normals that we use for the scalar case!