I have a problem that I can’t solve with my very basic linear algebra skills - posting here in the hope there is someone more informed out there!
I am trying to model some measurements of N_{rxn} chemical reactions that depend deterministically on properties of their N_{cpd} reactants through the relationship y = S^{T}\theta, where
- S is an N_{cpd} by N_{rxn} stoichiometric matrix specifying the proportions in which the reactions create and destroy reactants
- \theta is a length N_{rxn} vector of unknown reactant properties
(the measurements are of gibbs energy changes of reaction and the reactant properties are their gibbs energy changes of formation).
There is prior information about \theta and I’m assuming that the measurement error is known in advance.
Here is a naive representation of the model I’d like to fit in the form of a Stan program:
data {
int<lower=1> N_rxn;
int<lower=1> N_cpd;
matrix[N_cpd, N_rxn] S;
vector[N_rxn] y;
vector<lower=0>[N_rxn] error_scale;
vector[N_cpd] prior_loc_theta;
vector<lower=0>[N_cpd] prior_scale_theta;
}
parameters {
vector[N_cpd] theta;
}
model {
target += normal_lpdf(theta | prior_loc_theta, prior_scale_theta);
target += normal_lpdf(y | S' * theta, error_scale);
}
generated quantities {
vector[N_rxn] yrep;
{
vector[N_rxn] yhat = S' * theta;
for (n in 1:N_rxn){
yrep[n] = normal_rng(yhat[n], error_scale[n]);
}
}
}
This model doesn’t work very well in cases where the measured reactions don’t identify all the compound properties, which is typical. For example, a stoichiometric matrix might look like this:
[[0, 0, 0, -1],
[0, 0, 0, -1],
[0, 0, 0, 2],
[0, 0, -1, 0],
[0, -1, -1, 0],
[1, 1, 2, 0]]
In this setup reaction 1 (the leftmost column) creates compound 6 with no compound being destroyed. Reaction 2 creates compound 6 and destroys compound 5, etc.
The problem is that measuring these reactions only gives information about the absolute \theta values of compounds 4 to 6. For \theta_1, ...\theta_3 the measurements only give information about their relative values. This can be seen by looking at the reduced row echelon form of S^T:
[[1, 1, -2, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]]
From this matrix we can see that the measurement y_1 depends on \theta_1 + \theta_2 - 2\theta_3. I think this is a case of what Michael Betancourt @betanalpha calls additive degeneracy.
I tried using the reduced row echelon form of S^T to make the following alternative Stan model which seems to give the same posterior predictive distribution as the one above, but uses fewer leapfrog steps:
data {
int<lower=1> N_rxn;
int<lower=1> N_cpd;
matrix[N_cpd, N_rxn] S;
matrix[N_rxn, N_cpd] S_T_rref;
vector[N_rxn] y;
vector<lower=0>[N_rxn] error_scale;
vector[N_cpd] prior_loc_theta;
vector<lower=0>[N_cpd] prior_scale_theta;
}
transformed data {
vector[N_rxn] prior_loc_theta_free = S_T_rref * prior_loc_theta;
vector[N_rxn] prior_scale_theta_free = fabs(S_T_rref) * prior_scale_theta;
}
parameters {
vector[N_rxn] theta_free;
}
model {
vector[N_rxn] yhat = S' * (theta_free' * S_T_rref)';
target += normal_lpdf(theta_free | prior_loc_theta_free, prior_scale_theta_free);
target += normal_lpdf(y | yhat, error_scale);
}
generated quantities {
vector[N_rxn] yrep;
{
vector[N_rxn] yhat = S' * (theta_free' * S_T_rref)';
for (n in 1:N_rxn){
yrep[n] = normal_rng(yhat[n], error_scale[n]);
}
}
}
I’ve made a quick python script that runs both models with some representative data here.
My questions are:
-
Does what I’ve done so far make sense? Are my models really equivalent mathematically/informationally?
-
If not, is there a standard way of dealing with this kind of problem?
-
If so, how can I recover the vector \theta? I think the required information is there but I’m confused about how to combine the relative information in
theta_free
with the pre-experimental information about the absolutetheta
parameters.