Additive degeneracy issue

I have a problem that I can’t solve with my very basic linear algebra skills - posting here in the hope there is someone more informed out there!

I am trying to model some measurements of N_{rxn} chemical reactions that depend deterministically on properties of their N_{cpd} reactants through the relationship y = S^{T}\theta, where

  • S is an N_{cpd} by N_{rxn} stoichiometric matrix specifying the proportions in which the reactions create and destroy reactants
  • \theta is a length N_{cpd} vector of unknown reactant properties

(The measurements are Gibbs energy changes of reaction and the reactant properties are Gibbs energy changes of formation.)

There is prior information about \theta and I’m assuming that the measurement error is known in advance.
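
To make this concrete, here is a tiny Python sketch of the generative model (all numbers are made up):

import numpy as np

rng = np.random.default_rng(0)

N_cpd, N_rxn = 6, 4
S = rng.integers(-2, 3, size=(N_cpd, N_rxn)).astype(float)  # made-up stoichiometry
theta = rng.normal(size=N_cpd)     # "true" formation energies, unknown in practice
error_scale = np.full(N_rxn, 0.1)  # known measurement error

# each measurement is a stoichiometry-weighted sum of compound properties plus noise
y = S.T @ theta + rng.normal(scale=error_scale)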

Here is a naive representation of the model I’d like to fit in the form of a Stan program:

data {
  int<lower=1> N_rxn;
  int<lower=1> N_cpd;
  matrix[N_cpd, N_rxn] S;
  vector[N_rxn] y;
  vector<lower=0>[N_rxn] error_scale;
  vector[N_cpd] prior_loc_theta;
  vector<lower=0>[N_cpd] prior_scale_theta;
}
parameters {
  vector[N_cpd] theta;
}
model {
  target += normal_lpdf(theta | prior_loc_theta, prior_scale_theta);
  target += normal_lpdf(y | S' * theta, error_scale);
}
generated quantities {
  vector[N_rxn] yrep;
  {
    vector[N_rxn] yhat = S' * theta;
    for (n in 1:N_rxn){
      yrep[n] = normal_rng(yhat[n], error_scale[n]);
    }
  }
}

This model doesn’t work very well in cases where the measured reactions don’t identify all the compound properties, which is typical. For example, a stoichiometric matrix might look like this:

[[0, 0, 0, -1],
 [0, 0, 0, -1],
 [0, 0, 0, 2],
 [0, 0, -1, 0],
 [0, -1, -1, 0],
 [1, 1, 2, 0]]

In this setup reaction 1 (the leftmost column) creates compound 6 with no compound being destroyed. Reaction 2 creates compound 6 and destroys compound 5, etc.

The problem is that measuring these reactions only gives information about the absolute \theta values of compounds 4 to 6. For \theta_1, \theta_2 and \theta_3 the measurements only give information about their relative values. This can be seen by looking at the reduced row echelon form of S^T:

[[1, 1, -2, 0, 0, 0],
 [0, 0, 0, 1, 0, 0],
 [0, 0, 0, 0, 1, 0],
 [0, 0, 0, 0, 0, 1]]

From this matrix we can see that the measurements only inform the combination \theta_1 + \theta_2 - 2\theta_3, not \theta_1, \theta_2 and \theta_3 individually. I think this is a case of what Michael Betancourt @betanalpha calls additive degeneracy.
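
In case it's useful, the rref and its pivot columns can be reproduced with sympy, for example:

import sympy

S = sympy.Matrix([
    [0, 0, 0, -1],
    [0, 0, 0, -1],
    [0, 0, 0, 2],
    [0, 0, -1, 0],
    [0, -1, -1, 0],
    [1, 1, 2, 0],
])
S_T_rref, pivots = S.T.rref()
print(S_T_rref)  # the 4 x 6 matrix shown above
print(pivots)    # (0, 3, 4, 5), i.e. compounds 1, 4, 5 and 6 in 1-based terms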

I tried using the reduced row echelon form of S^T to make the following alternative Stan model, which seems to give the same posterior predictive distribution as the one above but uses fewer leapfrog steps:

data {
  int<lower=1> N_rxn;
  int<lower=1> N_cpd;
  matrix[N_cpd, N_rxn] S;
  matrix[N_rxn, N_cpd] S_T_rref;
  vector[N_rxn] y;
  vector<lower=0>[N_rxn] error_scale;
  vector[N_cpd] prior_loc_theta;
  vector<lower=0>[N_cpd] prior_scale_theta;
}
transformed data {
  // prior for the free parameters, propagated through the linear map
  vector[N_rxn] prior_loc_theta_free = S_T_rref * prior_loc_theta;
  vector[N_rxn] prior_scale_theta_free = abs(S_T_rref) * prior_scale_theta;
}
parameters {
  vector[N_rxn] theta_free;
}
model {
  // map the free parameters back to compound space, then through S' to reaction space
  vector[N_rxn] yhat = S' * (S_T_rref' * theta_free);
  target += normal_lpdf(theta_free | prior_loc_theta_free, prior_scale_theta_free);
  target += normal_lpdf(y | yhat, error_scale);
}
generated quantities {
  vector[N_rxn] yrep;
  {
    vector[N_rxn] yhat = S' * (S_T_rref' * theta_free);
    for (n in 1:N_rxn){
      yrep[n] = normal_rng(yhat[n], error_scale[n]);
    }
  }
}

I’ve made a quick Python script that runs both models with some representative data here.
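
It does roughly the following (a sketch using cmdstanpy; naive.stan and reparam.stan are placeholder file names for the two models above):

import numpy as np
from cmdstanpy import CmdStanModel

S = np.array([
    [0, 0, 0, -1], [0, 0, 0, -1], [0, 0, 0, 2],
    [0, 0, -1, 0], [0, -1, -1, 0], [1, 1, 2, 0],
])
S_T_rref = np.array([
    [1, 1, -2, 0, 0, 0], [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1],
])
theta_true = np.array([-1.0, 0.5, 0.2, 1.0, -0.3, 0.8])  # made-up values
y = S.T @ theta_true + np.random.default_rng(0).normal(scale=0.1, size=4)

# the naive model simply ignores the S_T_rref entry it doesn't declare
data = {
    "N_rxn": 4, "N_cpd": 6, "S": S, "S_T_rref": S_T_rref, "y": y,
    "error_scale": [0.1] * 4,
    "prior_loc_theta": [0.0] * 6, "prior_scale_theta": [2.0] * 6,
}
naive_fit = CmdStanModel(stan_file="naive.stan").sample(data=data)
reparam_fit = CmdStanModel(stan_file="reparam.stan").sample(data=data)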

My questions are:

  • Does what I’ve done so far make sense? Are my models really equivalent mathematically/informationally?

  • If not, is there a standard way of dealing with this kind of problem?

  • If so, how can I recover the vector \theta? I think the required information is there but I’m confused about how to combine the relative information in theta_free with the pre-experimental information about the absolute theta parameters.

I don’t have time to look carefully at the linear algebra, but the basic idea is correct. See also https://betanalpha.github.io/assets/case_studies/qr_regression.html for a demonstration of a similar problem, including the translation of a prior density from the nominal parameterization to the reduced one.

Ultimately it helps to recognize that you’re reparameterizing the entire model, not just the one likelihood function, so you need to propagate the change everywhere, including through the prior density with the requisite Jacobian and whatnot.

The reduced form picks out parameters that are more directly informed by the available data, which should facilitate the fit, at least provided that the transformed prior model isn’t awkward. Otherwise the main options are a more informative prior model (even more information on just a few parameters can help fight the degeneracy) or incorporating complementary measurements that identify how some of the reactions work on their own.
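
To spell out the linear case: if \gamma = R\theta with R square and invertible, then

p_{\gamma}(\gamma) = p_{\theta}(R^{-1}\gamma) \, \left|\det R^{-1}\right|

and the determinant doesn’t depend on the parameters, so omitting it only changes the target by a constant.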


Thanks very much for the helpful comments and link! I think I’m now pretty clear about the right general approach and have got an encouraging result in my simple simulation study. I’m still not 100% sure so I thought I’d write out everything I did.

I defined a reparameterisation using a matrix R, which is the N_{cpd} \times N_{cpd} identity matrix with the rows indexed by the pivot columns of rref(S^T) (i.e. the columns with leading ones) replaced by the corresponding rows of rref(S^T). In this case the matrix is

R = \left[\begin{matrix}1 & 1 & -2 & 0 & 0 & 0\\0 & 1 & 0 & 0 & 0 & 0\\0 & 0 & 1 & 0 & 0 & 0\\0 & 0 & 0 & 1 & 0 & 0\\0 & 0 & 0 & 0 & 1 & 0\\0 & 0 & 0 & 0 & 0 & 1\end{matrix}\right]

with rows 1, 4, 5 and 6 taken from rref(S^T) because its leading ones are at these columns. The reparameterisation is

\gamma = R\theta

and \theta can be recovered with

\theta = R^{-1}\gamma
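
In code, R can be assembled from the pivot rows like this (a numpy/sympy sketch of the construction, not necessarily the neatest way):

import numpy as np
import sympy

S = sympy.Matrix([
    [0, 0, 0, -1], [0, 0, 0, -1], [0, 0, 0, 2],
    [0, 0, -1, 0], [0, -1, -1, 0], [1, 1, 2, 0],
])
rref, pivots = S.T.rref()
rref_np = np.array(rref, dtype=float)  # 4 x 6

# start from the identity and overwrite the pivot rows with rref rows
R = np.eye(S.shape[0])
for i, p in enumerate(pivots):
    R[p] = rref_np[i]

# round trip: gamma = R theta, theta = R^{-1} gamma
theta = np.arange(1.0, 7.0)  # any test vector
gamma = R @ theta
assert np.allclose(np.linalg.solve(R, gamma), theta)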

I made the following Stan model to test the reparameterisation:

data {
  int<lower=1> N_rxn;
  int<lower=1> N_cpd;
  matrix[N_cpd, N_rxn] S;
  matrix[N_cpd, N_cpd] R;  // matrix defining a reparameterisation
  vector[N_rxn] y;
  vector<lower=0>[N_rxn] error_scale;
  vector[N_cpd] prior_loc_theta;
  vector<lower=0>[N_cpd] prior_scale_theta;
}
parameters {
  vector[N_cpd] gamma;
}
transformed parameters {
  vector[N_cpd] theta = R \ gamma;  // i.e. theta = inverse(R) * gamma
}
model {
  target += normal_lpdf(theta | prior_loc_theta, prior_scale_theta);
  // the gamma -> theta map is linear, so the Jacobian adjustment is constant and can be dropped
  target += normal_lpdf(y | S' * theta, error_scale);
}
generated quantities {
  vector[N_rxn] yrep;
  vector[N_rxn] log_lik;
  {
    vector[N_rxn] yhat = S' * theta;
    for (n in 1:N_rxn){
      yrep[n] = normal_rng(yhat[n], error_scale[n]);
      log_lik[n] = normal_lpdf(y[n] | yhat[n], error_scale[n]);
    }
  }                             
}

It seems to work OK in the test case I put in. This graph suggests that the marginal posteriors are about the same under the new parameterisation as under the naive one:

This graph shows that Stan took far fewer leapfrog steps when sampling from the reparameterised model:
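
For the record, the leapfrog counts are Stan's n_leapfrog__ sampler diagnostic, which cmdstanpy exposes via method_variables(); continuing the script sketched earlier:

import numpy as np

# method_variables() returns a dict of draws-by-chains arrays of sampler diagnostics
for name, fit in [("naive", naive_fit), ("reparam", reparam_fit)]:
    print(name, np.mean(fit.method_variables()["n_leapfrog__"]))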

I still need to try the method out in a more complicated case, and I also want to include more information about the compounds (two compounds with similar chemical structures likely have similar formation energies), but for now I'm very happy. Thanks again!
