Interpretation of multilevel parameters

The following is not so much a modeling problem, the brms procedure works very well as such, it is about interpretation of results. This is what I study: the concentration of a chemical © as a function of time (t) is modeled by its initial concentration (c0), a rate constant ( k) and an order ( n) , according to the following equation:

c=\left(c_{0}^{1-n}+(n-1) k t\right)^{\frac{1}{1-n}}

I have a dataset available with repetitions of the experiment studying exactly the same reaction; concentration is measured as a function of time. Even though the conditions are the same, there is variation in the results, obviously and, consequently I want to know how the parameters vary between the repetitions. The data within one repetition are correlated because samples are taken from the same reactor as a function of time. The idea was therefore to apply multilevel modeling to investigate how variable the three parameters are over the various repetitions. The additional benefit of multilevel modeling is that it takes the experimental correlation within each repetition into account. Multilevel modeling using brms does give seemingly very reasonable output, showing nicely how the three parameters vary per repetition, part of the code I used for the model in brms is:

nlform<-bf(conc ~ (c0^(1-n)+(n-1)*k*time)^(1/(1-n)), 
           c0 ~1+(1|ID|serial),
           k ~ 1+(1|ID|serial), 

The output is as such very nice and instructive but the problem I face is that the rate constant parameter (k) has dimensional units that contains the order parameter (n) in it:

(\frac {L} {mol})^{n-1} s^{-1}

This dimensional construction is needed in order to arrive at the right units for the concentration c (should be in mol/L) as a function of time (the order n is dimensionless). So, as soon as parameter n changes numerically (which it does because of the varying effect), the unit of k changes with it, and next to that, k itself is also subject to a varying effect. This makes it not well possible to compare the varying values of parameter k, it seems.
The multilevel modeling results indicate that k and n are correlated but not that strong ($\rho$=-0.07, SE=0.32): see attached figure made from brms output with GGally .

My question is: how can I interpret the varying k-values ? Of course, I also get an overall population value for k and n. Those can be used in conjunction without problem because they are “fixed” by definition. But I am struggling with the varying effects: can I compare them and how?

Hi, sorry for not getting to you earlier, your question is relevant and well written.

I am not 100% convinced that the dimensioality of k is an issue here. The units are what they are, so you report the values in those units. But I think it is then important to also account for the correlation as it might be important that high k are more likely with high n, so univariate posteriors might not be so useful.

Generally, in such complex cases, I find it useful to think in terms of a prediction task you are interested in. Because very often (not always) the scientific question you have can be interpreted as a prediction task. Fitting a gaussian linear model like x ~ group? The posterior of the group coefficient can be seen as a prediction of how big difference in mean between groups you would see in a future experimetn large enough that observation noise is negligible.

So the question is then to find the right prediction task, looking at your setup, those may include:

  • What is the distribution of parameters?
  • What is the distribution of the reaction curves? (e.g. by plotting)
  • What is the distribution of concentration at time X?
  • What is the distribution of the time needed to reach concentration Y?
  • What is the probability the concentration is increasing/decreasing from the initial concentration?

All of the above then could be asked:

  • For a hypothetical future serial drawn from the same “population” as the observed serials. (i.e. include the varying intercept via a new level and “sample_new_levels = ‘uncertainty’”)
  • For the “true” or “average” underlying system (i.e. ignore the varying intercept)
  • In the experiments you actually observed (i.e. include the fitted varying intercepts for your experiments)

But you could also ask other stuff, like:

  • What is the expected difference in some of the constants (or anything else) between two future experiments?

All of those (and more) should be answerable using the posterior of the model. But you still need to figure out which questions do you actually want to ask, as there is a lot of options…

Does that make sense?

Best of luck with your model!

1 Like

Hi Martin, your answer is much appreciated! For a moment I thought that I was addressing a non-issue since there was no reaction initially. I am very happy with your answer and it does make sense! My biggest problem will now be to convince my peers when I am going to discuss this😃. But you are right in assuming that the ultimate goal of this exercise is to predict. So, are you basically stating that I can “forget” this dimensionality property in the first instance as a sort of analysis phase to see how parameters behave, and come back to the right dimensionality once it is decided how to use the results?

1 Like

I am not sure I follow your thought here, but maybe that’s just because I would have worded it differently? I definitely don’t think “forget” is the right word. It is good to be aware of what your parameters mean. My point was more like: “OK, so we have interdependent parameters, so maybe focus less on each parameter separately and rather look what they all together imply about the world”. Kind of like if you modelled speed and capacity of a vehicle, but were really interested in how long does it take to transport a pile of stuff - looking at each parameter separately tells you something, but it is really hard to interpret without the other parameter. The model as a whole however contains enough information to answer your question.

An alternative approach would be to try to find a different parametrization of the model where the parameters are interpretable separately, but that might be hard. And in the end, you will not get any new information, just a different rephrasing, so if you are not having any problems with fitting, I think it is unlikely to be very helpful.

Also, if this is the parametrization of the process used by many in the field, than maybe poeple would expect you to report as (\frac {L} {mol})^{n-1} s^{-1}, because that’s what everybody has been doing (although possibly with fixed n)?

Does that make sense?

Can you not just recast the model (with modified parameters) as

\frac{c(t)}{c_0}= \left(1+\left(\frac{\tilde{k}}{c_0}\right)^{1-n}\frac{t}{s}\right)^{\frac{1}{1-n}}?

At least then \tilde{k} always has units L/mol. The correlation may of course change.

1 Like

Hi Martin, yes you are right, the word “forget” was a bit of unfortunate word choice, I was trying to interpret your remark: “I am not 100% convinced that the dimensionality of k is an issue here” and I meant to say that I pay perhaps too much attention to the dimension of this one parameter. I think that the reasoning to focus not on just one parameter but on the outcomes of the whole model makes a lot of sense indeed, so thank you for thinking with me. This parameterization is indeed what people expect and thinking in terms of a changing parameter n is not done yet.
@Funko_Unko: thanks for your suggestion, I am going to try it!

1 Like