## Summary

I wrote this post as a contribution to science communication: it aims to help readers understand how incomplete or limited data can affect nonlinear growth models, and how to correct the resulting problems with the brms package (Bayesian Regression Models using ‘Stan’). As a working example, we apply these ideas to growth data on shrimp produced in aquaculture. We will explore the following topics:

- Nonlinear growth models
- Challenges posed by incomplete or limited data
- Bias correction in estimating parameters of nonlinear models
- Bayesian hierarchical modeling (using Stan)
- Conclusion

The insights in this piece are drawn from two key scientific articles:

- “Evidence of parameters underestimation from nonlinear growth models for data classified as limited”
- “Modeling the growth of Pacific white shrimp (Litopenaeus vannamei) using the new Bayesian hierarchical approach based on correcting bias caused by incomplete or limited data”

**Nonlinear Growth Models**

Nonlinear models are a family of regression models whose parameters are often strongly correlated, which complicates parameter inference and frequently calls for specific estimation methods. Statisticians classify a model as nonlinear when at least one derivative of the mean function with respect to the parameters depends on one or more of those parameters. What sets nonlinear models apart is the interpretability of their parameters, which often carry biological meaning. Below, we present a selection of nonlinear growth models commonly found in the applied statistics literature.
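As a quick check of this definition, here is a worked example using the logistic growth function, $f(t) = \alpha / (1 + \exp(\kappa(\beta - t)))$, compared with a straight line $g(t) = a + b\,t$ (the line and its symbols $a$, $b$ are only an illustrative contrast):

$$
\frac{\partial f}{\partial \alpha} = \frac{1}{1 + \exp(\kappa(\beta - t))},
\qquad
\frac{\partial f}{\partial \kappa} = -\frac{\alpha\,(\beta - t)\,\exp(\kappa(\beta - t))}{\left(1 + \exp(\kappa(\beta - t))\right)^{2}},
\qquad
\frac{\partial g}{\partial a} = 1,
\qquad
\frac{\partial g}{\partial b} = t .
$$

The derivatives of $f$ still contain $\alpha$, $\kappa$ and $\beta$, so the logistic model is nonlinear in its parameters. The derivatives of $g$ contain no parameters at all, so the straight line is a linear model.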

| Function name | Mathematical expression |
|---|---|
| Morgan-Mercer-Flodin (MMF) | $f(t) = \alpha - \dfrac{\alpha - w_0}{1 + (\kappa \cdot 10^{-4} \cdot t)^{\delta}}$ |
| Michaelis-Menten Generalized | $f(t) = \dfrac{w_0 \, \beta^{\kappa} + \alpha \, t^{\kappa}}{\beta^{\kappa} + t^{\kappa}}$ |
| Weibull growth | $f(t) = \alpha \, (1 - \exp(-\beta \cdot 10^{-4} \cdot t^{\kappa})) + w_0$ |
| von Bertalanffy | $f(t) = \alpha \, (1 - \exp(-\kappa \cdot 10^{-4} \cdot (t + \beta)))^{3}$ |
| Gompertz function | $f(t) = \alpha \, \exp(-\exp(\kappa \, (\beta - t)))$ |
| Logistic growth function | $f(t) = \dfrac{\alpha}{1 + \exp(\kappa \, (\beta - t))}$ |
| Richards | $f(t) = \alpha \left[ 1 + (\gamma - 1) \exp(-\kappa (t - \beta)) \right]^{\frac{1}{1 - \gamma}}$ |
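For readers who prefer code to formulas, the expressions in the table can be transcribed into plain R functions. This is only a sketch: the argument names mirror the symbols in the table, and the parameter values used in the final lines are hypothetical, chosen just to produce a printable example.

```r
# Growth functions transcribed term by term from the table above.
# Argument names mirror the table's symbols; `t` is time.
mmf <- function(t, alpha, w0, kappa, delta) {
  alpha - (alpha - w0) / (1 + (kappa * 1e-4 * t)^delta)
}
michaelis_menten <- function(t, alpha, w0, beta, kappa) {
  (w0 * beta^kappa + alpha * t^kappa) / (beta^kappa + t^kappa)
}
weibull <- function(t, alpha, w0, beta, kappa) {
  alpha * (1 - exp(-beta * 1e-4 * t^kappa)) + w0
}
von_bertalanffy <- function(t, alpha, kappa, beta) {
  alpha * (1 - exp(-kappa * 1e-4 * (t + beta)))^3
}
gompertz <- function(t, alpha, kappa, beta) {
  alpha * exp(-exp(kappa * (beta - t)))
}
logistic <- function(t, alpha, kappa, beta) {
  alpha / (1 + exp(kappa * (beta - t)))
}
richards <- function(t, alpha, kappa, beta, gamma) {
  alpha * (1 + (gamma - 1) * exp(-kappa * (t - beta)))^(1 / (1 - gamma))
}

# Hypothetical parameter values, for illustration only
t_grid <- seq(0, 30, by = 5)
curves <- data.frame(
  time            = t_grid,
  mmf             = mmf(t_grid, alpha = 40, w0 = 0.5, kappa = 600, delta = 2.5),
  von_bertalanffy = von_bertalanffy(t_grid, alpha = 40, kappa = 800, beta = 1),
  gompertz        = gompertz(t_grid, alpha = 40, kappa = 0.2, beta = 12),
  logistic        = logistic(t_grid, alpha = 40, kappa = 0.3, beta = 12)
)
print(round(curves, 2))
```

Plotting each column of `curves` against `time` is a simple way to see how differently the same value of alpha is approached by each model.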

Note that parameters in different nonlinear models do not always share the same biological interpretation, so care must be taken when generalizing interpretations across growth models: the same Greek letter used in two models will not necessarily describe the same aspect of the phenomenon under study. In some applications it is also common to compare different parameterizations of the same model, which leads to subtle differences in the biological meaning of the parameters. To familiarize readers with nonlinear growth models, we prepared an interactive GeoGebra visualization (all models together) for exploring the models and their parameters; click the hyperlink on each model in the table to open it.

**Challenges Posed by Incomplete or Limited Data**

A peculiar characteristic of aquaculture data (from the cultivation of aquatic organisms) is that they are incomplete or limited: samples (weight over time) are predominantly observed below the inflection point of the growth curve. This pattern is common in aquaculture production, especially in shrimp farming, because it is often not worthwhile for the producer to keep the animals on the farm for long; doing so increases expenses with feed, physical space, and fixed and variable costs. So when the animals reach commercial weight, the pond (tank) is harvested, the shrimp are sold to the market, and a new production cycle begins.

Such incomplete or limited data introduce bias when estimating the parameters of nonlinear growth models. The most affected is one of the hardest parameters to estimate: alpha, the theoretical asymptotic weight. You may have noticed that alpha plays essentially the same role in every growth model listed above. It is a very important parameter, and the estimates of the other parameters in the same model depend on it. It is theoretical because it represents the individual's weight at the end of its life (the limit as time goes to infinity), which is abstract and impossible to observe in practice. We know from estimates that it is usually characteristic of the species being modeled (it differs between a fish and an elephant, for example). From a Bayesian perspective, this parameter is not treated as a fixed value: it has a distribution, and it can vary from individual to individual.
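The idea of a weight reached "as time goes to infinity" can be written out explicitly. Taking the von Bertalanffy and logistic expressions from the table above, and assuming $\kappa > 0$ (an assumption, since the table does not state parameter ranges):

$$
\lim_{t \to \infty} \alpha \left(1 - \exp(-\kappa \cdot 10^{-4} (t + \beta))\right)^{3} = \alpha \,(1 - 0)^{3} = \alpha,
\qquad
\lim_{t \to \infty} \frac{\alpha}{1 + \exp(\kappa (\beta - t))} = \frac{\alpha}{1 + 0} = \alpha .
$$

In both cases the curve approaches $\alpha$ but never reaches it in finite time, which is why the asymptotic weight cannot be measured directly and has to be estimated from the shape of the observed part of the curve.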

In our simulation below, we show the effect of data limitation on Pacific white shrimp (*Litopenaeus vannamei*) growth for different numbers of weeks of cultivation, alongside the fits of different nonlinear growth curves. Note that when only 7 to 18 weeks of data are available, the parameters are consistently underestimated, and consequently so are the fitted curves of the proposed models. As the number of observations increases, the fit of the models improves.
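The kind of experiment described above can be sketched in a few lines of base R. This is not the simulation from the cited articles: it is a minimal stand-in that uses ordinary least squares via `nls()`, a von Bertalanffy curve with hypothetical "true" parameters (`alpha = 40`, `kappa = 800`, `beta = 1`), and an assumed 5% multiplicative noise. It refits the curve using only the first weeks of data and reports the alpha estimate for each window, so you can inspect how the estimate behaves as more of the curve is observed.

```r
set.seed(123)  # reproducible noise

# von Bertalanffy curve as written in the table above
vb <- function(t, alpha, kappa, beta) {
  alpha * (1 - exp(-kappa * 1e-4 * (t + beta)))^3
}

# Hypothetical "true" values, chosen only for illustration
true_par <- list(alpha = 40, kappa = 800, beta = 1)

# Weekly weights with multiplicative noise (keeps weights positive)
sim <- data.frame(week = 1:30)
sim$weight <- vb(sim$week, true_par$alpha, true_par$kappa, true_par$beta) *
  exp(rnorm(nrow(sim), sd = 0.05))

# Refit using only the first `n_weeks`; return NA if nls() cannot fit at all
alpha_hat <- function(n_weeks) {
  d <- sim[sim$week <= n_weeks, ]
  fit <- tryCatch(
    suppressWarnings(nls(
      weight ~ alpha * (1 - exp(-kappa * 1e-4 * (week + beta)))^3,
      data    = d,
      start   = list(alpha = 30, kappa = 1000, beta = 1),
      control = nls.control(maxiter = 200, warnOnly = TRUE)
    )),
    error = function(e) NULL
  )
  if (is.null(fit)) NA_real_ else unname(coef(fit)["alpha"])
}

windows <- c(7, 12, 18, 24, 30)
result <- data.frame(
  weeks_observed = windows,
  alpha_estimate = sapply(windows, alpha_hat)
)
print(result)  # compare each estimate with true_par$alpha
```

With very short windows `nls()` may fail or stop before converging, which is itself a symptom of how little information limited data carry about alpha; those cases show up as `NA` or as unstable estimates.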

**Bias Correction in Parameter Estimation**

To correct the underestimation of parameters in nonlinear growth models, researchers proposed a Hierarchical Bayesian Growth Model (implemented in Stan through the *brms* R package) that uses information from wild animals (from fishing data) as prior information. Although the farming environment is completely different from the wild environment, information on the same species obtained from fishing data helps estimate the parameters of aquaculture growth models more accurately. By incorporating this information into the prior distribution of the Bayesian model, we correct the parameter underestimation caused by incomplete or limited data.

Below I provide an R script that fits a Bayesian hierarchical model (with the *brms* package) to growth data for Pacific white shrimp. The limited/incomplete data were simulated using the von Bertalanffy equation. They represent the weight of the shrimp over the course of one production cycle in a pond (tank) on a farm in northeastern Brazil. The data were limited to weights below 16 grams, and a nonlinear Bayesian model (von Bertalanffy) was fitted. The model priors used for bias correction were defined from the known parameters configured in the simulator. The posterior predictive density of the model was calculated and is shown in the plot below.
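A minimal sketch of what such a script could look like is given here. It is not the original script: the simulator values (`alpha = 40`, `kappa = 800`, `beta = 1`), the noise level, the prior standard deviations and the helper name `fit_vb()` are all assumptions made for illustration. The priors are centered on the simulator values, standing in for the information that would come from wild-caught animals. The model is wrapped in a function so that nothing is compiled until you call it.

```r
set.seed(2024)

# von Bertalanffy curve as written in the table above
vb <- function(t, alpha, kappa, beta) {
  alpha * (1 - exp(-kappa * 1e-4 * (t + beta)))^3
}

# Simulate one production cycle (hypothetical parameter values)
shrimp <- data.frame(week = 1:30)
shrimp$weight <- vb(shrimp$week, alpha = 40, kappa = 800, beta = 1) *
  exp(rnorm(nrow(shrimp), sd = 0.05))

# Keep only the limited part of the curve: weights below 16 g
limited <- shrimp[shrimp$weight < 16, ]

# Nonlinear von Bertalanffy model with informative priors (assumed values).
# Requires the brms package and a working Stan toolchain when called.
fit_vb <- function(data) {
  brms::brm(
    brms::bf(
      weight ~ alpha * (1 - exp(-kappa * 1e-4 * (week + beta)))^3,
      alpha + kappa + beta ~ 1,   # one population-level value per parameter
      nl = TRUE                   # treat the formula as nonlinear
    ),
    data   = data,
    family = gaussian(),
    prior  = c(
      brms::prior(normal(40, 5),    nlpar = "alpha", lb = 0),
      brms::prior(normal(800, 200), nlpar = "kappa", lb = 0),
      brms::prior(normal(1, 1),     nlpar = "beta")
    ),
    chains = 4, iter = 2000, seed = 2024
  )
}
```

To run it, call `fit <- fit_vb(limited)`, then inspect the estimates with `summary(fit)` and the posterior predictive density with `brms::pp_check(fit)`. Widening the prior on `alpha` is a simple way to see how much of the correction comes from the prior rather than from the limited data.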

**Hierarchical Bayesian Modeling**

Figure. Pacific white shrimp (*Litopenaeus vannamei*) grown in ponds (excavated tanks) on a marine shrimp farm.

Aquaculture data naturally have a hierarchical structure, which our proposed model takes into account. Within a farm, the data are nested across production cycles, ponds (tanks), and the farm as a whole. By structuring the data hierarchically, we can analyze growth curves per production cycle, per pond (tank), and, if it helps management make decisions, at the farm level.

Modeling growth curves at several levels at once brings an inferential benefit, because the levels are statistically dependent; this is where the strength of hierarchical models lies. In other words, information from one pond (tank) can help estimate the growth curve of another pond (after all, they belong to the same farm), just as information from earlier production cycles of a pond can help predict its future cycles. Therefore, instead of fitting an independent model at each level, we can bring all the information together in one multilevel model that is more robust and possibly more accurate when predicting beyond what was observed in the data set.

Below we have an R script that simulates data from a shrimp farm, capturing this hierarchical structure of aquaculture data. We simulate 2 ponds (tanks) with 3 production cycles each (per year), with weekly biometrics (weight in grams) recorded throughout the cultivation period, based on the Morgan-Mercer-Flodin growth function.
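One possible version of such a simulation, together with a matching multilevel model, is sketched here. Everything numeric is an assumption made for illustration: the 16-week cycle length, the population values (`alpha = 40`, `w0 = 0.5`, `kappa = 600`, `delta = 2.5`), the between-pond and between-cycle spreads, the priors, and the helper name `fit_mmf()`. Only alpha is allowed to vary by pond and by cycle within pond, which is one modeling choice among several.

```r
set.seed(42)

# Morgan-Mercer-Flodin curve as written in the table above
mmf <- function(t, alpha, w0, kappa, delta) {
  alpha - (alpha - w0) / (1 + (kappa * 1e-4 * t)^delta)
}

# Design: 2 ponds x 3 production cycles, weekly biometrics
farm <- expand.grid(
  week  = 1:16,
  cycle = factor(1:3),
  pond  = factor(c("A", "B"))
)
farm$pond_cycle <- interaction(farm$pond, farm$cycle, drop = TRUE)

# Hypothetical deviations in alpha: between ponds, and between cycles within ponds
pond_dev  <- setNames(rnorm(nlevels(farm$pond), sd = 3), levels(farm$pond))
cycle_dev <- setNames(rnorm(nlevels(farm$pond_cycle), sd = 1.5),
                      levels(farm$pond_cycle))
farm$alpha_true <- unname(
  40 + pond_dev[as.character(farm$pond)] +
    cycle_dev[as.character(farm$pond_cycle)]
)

# Weekly weights (g) with multiplicative noise
farm$weight <- mmf(farm$week, farm$alpha_true, w0 = 0.5, kappa = 600, delta = 2.5) *
  exp(rnorm(nrow(farm), sd = 0.05))

# Multilevel MMF model: alpha varies by pond and by cycle within pond.
# Requires the brms package and a working Stan toolchain when called.
fit_mmf <- function(data) {
  brms::brm(
    brms::bf(
      weight ~ alpha - (alpha - w0) / (1 + (kappa * 1e-4 * week)^delta),
      alpha ~ 1 + (1 | pond) + (1 | pond_cycle),
      w0 + kappa + delta ~ 1,
      nl = TRUE
    ),
    data   = data,
    family = gaussian(),
    prior  = c(
      brms::prior(normal(40, 5),    nlpar = "alpha", lb = 0),
      brms::prior(normal(0.5, 0.5), nlpar = "w0",    lb = 0),
      brms::prior(normal(600, 200), nlpar = "kappa", lb = 0),
      brms::prior(normal(2.5, 1),   nlpar = "delta", lb = 0)
    ),
    chains = 4, iter = 2000, seed = 42
  )
}
```

Calling `fit <- fit_mmf(farm)` fits the model. With only 2 ponds the between-pond standard deviation is weakly identified, so in a sketch like this the partial pooling across the 6 pond-cycle combinations does most of the work.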

You can find all the scripts and more information in the links below

My intention is to contribute information and promote the use of the *brms* package.