Hello everyone!

Using brms, I want to fit a hierarchical model relating a response variable (fish_body_size) to a predictor variable (seawater temperature, SST). The response is positive and continuous, with no values equal to zero (Figure 1 and Figure 2). Because the model that includes SST takes more than 8 hours to run, I started with an “empty” model in which only the intercept varies by species. I then used the pp_check function to assess how well the empty model reproduces the distribution and range of the observed values (Figure 3).

I identified the Gamma and skew_normal distributions as candidate families. I discarded skew_normal because the model produces negative predicted values, which are impossible for body size. However, the empty model with the Gamma distribution also does not seem able to reproduce the observed distribution of fish_body_size.

My question to the community: is my method (using pp_check on the empty model) wrong, or does the poor fit come from the formulation of the model (with or without the predictor variable, SST)? If it would help, I can share the raw data.
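To make the setup concrete, here is a minimal sketch of an intercept-only model of this kind and the corresponding posterior predictive checks. It is an illustration under stated assumptions, not the exact call used for the figures: the data frame `d`, the grouping column name `species`, the simulated values, and the sampler settings are all hypothetical stand-ins. The lognormal family at the end is only one possible alternative for a positive, right-skewed response and was not part of the original comparison.

```r
library(brms)

# Hypothetical stand-in data (assumption): replace `d` with the real data frame.
set.seed(1)
d <- data.frame(
  species = factor(rep(paste0("sp", 1:5), each = 40)),
  SST     = runif(200, min = 15, max = 28)
)
# Simulated positive, right-skewed body sizes with a different mean per species
mu <- exp(2 + 0.3 * as.integer(d$species))
d$fish_body_size <- rgamma(nrow(d), shape = 5, rate = 5 / mu)

# "Empty" model: only the intercept varies by species, Gamma likelihood, log link
fit_gamma <- brm(
  fish_body_size ~ 1 + (1 | species),
  data   = d,
  family = Gamma(link = "log"),
  chains = 4, cores = 4, seed = 1
)

# Pooled check: observed density against 100 posterior predictive draws
pp_check(fit_gamma, ndraws = 100)

# Per-species check: a pooled density can look poor simply because it mixes
# species with very different typical sizes
pp_check(fit_gamma, type = "dens_overlay_grouped", group = "species", ndraws = 100)

# One possible alternative family for a strictly positive response (assumption)
fit_lognormal <- brm(
  fish_body_size ~ 1 + (1 | species),
  data   = d,
  family = lognormal(),
  chains = 4, cores = 4, seed = 1
)
pp_check(fit_lognormal, ndraws = 100)
```

Looking at the grouped check alongside the pooled one helps separate two possible causes of a poor fit: a likelihood family that does not match the shape of the data, and a model structure that does not capture differences between species.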

Thanks

Figure 1

longo_freq-counts.pdf (10.0 KB)

Figure 2

fish_per species_frequency.pdf (12.6 KB)

Figure 3

pp_check.pdf (991.8 KB)