Population- vs. group-level model specification

I’d like a second opinion on brms model specifications for a group-level term intended to account for the spatial sampling design used for data collection. My response variable is the individual-level physiological condition of fish (non-negative, continuous). The survey uses a stratified design: there are 7 strata, and at each station within a stratum up to 5 fish are sampled (sometimes only a single fish is collected at a given station). Although the variation in condition attributable to strata or station is not of interest for the study question, I would expect fish collected at the same station, or in the same region (stratum), to be more similar in condition, which is why I assumed I should include a group-level effect. My covariates of interest are one individual-level covariate, fish_size, and two station-level covariates, temperature and fish_density. All fish collected at the same station share the same temperature and fish_density values. There are 499 fish condition observations from 109 stations sampled across 4 sampling years.
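Because temperature and fish_density are constant within a station, it can help to confirm that structure, and to count how many stations contributed only one fish, before fitting anything. Below is a minimal Python sketch using made-up rows; the column order, the station IDs, and the `summarise_stations` helper are all hypothetical, and station IDs are assumed to be unique across strata and years.

```python
from collections import defaultdict

# Hypothetical rows: (strata, station, fish_size, temperature, fish_density).
# Station IDs are assumed unique across strata and years; all values are made up.
rows = [
    ("A", "A-01", 31.2, 4.1, 0.8),
    ("A", "A-01", 28.7, 4.1, 0.8),
    ("A", "A-02", 35.0, 3.6, 1.4),
    ("B", "B-01", 22.4, 5.0, 0.3),
    ("B", "B-01", 25.9, 5.0, 0.3),
    ("B", "B-01", 27.1, 5.0, 0.3),
]

def summarise_stations(rows):
    """Count fish per station and check that station-level covariates are constant."""
    counts = defaultdict(int)
    station_covs = {}
    for _strata, station, _size, temp, density in rows:
        counts[station] += 1
        covs = (temp, density)
        # temperature and fish_density should be identical for every fish at a station
        if station_covs.setdefault(station, covs) != covs:
            raise ValueError(f"station {station} has inconsistent covariates")
    return dict(counts)

counts = summarise_stations(rows)
singletons = sorted(s for s, n in counts.items() if n == 1)
print(counts)      # {'A-01': 2, 'A-02': 1, 'B-01': 3}
print(singletons)  # ['A-02']
```

With the real data, the length of `singletons` gives the number of stations whose group-level intercept would be informed by a single fish.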

I specified the following model formula using a nested group-level effect: condition ~ fish_size + s(temperature) + s(fish_density) + (1|strata/station). However, loo() flagged 15 observations with pareto_k > 0.7. I assume this is because, at stations where only one fish was sampled, leaving out that observation removes all of the data for that station, so the posterior changes a lot. Unsurprisingly, when I change the group-level effect to (1|station), I get the same Pareto k warning plus an additional warning that the tail ESS is too low. If I use only (1|strata), neither the model fit nor loo() produces any warnings. A few specific questions:
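In the lme4-style syntax that brms follows, `(1|strata/station)` is shorthand for `(1|strata) + (1|strata:station)`. The sketch below builds those two grouping factors from hypothetical (strata, station) pairs and lists observations that are alone in their group level. Under this reading, those singleton observations are plausible candidates for the high Pareto k values, which could be cross-checked against the pareto_k values that loo() reports. The helper names are illustrative, and reusing station codes across strata is an assumption made to show why the nesting matters.

```python
from collections import Counter

def group_levels(obs):
    """Build the two grouping factors implied by (1 | strata/station):
    one for (1 | strata) and one for (1 | strata:station)."""
    strata_levels = [s for s, _ in obs]
    station_levels = [f"{s}:{st}" for s, st in obs]
    return strata_levels, station_levels

def singleton_indices(labels):
    """Indices of observations that are the only member of their group level."""
    sizes = Counter(labels)
    return [i for i, g in enumerate(labels) if sizes[g] == 1]

# Hypothetical (strata, station) pairs; station codes are reused across strata.
obs = [("A", "01"), ("A", "01"), ("A", "02"), ("B", "01"), ("B", "01"), ("B", "03")]
strata_levels, station_levels = group_levels(obs)
print(singleton_indices(station_levels))  # [2, 5]
print(singleton_indices(strata_levels))   # []
# A flat (1 | station) term would pool A's and B's station "01" into one level.
print(Counter(st for _, st in obs)["01"])  # 4
```

The strata level never has singletons here, which is consistent with (1|strata) alone producing no loo() warnings. On the brms side, `loo(fit, moment_match = TRUE)` (which needs the model fitted with `save_pars = save_pars(all = TRUE)`) and `loo(fit, reloo = TRUE)` are documented ways to handle observations with high Pareto k.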

  1. If several of my population-level covariates vary only at the station level (i.e., are constant within a station), is it still appropriate to include station as a grouping factor in the group-level effect? Or should I omit the group-level effect altogether? I’m struggling to understand what is correct in terms of model specification given the underlying data structure, and whether a nested group-level effect is warranted.
  2. If the warnings imply model misspecification, is that most likely due to the group-level effect, or to something larger (e.g., the response distribution or the priors)? The posterior predictive check (PPC) plots for all the models don’t look terrible, although the mean of the posterior predictive distribution is slightly below the observed mean. Could that be why I’m getting high Pareto k values?
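One way to quantify the "mean slightly below the observed mean" impression from the PPC plots is a posterior predictive p-value for the mean: the share of replicated datasets whose mean is at least the observed mean. Values near 0 or 1 point to systematic misfit, while a value well inside (0, 1) suggests the gap is within what the model can reproduce. In brms the corresponding plot is `pp_check(fit, type = "stat", stat = "mean")`. The Python sketch below uses a few hypothetical observed values and replicated draws purely to show the calculation; with a real fit, the draws would come from the posterior predictive distribution.

```python
from statistics import fmean

def ppc_mean_pvalue(y_obs, y_rep):
    """Posterior predictive p-value for the mean: the share of replicated
    datasets whose mean is at least the observed mean."""
    t_obs = fmean(y_obs)
    t_rep = [fmean(draw) for draw in y_rep]
    return sum(t >= t_obs for t in t_rep) / len(t_rep)

# Hypothetical observed conditions and four posterior predictive draws.
y_obs = [1.10, 0.95, 1.30, 1.05]
y_rep = [
    [1.00, 0.90, 1.20, 1.00],
    [1.05, 0.98, 1.25, 1.02],
    [1.20, 1.00, 1.35, 1.10],
    [0.95, 0.85, 1.15, 0.98],
]
print(ppc_mean_pvalue(y_obs, y_rep))  # 0.25
```

Note that this checks a global summary of fit, whereas Pareto k flags individual observations whose removal changes the posterior a lot, so the two diagnostics answer different questions.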