Hi!
I’m trying to fit the non-linear von Bertalanffy growth equation to length-at-age data of fish, L_t = L_{\infty}(1 - e^{-K(t - t_0)}), using brms, and I am experiencing some problems with initial values.
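For reference, here is a minimal R sketch of the curve being fitted. The helper name `vbgf` is hypothetical, and the parameter values are only illustrative (roughly the prior means used in the model below), not estimates from the data.

```r
# Von Bertalanffy growth curve: expected length at a given age.
# `vbgf` is a hypothetical helper name, not part of brms.
vbgf <- function(age, Linf, K, t0) {
  Linf * (1 - exp(-K * (age - t0)))
}

# Illustrative values only (roughly the prior means used in the model)
ages <- 0:10
pred <- vbgf(ages, Linf = 45, K = 0.2, t0 = -0.5)
print(round(pred, 1))
```

With positive K and L_\infty the curve rises from zero at age t_0 towards the asymptote L_\infty.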
I have two sites with data from multiple years that I want to compare. I include site as a dummy-coded variable, following this post. I also let parameters K and L_\infty vary between cohorts (birth_year).
Furthermore, exploratory analysis, and QQ plots in particular, made me want to fit the model on the log scale (commonly done for this model) and to use a Student-t likelihood to address the tail behaviour that arises with a Gaussian likelihood on log data. I put the data in a repository. The model I’m fitting to that data is this:
library(readr)
library(brms)
d <- readr::read_csv("https://raw.githubusercontent.com/maxlindmark/stan_data/master/d.csv")
# Informative priors to aid convergence, chosen after sampling from the prior predictive distribution
prior <-
prior(normal(-0.5, 1), nlpar = "t0C") +
prior(normal(-0.5, 1), nlpar = "t0W") +
prior(normal(0.2, 0.1), nlpar = "KC") +
prior(normal(0.2, 0.1), nlpar = "KW") +
prior(normal(45, 20), nlpar = "LinfC") +
prior(normal(45, 20), nlpar = "LinfW")
# I use the following inits
inits <- list(
t0C = -0.5,
t0W = -0.5,
KC = 0.5,
KW = 0.5,
nu = 10,
mu = 10
)
m <-
brm(
bf(
log(length_cm) ~ areaW*log(LinfW*(1-exp(-KW*(age-t0W)))) +
areaC*log(LinfC*(1-exp(-KC*(age-t0C)))),
t0C ~ 1, t0W ~ 1, KC ~ 1 + (1|birth_year), KW ~ 1 + (1|birth_year),
LinfC ~ 1 + (1|birth_year), LinfW ~ 1 + (1|birth_year), nl = TRUE),
data = d, family = student(), prior = prior,
seed = 9, iter = 50, thin = 1, cores = 1, chains = 1,
inits = list(inits)
)
This works well: it samples, and when I run more iterations the model diagnostics and fit look quite good! However, when I want to use two chains:
# More chains
list_of_inits <- list(inits, inits)
m2 <-
brm(
bf(
log(length_cm) ~ areaW*log(LinfW*(1-exp(-KW*(age-t0W)))) +
areaC*log(LinfC*(1-exp(-KC*(age-t0C)))),
t0C ~ 1, t0W ~ 1, KC ~ 1 + (1|birth_year), KW ~ 1 + (1|birth_year),
LinfC ~ 1 + (1|birth_year), LinfW ~ 1 + (1|birth_year), nl = TRUE),
data = d, family = student(), prior = prior,
seed = 9, iter = 50, thin = 1, cores = 2, chains = 2,
inits = list_of_inits
)
I get the following error message:
...
Chain 2: Rejecting initial value:
Chain 2: Error evaluating the log probability at the initial value.
Chain 2: Exception: student_t_lpdf: Location parameter[1] is nan, but must be finite! (in 'modeld9cc6691c6e7_dae435bc9f4399190964a8f3e866def5' at line 134)
Chain 2:
Chain 2: Initialization between (-2, 2) failed after 100 attempts.
Chain 2: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done
Chain 1: Iteration: 15 / 50 [ 30%] (Warmup)
Chain 1: Iteration: 20 / 50 [ 40%] (Warmup)
....
Chain 1 works again, but chain 2 cannot find good initial values. Line 134 (which the error points me to) in my Stan code is this one:
target += student_t_lpdf(Y | nu, mu, sigma)
(That’s why I set inits for mu, but I don’t think that’s correct…) My guess now is that I don’t actually set inits for the parameters I should, so they are initialised randomly, which can be problematic in my model. But I’m not sure how to find which parameters get bad inits, and inits = "0" across the board is not an option here…
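One way to probe this guess without running Stan is to evaluate the log-scale mean directly. The sketch below assumes that any parameter without a supplied init is drawn uniformly from (-2, 2), as the error message suggests, and that K, L_\infty and t_0 are unbounded. The helper `log_vbgf`, the ages and the number of draws are all hypothetical choices for illustration, not the real data or the code brms generates.

```r
# Log-scale von Bertalanffy mean, mirroring one term of the model formula.
# `log_vbgf` is a hypothetical helper, not a brms function.
log_vbgf <- function(age, Linf, K, t0) {
  # log() of a negative number returns NaN (with a warning) in R
  suppressWarnings(log(Linf * (1 - exp(-K * (age - t0)))))
}

ages <- 1:10  # illustrative ages, not the real data

# Values near the priors give a finite mean at every age
ok <- log_vbgf(ages, Linf = 45, K = 0.2, t0 = -0.5)

# Mimic random inits: uniform on (-2, 2) for each unbounded parameter
set.seed(1)
n_draws <- 1000
draws <- data.frame(
  Linf = runif(n_draws, -2, 2),
  K    = runif(n_draws, -2, 2),
  t0   = runif(n_draws, -2, 2)
)

# A draw is usable only if the mean is finite for every age
usable <- mapply(
  function(Linf, K, t0) all(is.finite(log_vbgf(ages, Linf, K, t0))),
  draws$Linf, draws$K, draws$t0
)
print(mean(usable))  # share of random draws that give a finite mean
```

A negative K or L_\infty is enough to make the argument of log() negative, so a large share of such random draws gives a NaN location. In the full model both area terms have to be finite at once (0 * NaN is still NaN), and this sketch ignores the cohort-level effects, which add further ways for K or L_\infty to go negative for individual cohorts. It may also be worth comparing the names in the inits list against the parameters block shown by stancode(m), since inits are matched to Stan parameters by name.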
Any advice on how to proceed from here is greatly appreciated!