Inf- values when attempting to simulate data: sample_prior = "only" and "posterior_perdict"

Hi, let me say straight up I am new to this sort of modeling, apologies for any faux pas both in asking this question and my modeling.

I am currently trying to simulate data for a non-linear model I have developed, with the end goal of sharing the model for others to use (perhaps as a shiny app, but again, new territory for me) using the brms package in R.

I had hoped to summarize an equation that I could share, however due to the apparently complex relationship between the model supplied estimates and predictors, I realized it would probably be easier to just share the model in question.

My new issue is that I cannot just share the model as it is built on confidential data.

After reading a number of topics on this site, I saw that you should be able to use sample_prior = "only" and then use posterior_predict to get the simulated data, but when I do this, the values returned are all ‘-Inf’ values. Here is my code for the model below:

prior1 <- prior(normal(0, 100), nlpar = "b1", lb = 0) + 
  prior(normal(0, 10), nlpar = "b2", lb = 0) 

fit <- brm(bf(y~ b1^b2, 
                                  b1 ~ 1 + x
                                  b2 ~ 1,
                                  nl = TRUE),
                               data = data, 
                               prior = prior1, 
                               iter = 54000, 
                               warmup = 22000,
                               thin = 54, 
                               cores = 4,
                               family = lognormal(),
                               control = list(adapt_delta = 0.99,
                                              max_treedepth = 12), 
                               file = "fit",
                               sample_prior = "only",
                               save_all_pars = T, 
                               seed = 1234) 

str(posterior_predict(fit)) 

I get these results:

 num [1:2372, 1:2619] 27 84 116.8 168 98.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : NULL
> vals <- (posterior_predict(fit))
> vals[[1]]
[1] Inf

Unfortunately I cannot share the underlying data here either.

I would be grateful for any advice or insight at all. Ideally a solution would not involve redoing the model, if possible (but understand it may just have to be if I’ve done something incorrectly).

Thanks

1 Like

I think one issue could be that the prior is too wide. With sample_prior = "only", I assume you only sample from the prior. A parameter with a normal(0, 100) prior is going to give very high prediction with the exponential transformation (due to the lognormal).

quantile(exp(rnorm(1000, 0, 100)), probs = c(0.9, 0.95))
         90%          95% 
7.292745e+48 2.708517e+66

With these kind of values, Stan is overflowing to infinity. (Too wide priors might also be why you need adapt_delta = .99 and max_treedepth = 12. You should also not need 54000 iterations for most applications).

2 Likes

Thanks for your reply and your advice, I will attempt to fix these issues and report back about whether this addresses the problem.

Fantastic, narrowing priors seems to have solved the issue, I am now getting results for posterior_predict, while fit and estimates appear equivalent to previous results. Also lowered iterations as advised and returned adapt_delta and max_treedepth back to default with no divergent transitions.

prior1 <- prior(normal(0, 1), nlpar = "b1", lb = 0) + 
  prior(normal(0, 0.5), nlpar = "b2", lb = 0) 

fit .2 <- brm(bf(y ~ b1^b2, 
                                     b1 ~ 1 + x,
                                     b2 ~ 1,
                                     nl = TRUE),
                                  data = data, 
                                  prior = prior1, 
                                  iter = 4000, 
                                  warmup = 1400,
                                  thin = 4, 
                                  cores = 4,
                                  family = lognormal(),
                                  #control = list(adapt_delta = 0.99,
                                   #              max_treedepth = 12), 
                                  file = "fit.2",
                                  sample_prior = "only", 
                                  seed = 1234
) 

With result

vals <- posterior_predict(fit.2)
vals[[1]] 
[1] 0.6114292

Thanks again!

2 Likes