Does fitted() not work with 'newdata=' argument when missing data is imputed during the model fit?

jd_c · April 16, 2021, 2:02pm

R version 3.6.3. brms version 2.15.0

I would like to use the fitted() function to estimate the fitted model for some new data. However, I am getting NaN and NAs for the response variable of interest in my non-linear model.

fit <- brm(
	bf(outcome ~ previousoutcome + throttleM,
	throttleM ~ 0 + mi(throttle), nl=TRUE) +
	bf(throttle | mi() ~ 1 + previousoutcome + time*treatment),
	data = data1, iter=4000, family = gaussian(),
	prior = c(	
		prior(normal(2,0.5), class=sigma, resp="outcome"),	
		prior(normal(1,0.001), class=b, coef="mithrottle",resp="outcome", nlpar="throttleM"),
		prior(normal(0,1), class=b, coef="previousoutcome", resp="throttle"),
		prior(normal(0,0.05), class=b, coef="treatment", resp="throttle"),
		prior(normal(0,0.05), class=b, coef="time", resp="throttle"),
		prior(normal(0,0.05), class=b, coef="time:treatment", resp="throttle"),
		prior(normal(5,2), class=Intercept, resp="throttle"),
		prior(normal(2,0.5), class=sigma, resp="throttle")
  		),
  control = list(adapt_delta = 0.9999, max_treedepth=15), save_pars = save_pars(all = TRUE)
)

#this works
f <- fitted(fit)

#this gives NaN and NA for 'Estimate.outcome' and 2.5 and 97.5% intervals but gives values for 'Estimate.throttle'
#this happens, even though I use the same dataset as in the above call where newdata=NULL
f <- fitted(fit, newdata=data1)

Does fitted(fit, newdata= newdata) not work for a non-linear model where the response is estimated in part by another response that is missing? @paul.buerkner @martinmodrak
Thanks!!

jd_c · April 16, 2021, 2:53pm

Interesting, but I also get an error when trying to use conditional_effects(fit, resp=“outcome”) on this model:

Error in seq.default(min1, max1, length.out = resolution) :
‘from’ must be a finite number
In addition: Warning messages:
1: In min(data[[effects[1]]], na.rm = TRUE) :
no non-missing arguments to min; returning Inf
2: In max(data[[effects[1]]], na.rm = TRUE) :
no non-missing arguments to max; returning -Inf

Not sure if that is somehow related.

martinmodrak · April 23, 2021, 10:28am

Sorry for not getting to you earlier. The question is relevant. I can’t rerun the analysis, but I think the problem might be in combining a non-linear formula with missing data. Using just missing data (e.g. from the example in the vignette Handle Missing Values with brms • brms) seems to work well with fitted.

So the main question is why do you have the non-linear part and why you just don’t use

bf(outcome ~ previousoutcome + mi(throttle)) +
	bf(throttle | mi() ~ 1 + previousoutcome + time*treatment)

Understanding this could possibly help us make the model better. In fact, the model is a bit suspicious to me: You need very large adapt_delta and max_treedepth. This usually signals there is something wrong with the model. You also force quite tight priors (which are not necessarily wrong if you have good justification, but it still usually is a bit of thin ice).

Best of luck with the model!

jd_c · April 23, 2021, 12:46pm

Thanks, that is what I was thinking as well (combining a non-linear formula with missing data).

Ah, well that is a long story haha, and I have thought about making another post about it. The reason is that I just wanted the previousoutcome (a column of data) added to something called throttle (another column of data) as the model. I didn’t want a parameter for previousoutcome and a parameter for throttle in a linear model. Basically in this data, i think there is a baseline probability for the outcome, but also when there is an outcome each month, people react to that outcome such that it greatly influences the next months outcome. It reminds me of cruise control on a car, where you have a target speed but there are hills on the road, so the car is constantly modulating the throttle. So, originally I tried to fit a PD controller model (without the integral part), but so much of the data is missing (ie target, target lag, etc), that I had to impute lots of the data. I had a lot of trouble getting the model (nonlinear) to run and kept simplifying it. Finally I wrote my own model for what was going on and simplified it to this point. It runs well, and the fit is great. But then I realized that I am probably just overfitting, because I am estimating a parameter for every missing datapoint of throttle, and the nonlinear formula is such that the outcome = previousdata + imputed missing, which would always seem to fit well…

Anyhow, it may be a sad story of lots of hard thinking and trying only to come to the conclusion that I am fooling myself haha…

martinmodrak · April 23, 2021, 1:23pm

Not sure I can help with the bigger question, but if all you want is to avoid having a coefficient for a term (i.e. fixing the coefficient to 1), you can do this even in normal linear formulas by giving constant(1) as a prior for the coefficient. This way you at least could have the exact same model without the non-linear formula.

jd_c · April 23, 2021, 2:16pm

Ohhh of course! Thanks. Yes, same model if I put the intercept as 0 and constant(1) priors.

m1 <- brm(
	bf(outcome ~ 0 + previousoutcome + mi(throttle)) +
	bf(throttle | mi() ~ 1 + previousoutcome + time*treatment),
	data = data3, iter=2000, family = gaussian(),
	prior = c(	
		prior(normal(2,0.5), class=sigma, resp="outcome"),
		prior(constant(1), class=b, coef="mithrottle", resp="outcome"),
		prior(constant(1), class=b, coef="previousoutcome", resp="outcome"),
		prior(normal(0,1), class=b, coef="previousoutcome", resp="throttle"),
		prior(normal(0,0.05), class=b, coef="treatment", resp="throttle"),
		prior(normal(0,0.05), class=b, coef="time", resp="throttle"),
		prior(normal(0,0.05), class=b, coef="time:treatment", resp="throttle"),
		prior(normal(5,2), class=Intercept, resp="throttle"),
		prior(normal(2,0.5), class=sigma, resp="throttle")
  		),
  control = list(adapt_delta = 0.99, max_treedepth=15), save_pars = save_pars(all = TRUE)
)

However, the original error persists. I can do fitted(m1) and get output for both outcome and throttle. However, if I do fitted(m1, newdata=data3) (where data3 is the same data that the model was fit on; so really the same call as doing fitted(m1) ) then I get NaN for all the values of outcome but I get estimates for throttle. Also, conditional_effects(m1) throws the same error as before.

martinmodrak · April 29, 2021, 2:54pm

Oh, you are right, I messed up before and can now verify that I get the same result. It is actually a known issue: predict.brmsfit with imputed missing values · Issue #544 · paul-buerkner/brms · GitHub and you currently have to handle such prediction tasks manually (you can first use predict to produce a bunch of imputed datasets and then do prediction on those imputed datasets).

conditional_effects uses prediction under the hood, so that’s somewhat expected :-)

Best of luck with your model!

jd_c · April 29, 2021, 7:33pm

Gotcha! Thanks

Topic		Replies	Views
Getting the grand mean from fitted() brms	4	716	February 18, 2020
Predict.brms in multivariate model with imputation brms	2	1262	November 5, 2018
Rds imported model works incorrectly with marginaleffects brms rstan , fitting-issues , brms	1	251	November 8, 2023
Help with predictions with brms and including errors General brms	1	312	March 14, 2024
Handling missing data in brms brms	1	495	February 25, 2021

Does fitted() not work with 'newdata=' argument when missing data is imputed during the model fit?

Related topics