Stan_jm residuals

I am running several models using rstanarm::stan_jm and would like to run various diagnostic models examining residuals from the longitudinal and event submodels to check the functional forms of the relations between predictor variables and residuals. Unlike residuals() for virtually every other model, residuals() for a stan_jm model returns:

Error in residuals.stanmvreg(jointmIAA) : Not currently implemented.

Is there a way to extract from the stanmvreg object the residuals from the two submodels, or alternatively to extract the submodels themselves in a way that would make it possible to extract the residuals?

Thanks to the rstanarm crew for developing these rstanarm functions! I know that I am pushing the limits of your models.

rstanarm version: 2.19.2
Windows 10 62 bit up-to-date

Larry Hunsicker

I would subtract posterior_predict() or posterior_linpred(..., transform = TRUE). from the observed outcome to get a distribution of errors. Not only is the residuals method apparently not implemented for stan_jm, it is not really appropriate to just consider a single point for each residual.

Ben: Let me rephrase my question, now that I have had a bit more time to think it over. In essence, what I am asking is whether it is possible to determine whether the impact of eta (assume for a moment a univariate longitudinal model) is linear. My understanding is that the longitudinal model(s) in essence estimate(s) a latent set of “true” values for the time varying longitudinal values – whether a “true present value” or a “true rate of change of the longitudinal value,” or whatever, for input into the event model at each event time, with the assumption that this value is added to the linear estimator for the log hazard in the event model. The accuracy of the estimate of the “true” but latent value will, of course, be dependent on the specification of the longitudinal model, and I can check that out by examination of the longitudinal component(s) separate from the joint model. But it seems to me that there is an assumption that the impact of the eta on hazard is linear. (For example, the impact of the time dependent estimated value of log(bilirubin) on hazard of death from PSC seems to be assumed to be linear in your example. One could evaluate that in a standalone survival model using something like a pspline function of a baseline log(bilirubin). But there doesn’t seem to be a way to insert a pspline into the piece that relates the value(s) of the time dependent eta(s) into the final event hazard. And if the baseline hazard attentuates over time, as often is the case, then an evaluation of the functional form of the relationship between a baseline risk and log(hazard) may be misleading about the functional form of the time dependent longitudinal value on log(hazard. It’s the functional form of the effect of the time varying etas on log hazard that I am trying to figure out.

As you know from former discussions, I am a physician, not a mathematician or statistician. So all of the above may be totally confused. If so, you have my apologies. But if there is some sense in my question, is there a way to query the functional form of the impact of eta(s) on log hazard?
Thanks for considering the above.
Larry Hunsicker

@Lawrence_Hunsicker You are correct – the assumption in almost all joint models to date is that the expected value of the biomarker (or some function of it such as rate of change in the biomarker) is linearly related to the log hazard. This is an assumption that has been taken for granted in most of the literature.

Obviously, this assumption has been relaxed in survival models more generally – for e.g. allowing the effect of a covariate in the survival model to have a non-linear effect on the log hazard. But these methods haven’t yet propagated through to joint models (there are a few notable exceptions in the literature, but I don’t have the references on hand at the moment).

Similarly, the joint models in rstanarm also assume that the etavalue, etaslope, or etaauc is linearly related to the log hazard. However there is one exception I can think of…

  • You can use the etavalue_etavalue(#) type of association structure to get a quadratic effect. For example, if your joint model has only one longitudinal biomarker, then I think etavalue_etavalue(1) should effectively include the quadratic effect of the expected value in the linear predictor for the survival submodel.

You could assess the importance of the quadratic effect using posterior summaries, and/or use the loo method to see if the quadratic effect improves model fit.

But to refer to one of your earlier questions… you unfortunately won’t be able to get residual plots for the survival submodel because I didn’t implement a method for calculating residuals on the censored survival data.

Hope that helps clarify a bit.

Thanks, Ben. That clarifies quite a bit. I like your “etavalue_etavalue(1)” trick. 2nd term of the Taylor series. I have three further clarifying questions:
a) Your vignette distinguishes between the “linear predictor” and the “expected value.” I get a bit lost about the various subscripts and the difference between mu and nu. Am I correct that “linear predictor” and “nu” refer to the prediction for the individual subject, and that the “expected value” and “mu” refer to the fixed effects only – marginlized over the collection of subjects?
b) I assume that “etavalue_etavalue(x)” tells the program to multiply (interact) the first term in the list of associations with the “xth” term. Is the program clever enough to “interact” the first term with itself?
c) We users are given a specific (though large) selection of association structures that we can choose. How much programming would be required to add a choice of using a pspline for a single longitudinal factor? It seems to me that the “etavalue_etavalue(x)” choice is just telling the system to add to the log hazard a third covariate (the interaction) to the two main effect values. I assume that another association structure choice could similarly add the various components of a spline to the log hazard. We would have to reconstruct the spline from the various etas at the end. But that would not be too hard. Larry

Some answers inline below… cheers.

Linear predictor is the value of the regression equation for an individual subject. The expected value is after applying the inverse link function to the value of the linear predictor.

If you are using a normally distributed biomarker (i.e. an extension of linear regression) then the linear predictor and the expected value are the same. Because the link function for the normally distributed biomarker is the identity.

They differ when you have a binary biomarker (i.e. an extension of logistic regression) or a count (i.e. an extension of poisson regression), for example.

Yeah. It says interact the linear predictor value of biomarker 1 with the linear predictor value of biomarker 2. It is designed for multivariate joint models, in which case your association structure is allowed to be a list. So the following two association structures would be equivalent for a multivariate joint model with two biomarkers:

list(c("etavalue", "etavalue_etavalue(2)"), c("etavalue")) 
list(c("etavalue"), c("etavalue", "etavalue_etavalue(1)"))

Because both include the main effects for etavalue, and the two way interaction between biomarker 1 and biomarker 2.

From memory, yeah, it should be. I think I implemented it that way. It will throw an error if that’s not the case.

Yeah, it wouldn’t be too much in theory, but it is all the supporting code that would be a pain. Stan doesn’t have a built in pspline function so we’d have to define a psline function in the Stan functions block. Also the rstanarm models are precompiled at installation time, so the .stan file includes all possible model specifications – they aren’t written at runtime. So adding the psplines means making the .stan file even longer and it is already dense. But someone could in theory do it. I’ve started a new job outside of academia, so don’t really have time for adding new features.