Speeding up the pre-Stan basis computations in brms?

I have a GAM of the form:

power ~ mgcv::t2(
	lat , long , time , freq , accuracy_rating
	, d = c(2,1,1,1)
	, bs = c("sos","tp","tp","tp")
	, k = rep(10,times=4)
)

The data has about 2 million rows, so running brms is taking a long time (and lots of RAM; 30% of my 380GB system) even before sampling starts, presumably as it constructs all the bases. Anyone (@ucfagls? @paul.buerkner?) have any tricks? I only see one core in use; might there be any parallelism opportunities?

You’re at the limits of what mgcv can do so I’m not expecting this to be quick at all, but…

Don’t use the thin plate spline basis for anything this large; mgcv has to form the full TPRS basis (actually on a much smaller subsample of observations capped by max.knots, but that is still large) and then eigendecompose it to get the 10 largest-variance basis functions you asked for. This is always going to be slow.

Instead, try the cubic regression spline basis by using bs = c("sos", rep("cr", 3)).
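A rough way to see the difference on a small scale is to build a single marginal basis of each type with mgcv::smoothCon() and time it. This is only a sketch on simulated data: the variable x, the sample size n, and the object names are made up for illustration, and timings will vary by machine. The last lines show the term from the question with only bs changed.

```r
library(mgcv)

set.seed(1)
n <- 20000                      # hypothetical size, far smaller than the real data
dat <- data.frame(x = runif(n)) # simulated covariate, not the real data

# Build one 1-D marginal basis of each type; smoothCon() returns a list of smooths
t_tp <- system.time(sm_tp <- smoothCon(s(x, bs = "tp", k = 10), data = dat))
t_cr <- system.time(sm_cr <- smoothCon(s(x, bs = "cr", k = 10), data = dat))

# Elapsed seconds for each basis type
c(tp = t_tp[["elapsed"]], cr = t_cr[["elapsed"]])

# Both bases have the same number of columns
c(tp = ncol(sm_tp[[1]]$X), cr = ncol(sm_cr[[1]]$X))

# The term from the question with only `bs` changed (the formula is not evaluated here)
f <- power ~ mgcv::t2(
  lat, long, time, freq, accuracy_rating,
  d = c(2, 1, 1, 1),
  bs = c("sos", rep("cr", 3)),
  k = rep(10, times = 4)
)
```

Timing a single marginal on a subsample like this is a cheap check before committing to the full 2 million rows.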

Also, a smooth in 5 dimensions is going to generate a massive basis; you’re getting 10,000 basis functions here I think. There may not be a lot you can do about that if you want everything to vary spatially and in time and with each other.
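As a back-of-the-envelope check on that number, the tensor product basis size is the product of the marginal k values. The memory figure below assumes one dense double-precision matrix (8 bytes per entry) with exactly 2 million rows and ignores identifiability constraints and any extra copies, so treat it as an order of magnitude only.

```r
k <- rep(10, times = 4)      # marginal basis dimensions from the question
n_basis <- prod(k)           # columns in the tensor product basis
n_rows <- 2e6                # "about 2 million rows" (assumed exact here)

# Assumption: a single dense matrix of doubles, 8 bytes per entry
gb <- n_rows * n_basis * 8 / 1e9

n_basis  # 10000
gb       # 160
```

That is the same order of magnitude as the RAM use reported in the question, which suggests the basis itself is what is filling memory.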

What parallelism there is for this lives in mgcv::bam(), which IIRC doesn’t affect the standard basis construction code.
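For reference, this is roughly what asking bam() for threads looks like. It is a minimal sketch on simulated data (x, y, n, and the thread count are made up), and it is a plain mgcv fit, not a brms/Stan model, so it only illustrates where the parallelism sits.

```r
library(mgcv)

set.seed(1)
n <- 5000                                 # hypothetical toy size
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

fit <- bam(
  y ~ s(x, bs = "cr", k = 10),
  data = dat,
  discrete = TRUE,   # discretised-covariate fitting method
  nthreads = 2       # threads used during fitting (assumed available)
)

summary(fit)
```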
