Help speeding up Bernoulli Gaussian process model

Drats, those are just integrated graphics, so there goes that idea. I would still recommend trying the glm function without a GPU; the more efficient construction might outperform reduce_sum, given the copy costs that reduce_sum involves.
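In case it helps, here is a minimal sketch of the kind of glm construction I mean, assuming a plain logistic regression with a design matrix `X` (the names are placeholders, and a GP model would of course build its linear predictor differently):

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real alpha;
  vector[K] beta;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  // the fused glm likelihood handles the linear predictor and the
  // Bernoulli-logit likelihood in a single call
  y ~ bernoulli_logit_glm(X, alpha, beta);
}
```

The `_glm` form computes the whole likelihood and its gradient in one call, which is typically faster than composing `X * beta + alpha` and `bernoulli_logit` separately.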

If you have access to a local system with a discrete GPU, it would also be worth trying there, since GPU processing offers a greater level of parallelism than you can achieve through reduce_sum (more or less).

The lesson here is when PSSC asks ‘Are you sure you don’t need a GPU?’, you always take the GPU…

For the other arguments in a reduce_sum, would it be better to pass them by reference, as in C++?

Unfortunately that’s not possible. Parameters in Stan (var types in the C++ backend) store both a value and an adjoint, and these have to be accessed and updated as part of the automatic differentiation process. If they were accessed by multiple threads, a race condition would be introduced: the adjoint for a given var would differ depending on how many threads had finished accumulating their adjoints. To avoid this, each thread works with its own copy of the parameter, so the resulting adjoints are not affected by the other threads.
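As a rough illustration of where the copying happens, here is a minimal reduce_sum sketch (all names here are placeholders, not taken from your model): `y` is the sliced argument, while `X`, `alpha`, and `beta` are shared arguments. Since `X` is data it is not copied, but `alpha` and `beta` are parameters, so each thread gets its own copy.

```stan
functions {
  // y_slice is the sliced argument; X, alpha, beta are shared arguments.
  // X is data, so it is never copied; alpha and beta are parameters,
  // so each thread works on its own copy to keep adjoint accumulation
  // free of race conditions.
  real partial_log_lik(array[] int y_slice, int start, int end,
                       data matrix X, real alpha, vector beta) {
    return bernoulli_logit_lpmf(y_slice | X[start:end] * beta + alpha);
  }
}
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real alpha;
  vector[K] beta;
}
model {
  int grainsize = 1;
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  target += reduce_sum(partial_log_lik, y, grainsize, X, alpha, beta);
}
```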

This is a bit of a rough explanation, @wds15 anything there that should be corrected?

What you write is correct for the shared parameters. These must be copied per thread (note that data is not copied). However, the sliced-over variables are not copied, since no two threads ever work on the same sliced variables. This is why it is a lot more efficient to put things into the sliced variable when they vary by the item you reduce over.
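To make that concrete, here is a simplified sketch (not the original GP model; `z`, `y`, and `partial_log_lik` are placeholder names) where per-observation latent values are passed as the sliced argument rather than as a shared argument, so they are never copied across threads:

```stan
functions {
  // z_slice holds only the per-observation latent values this thread
  // needs, so z is never duplicated across threads; y is data and is
  // not copied either.
  real partial_log_lik(array[] real z_slice, int start, int end,
                       data array[] int y) {
    return bernoulli_logit_lpmf(y[start:end] | to_vector(z_slice));
  }
}
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  array[N] real z;  // per-observation latent values (stand-in for a GP)
}
model {
  int grainsize = 1;
  z ~ std_normal();
  target += reduce_sum(partial_log_lik, z, grainsize, y);
}
```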

Also note that with 2.26.0 the overhead due to the shared arguments was drastically reduced. Before 2.26.0 we made a copy for each partial sum being formed, while now we copy the shared arguments only once per thread (and a given thread can work on multiple partial sums).

There is a slide on this in my StanCon 2020 contribution.
