Problem with cov_exp_quad on large matrices when fitting Log-Gaussian Cox Processes

Hi, I am estimating an intensity map of the brain using Log-Gaussian Cox Processes.
I get a bad_alloc error while the covariance matrices are being computed, even with 64GB of RAM.
I suspect it comes from calling cov_exp_quad over all the voxels.

model {

    matrix[V, V] cov[2];
    matrix[V, V] L_cov[2];
    matrix[V, K] betas;
    // covariance matrices
    // for beta0
    cov[1] =   cov_exp_quad(brain_mask, sigma[1], rho[1])
             + diag_matrix(rep_vector(square(noise[1]), V));
    L_cov[1] = cholesky_decompose(cov[1]);
    // for beta1 (note: sigma[2] and rho[2] here, not sigma[1] and rho[1])
    cov[2] =   cov_exp_quad(brain_mask, sigma[2], rho[2])
             + diag_matrix(rep_vector(square(noise[2]), V));
    L_cov[2] = cholesky_decompose(cov[2]);
    // betas are later used for calculating lambda
    betas[, 1] = mu[1] + L_cov[1] * eta[1];
    betas[, 2] = mu[2] + L_cov[2] * eta[2];
    // ... (rest of model omitted)
}


The total number of voxels is now 48846, from a 4mm isotropic brain mask.
That number has already been reduced from 352328.

Is there any workaround to get things done?
Would it be a better idea to start using GPUs?

Thank you in advance for all the comments.

Hi and welcome to the Stan Discourse,

GPUs will not help you here, as you still need to allocate everything in CPU memory first.

If I understand correctly, V = 48846, which means a [V, V] matrix is 48k x 48k. A matrix that stores values and adjoints (8 + 8 bytes per element) would be roughly 36GB in size, and you have at least 4 such matrices here. So you will not be able to fit those in 64GB.
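A quick way to check that estimate yourself (a back-of-the-envelope sketch, assuming 8 bytes for the value and 8 bytes for the adjoint of every matrix element):

```python
V = 48846
bytes_per_element = 8 + 8  # 8-byte value + 8-byte adjoint per element

size_gib = V * V * bytes_per_element / 2**30
print(f"{size_gib:.1f} GiB per [V, V] matrix")  # ≈ 35.6 GiB
```

Four such matrices already come to well over 140GB before counting anything else on the autodiff stack.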

Just out of curiosity, are the matrices sparse?

Hi, @rok_cesnovar. Thanks for the comment.

Yes, the size of the matrix is 48k x 48k.

The brain_mask argument to cov_exp_quad is an array of voxel locations in voxel space. It is already given, and the distances between voxels are computed from it, so I don’t think the matrix is sparse.

If memory is the issue, would 256 gigs of RAM work?
I hope there is another, more memory-efficient way of processing that reduces RAM usage.

Memory is definitely the first limiting factor here. It's not the intensive calculation inside cov_exp_quad; it's the fact that the output of cov_exp_quad is a large matrix, and that is also where a lot of the resources on the autodiff stack are allocated.

A quick look at what gets allocated:
A single cov_exp_quad (see the rev implementation here) with V=48846 allocates 17.7GB for the dist_ array (an array of doubles), 35GB of varis for cov_lower_, and another 8.8GB of pointers to those varis. There is some other overhead, but it is negligible. That sums to roughly 61.5GB of allocated resources for a single cov_exp_quad call.

The cholesky_decompose for V=48846 then allocates 35GB of extra varis and roughly 17GB of pointers, so that is an extra 52GB. You have two of each, which sums to almost 230GB for those alone, so I am not sure 256GB is going to be enough here. And note that the Stan Math library already does a lot of optimizations to save space wherever it can, so you won't get much further with a general autodiff approach.
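Summing those per-operation figures makes the 256GB question concrete (these are the estimates from the breakdown above, not measurements):

```python
cov_exp_quad_gb = 61.5  # estimated allocation per cov_exp_quad call (from above)
cholesky_gb = 52.0      # estimated allocation per cholesky_decompose (from above)
n_copies = 2            # one covariance + one Cholesky factor per beta

total_gb = n_copies * (cov_exp_quad_gb + cholesky_gb)
print(total_gb)  # 227.0
```

That 227GB covers only these four operations, leaving very little of a 256GB machine for the rest of the model, the sampler, and the OS.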

If you have the option of using cloud instances, I would advise you to try an AWS High Memory instance with TBs of memory, at least to see what the maximum required memory is. You can rent those for a few dollars an hour, and I can't imagine you would need more than an hour or two to get it to start sampling. It probably won't be feasible for the entire sampling run, as I suspect that is going to take quite some time at these sizes, but it would give you a feel for the memory requirements.

Other options require extensive knowledge of C++ and of the Stan Math autodiff library, plus significant additional effort. You could calculate the gradient manually for your specific model and implement it in C++, which would save memory on some of the intermediate calculations. The other option is to rework the autodiff to use floats instead of doubles; that halves the space for varis and doubles, but it brings potential precision issues with the Cholesky decomposition and even with the sampler itself.


Thanks for the valuable comment.

With those restrictions, even if I figure out how much memory the model needs,
it might not be feasible to run the sampling, as you mentioned.

I will try to think about changing the whole model structure.

Thank you again!
