map_rect with common real data

I am using the map_rect function in Stan to parallelize my code. The Stan manual gives the signature of map_rect as

vector map_rect((vector, vector, array[] real, array[] int):vector f,
                vector phi, array[] vector thetas,
                data array[,] real x_rs, data array[,] int x_is);

In my Stan code, I have a vector data variable that is common to every parallel operation. To pass it, I set x_rs = to_array_2d(rep_matrix(data_vector, m)'). In my application m ~ 1e5, which creates a lot of unnecessary copying. I was wondering whether I can pass phi = data_vector instead. Would that cause a problem, given that phi is a parameter, not a data variable?

I am also open to other solutions.

I’m not an expert on Stan’s internals, but I believe that passing data to phi would add autodiff overhead to it, which would probably swamp any gains to be had from avoiding the copying. Are you creating the repeated array in the transformed data block?

Yes, I am doing that.

If your common data is relatively small, you could declare it as a literal within the function. The easiest way to do that would be by having your declaration in an external file and including it in your function; this way you could update the data without having to modify the model’s main source code (though it would require recompilation still).

functions {
  vector my_map_function(vector phi, vector theta,
                         data array[] real x_r, data array[] int x_i) {
    // fixed_data.txt declares and defines common_vector
    #include fixed_data.txt
    // ... compute and return this shard's result using common_vector ...
  }
}

Where your fixed_data.txt would look something like:

vector[23] common_vector = [-0.108440136566699, 0.196844661678787, -0.499034860949633, 0.767407343297395, -0.710436107955834, 0.659675941554362, 0.512489385862679, -0.469499711610454, 0.11092275549657, 0.0771430894610884, 1.53274393700302, -2.63541228749811, -0.71209066070144, 1.02167999746454, 0.234319407970257, 1.57823053958557, 0.0257982148990972, -0.21109895836873, -0.295251987262776, -1.2180698015353, 0.367281389491287, 0.300076608836427, 1.5185854014602]';

You could auto-generate the fixed data file from R with something like:

library(glue) # can be done w/ paste() instead of glue() but it's uglier
write_common_vector <- function(vector, filename = 'fixed_data.txt') {
  template = "vector[{size}] common_vector = [{collapsed_vector}]';"
  out_text = glue(template, size = length(vector), collapsed_vector = glue_collapse(vector, sep = ', ')) 
  writeLines(out_text, filename)
  invisible(out_text)
}

I don’t have a huge amount of experience with includes, but one possible danger that comes to mind is that the model may not automatically detect that it needs recompiling if the included file changes but the main Stan file doesn’t. Does anyone know if this is the case? It should be relatively straightforward to modify the R function to check whether the output file has changed, and use that to decide whether to force recompilation.
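A minimal sketch of that check, in base R. The function name write_common_vector_if_changed is hypothetical, and it assumes the include file holds just the one declaration line. It rewrites the file only when the generated text differs from what is on disk, and returns whether it did so.

```r
# Hypothetical helper (not part of any package): rewrite the include file
# only when its contents would change.
# Returns TRUE if the file was written, FALSE if it was already up to date.
write_common_vector_if_changed <- function(x, filename = "fixed_data.txt") {
  # 17 significant digits are enough to round-trip a double exactly.
  values <- paste(formatC(x, digits = 17, format = "g"), collapse = ", ")
  # Same shape as the declaration above: a row-vector literal, transposed.
  new_text <- sprintf("vector[%d] common_vector = [%s]';", length(x), values)

  # Read the current file contents, if the file exists.
  old_text <- if (file.exists(filename)) {
    readLines(filename, warn = FALSE)
  } else {
    character(0)
  }

  changed <- !identical(old_text, new_text)
  if (changed) writeLines(new_text, filename)
  invisible(changed)
}
```

The returned flag can then decide whether to force a rebuild. With cmdstanr, for example, I believe it can be passed as the force_recompile argument when compiling, but check what your interface offers.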

This solution worked really well.