I have kind of a general question about the way multi-threading works. I have a very large
reduce_sum call with a ton of data passed in, the intention being to shard the data across threads:
```stan
target += reduce_sum(
  choice_lpmf,
  indices,                // thing to cut across
  grainsize,              // grainsize == 1
  p_R_ftr_not_na,
  p_R_ftr_wo_clm,
  pricing_error_mean,
  pricing_error_sd,
  tm_int_ind_n,
  p_R_ftr_wo_clm_w_tm,    // transformed data, N
  pricing_error_mean_tm,  // model block, N
  pricing_error_sd_tm,    // model block
  run_demand,             // data
  is_nb,                  // transformed data, N
  L_draws,                // data
  lambda,                 // transformed data, N
  demand_choice_index,    // data
  tm_ind,                 // data
  dollar_norm,            // data
  is_nb_int,              // transformed data
  tm_elig_n,              // transformed data
  J,                      // data
  J_oo,                   // data
  prices_not_na_ind,      // data, N x J
  prices_oo_not_na_ind,   // data, N x J_oo
  limits,                 // transformed data, J
  limits_oo,              // transformed data, J_oo
  prices,                 // transformed data, N x J
  prices_oo,              // transformed data, N x J_oo
  tm_optin_disc_n,        // transformed data, N
  prices_pre_firm_ind,    // data, N x J
  prices_pre_cov_ind,     // data, N x J
  pre_oo_firm_ind,        // data, N x J_oo
  pre_oo_cov_ind,         // data, N x J_oo
  clm_surcharge,          // data, N
  pareto_loc,             // transformed data, N
  pareto_shape,           // data
  risk_aversion,          // model block, scalar
  firm_switch_cost,       // model block, N
  cov_switch_cost,        // model block, scalar
  tm_frictions,           // model block, N
  sigma_logit,            // model block, scalar
  plan_fes,               // model block, N x J
  plan_oo_fes,            // model block, N x J_oo
  quad_nodes,
  quad_weights
);
```
The partial-sum function
choice_lpmf slices all of these variables into chunks and then does all the likelihood calculations. Ideally,
reduce_sum would shard all the variables for me before handing them to a thread, but I have not found a way to make it do that.
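For reference, the pattern I'm using looks roughly like this (names and types heavily simplified, and only a handful of the shared arguments shown): reduce_sum slices only the first argument, and everything after `start`/`end` arrives whole, so the partial-sum function has to subset it itself.

```stan
functions {
  // Simplified sketch of the partial-sum function. Only `indices` is
  // sliced by reduce_sum; the shared arguments (lambda, tm_ind,
  // prices, risk_aversion here) are passed in full to every partial
  // sum, so the body indexes them with start:end.
  real choice_lpmf(array[] int indices, int start, int end,
                   vector lambda, array[] int tm_ind,
                   matrix prices, real risk_aversion) {
    real lp = 0;
    for (n in start:end) {
      // likelihood contribution for observation n, using only
      // lambda[n], tm_ind[n], prices[n], ...
    }
    return lp;
  }
}
```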
Many of these variables are quite large. On a single-threaded run this doesn’t use much memory; it’s just slow.
However, once I start adding threads, the problem quickly becomes unworkable. The memory usage is insane, and I cannot actually run this model on the full data set with more than 1-2 threads.
My best guess is that
reduce_sum deep-copies all of these variables to each thread. Is there a way to prevent that? Maybe a way to flag some of these variables to be passed by reference? Or is there some other way to write functions like this that take lots of heterogeneous data?