Crossed random effects and map_rect()

Quick description

I have a multivariate model where the data are partitioned by three independent grouping factors (a, b, c); almost all combinations of the factors are present in the data in nearly equal numbers. Each response variable is a function of one or more latent variables, which are calculated from a combination of population-level regression coefficients and random intercepts for each group.

I am considering parallelizing the model with map_rect(), and am debating how to split the model parameters into phi (global) and theta (local). The parameters I'll be using in the mapped function are the intercepts, the betas (population-level regression coefficients), and r_a, r_b, and r_c (the random intercepts). The intercepts and betas will certainly go into phi, but I'm less certain about the r_*. Each random-effect level will be used in multiple shards, but each r_sum[i] = r_a[aa[i]] + r_b[bb[i]] + r_c[cc[i]] is only used once, so r_sum could instead be precomputed and passed as theta (a sketch of that option follows).
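To make the split concrete, here's a rough sketch of the "r_sum local" option; the packing scheme and all names (shard_ll, etc.) are just illustrative, not my actual code:

```stan
functions {
  // phi = [mu, beta]; theta = per-observation r_sum for this shard;
  // x_i = {n, y[1], ..., y[n]}; x_r = row-major covariates (n * p values).
  vector shard_ll(vector phi, vector theta,
                  data array[] real x_r, data array[] int x_i) {
    int n = x_i[1];                    // observations in this shard
    real mu = phi[1];                  // population-level intercept
    int p = num_elements(phi) - 1;     // number of covariates
    vector[p] beta = phi[2:(p + 1)];
    vector[n] eta;
    for (i in 1:n) {
      vector[p] x = to_vector(x_r[((i - 1) * p + 1):(i * p)]);
      eta[i] = mu + dot_product(x, beta) + theta[i];  // theta[i] = r_sum[i]
    }
    return [poisson_log_lpmf(x_i[2:(n + 1)] | eta)]';
  }
}
```

The alternative would put r_a, r_b, and r_c into phi and ship the index arrays aa, bb, cc through x_i instead.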

Main question:

Does anyone have any intuition as to what effect making r_sum local, versus keeping all the r_* in the global parameters, would have on the expression graph?

Extra details on the model

I’m working on what is essentially a Poisson-Multinomial model with an unknown N, where

\vec y_i \sim \text{Multinom}(N_i, \vec \theta_i)
N_i \sim \text{Poisson}(\lambda_i)

In practice, N_i is marginalized out and the model is expressed as a series of independent Poisson likelihoods of the form y_{k,i} \sim \text{Poisson}(\lambda_i \theta_{k,i}).
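This works because, with N_i = \sum_k y_{k,i} and \sum_k \theta_{k,i} = 1, the joint density factors into independent Poissons:

p(\vec y_i) = \frac{e^{-\lambda_i} \lambda_i^{N_i}}{N_i!} \cdot \frac{N_i!}{\prod_k y_{k,i}!} \prod_k \theta_{k,i}^{y_{k,i}} = \prod_k \frac{e^{-\lambda_i \theta_{k,i}} (\lambda_i \theta_{k,i})^{y_{k,i}}}{y_{k,i}!}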

I believe the underlying process can be conceived as individuals transitioning along a sequence of states (3 or 7, depending on the data set). Each individual has the opportunity to stay in its current state or transition to the next; after a certain amount of time, the number of individuals in each state is counted, and these counts form \vec y_i.

The multinomial proportions \vec \theta are calculated from the transition rates \vec p as

\theta_{k,i} = (1 - p_{k,i}) \prod_{j=1}^{k-1} p_{j,i}

with the first and last states being \theta_{1,i} = 1 - p_{1,i} and \theta_{K,i} = \prod_{j=1}^{K-1} p_{j,i}, respectively.
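In code, computing \vec \theta from \vec p for one observation looks roughly like this (a sketch; theta_from_p is a made-up name, and I assume K states with K - 1 transition rates):

```stan
vector theta_from_p(vector p) {
  int K = num_elements(p) + 1;  // K states, K - 1 transition rates
  vector[K] theta;
  real cum = 1;                 // running product p[1] * ... * p[k-1]
  for (k in 1:(K - 1)) {
    theta[k] = (1 - p[k]) * cum;
    cum *= p[k];
  }
  theta[K] = cum;               // prod_{j=1}^{K-1} p[j]
  return theta;
}
```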

I am trying to estimate how a group of covariates (X) and grouping factors (a,b,c) affect \lambda and \vec p. I’m modeling them as:

\text{logit}(p_{k,i}) = \mu_k + X_{[i,]} \vec \beta_k + r_{1,k}[a_i] + r_{2,k}[b_i] + r_{3,k}[c_i]

with a similar formula (on the log scale) for \lambda. r indicates a partially pooled parameter (i.e., a random effect) for one of the grouping factors.
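In Stan, the predictor for one transition rate k looks roughly like this (a sketch; aa, bb, cc are the observation-level group index arrays):

```stan
// r_1, r_2, r_3 are arrays of vectors, one vector per transition rate;
// aa, bb, cc are array[N] int mapping observations to group levels.
vector[N] eta_k = mu[k] + X * beta[k]
                  + r_1[k][aa] + r_2[k][bb] + r_3[k][cc];
vector[N] p_k = inv_logit(eta_k);
```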

Profiling

I used Stan’s profiler to benchmark this with a relatively small subset of the data; the two parts of the model that accounted for the bulk of the runtime were defining the linear predictor (\vec \theta as a function of \vec p) and the Poisson likelihood. The likelihood was implemented with reduce_sum() on a 48-core machine, and took about 5 times longer to run than the linear predictor. Within each reduce_sum() thread, I’m using a single vectorized poisson_log_lupmf(); switching to _lpmf made little difference to performance.
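For reference, the partial-sum function is essentially the pattern from the user's guide (simplified; partial_poisson_lpmf and log_rate are stand-in names):

```stan
functions {
  // Slicing is over observations; the per-observation log rates
  // (log lambda_i + log theta_{k,i}, flattened) are a shared argument.
  real partial_poisson_lpmf(array[] int y_slice, int start, int end,
                            vector log_rate) {
    return poisson_log_lupmf(y_slice | log_rate[start:end]);
  }
}
```

called in the model block as target += reduce_sum(partial_poisson_lupmf, y, grainsize, log_rate);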

I am not an expert on parallelization, but since nobody answered, I’ll give it a try.

If there are just three grouping factors, I’d expect the difference between splitting the three associated parameters and sharing them across all slices to be minimal: the main overhead in parallelization is passing the parameters to each slice and then getting the gradients back, and passing two more values in each direction should have a small impact.

With a Poisson likelihood and large datasets, there might also be savings to be made with the “sufficient statistics” trick: if some K rows in your dataset have identical values for all predictors, you can collapse them into a single row, multiply the Poisson rate by K (i.e., add \log K to the linear predictor on the log scale), and treat the sum of the K outcomes as the single outcome (see Sufficient statistic - Wikipedia). A sketch is below.
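For example, assuming the duplicate rows have been grouped in advance (all names here are made up):

```stan
// y_sum[g]: summed outcome over the n_dup[g] identical rows in group g;
// eta[g]:  their shared linear predictor.
// The sum of n iid Poisson(lambda) draws is Poisson(n * lambda),
// i.e. add log(n) to the linear predictor on the log scale:
target += poisson_log_lpmf(y_sum | eta + log(to_vector(n_dup)));
```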

Hope that helps at least a little bit.
