Quick description
I have a multivariate model where the data are partitioned by three independent grouping factors (a, b, c); almost all combinations of the factors are present in the data in nearly equal numbers. Each response variable is a function of one or more latent variables, which are calculated from a combination of population-level regression coefficients and random intercepts for each group.
I am considering parallelizing the model with `map_rect()`, and am debating how to split the model parameters into `phi` (global) and `theta` (local). The parameters I’ll be using in the mapped function are `intercepts`, `betas` (population-level regression coefficients), and the random intercepts `r_a`, `r_b`, and `r_c`. The `intercepts` and `betas` will certainly go into `phi`, but I’m not as certain about the `r_*`: each random-effect level is used in multiple shards, but each sum `r_sum[i] = r_a[aa[i]] + r_b[bb[i]] + r_c[cc[i]]` is used only once.
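For concreteness, here is a minimal sketch of what the `theta`-local packing might look like; the function name, the packing layout, and the omission of covariates are all illustrative assumptions, not my actual model:

```stan
functions {
  // Hypothetical mapped function for map_rect().
  // phi   : shared across shards (intercepts and betas)
  // theta : shard-local; here, the precomputed r_sum for each observation
  // x_r   : real data for this shard; x_i : integer data (sizes, counts)
  vector shard_ll(vector phi, vector theta,
                  data array[] real x_r, data array[] int x_i) {
    int n = x_i[1];                      // number of observations in shard
    array[n] int y = x_i[2:(n + 1)];     // Poisson counts
    real mu = phi[1];                    // population intercept
    vector[n] r_sum = theta[1:n];        // shard-local random-effect sums
    vector[n] eta = mu + r_sum;          // linear predictor (covariates omitted)
    return [poisson_log_lpmf(y | eta)]'; // one log-density value per shard
  }
}
```

The alternative would keep `r_a`, `r_b`, and `r_c` in `phi` and ship the index arrays `aa`, `bb`, and `cc` through `x_i`, moving the `r_sum` additions inside the mapped function.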
Main question:
Does anyone have any intuition as to what effect making `r_sum` local, versus keeping all the `r_*` in the global parameters, would have on the expression graph?
Extra details on the model
I’m working on what is essentially a Poisson-Multinomial model with an unknown N, where
\vec y_i \sim \text{Multinom}(N_i, \vec \theta_i)
N_i \sim \text{Poisson}(\lambda_i)
In practice, N_i is marginalized over and the model is expressed as a series of Poisson likelihoods of the form y_{k,i} \sim \text{Poisson}(\lambda_i \theta_{k,i}).
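(For reference, the marginalization is the standard Poisson–multinomial factorization: with N_i = \sum_k y_{k,i} and \sum_k \theta_{k,i} = 1,
\Pr(\vec y_i) = \frac{N_i!}{\prod_k y_{k,i}!} \prod_k \theta_{k,i}^{y_{k,i}} \cdot \frac{\lambda_i^{N_i} e^{-\lambda_i}}{N_i!} = \prod_k \frac{(\lambda_i \theta_{k,i})^{y_{k,i}} e^{-\lambda_i \theta_{k,i}}}{y_{k,i}!},
so the counts are independent Poisson variables with means \lambda_i \theta_{k,i}.)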
I believe the underlying process can be conceived of as individuals transitioning along a sequence of states (3 or 7, depending on the data set). Each individual either stays in its current state or transitions to the next; after a certain amount of time, the number of individuals in each state is counted, which gives \vec y_i.
The multinomial proportions \vec\theta_i are calculated from the transition rates \vec p_i as
\theta_{k,i} = (1 - p_{k,i}) \prod_{j=1}^{k-1} p_{j,i},
with the first and last states being \theta_{1,i} = 1 - p_{1,i} and \theta_{K,i} = \prod_{j=1}^{K-1} p_{j,i}, respectively.
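As a concrete illustration, a minimal Stan sketch of that construction for a single observation might look like the following (the helper name `p_to_theta` and the assumption of K − 1 transition rates for K states are mine):

```stan
// Hypothetical helper: build the multinomial proportions theta from the
// transition rates p for one observation (K states, K - 1 rates).
vector p_to_theta(vector p) {
  int K = rows(p) + 1;
  vector[K] theta;
  real cum = 1;                    // running product p_1 * ... * p_{k-1}
  for (k in 1:(K - 1)) {
    theta[k] = cum * (1 - p[k]);   // reached state k, then stayed
    cum *= p[k];                   // transitioned onward past state k
  }
  theta[K] = cum;                  // transitioned through every state
  return theta;
}
```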
I am trying to estimate how a group of covariates (X) and grouping factors (a,b,c) affect \lambda and \vec p. I’m modeling them as:
\text{logit}(p_{k,i}) = \mu_k + X_{[i,]} \vec\beta_k + r_{1,k}[a_i] + r_{2,k}[b_i] + r_{3,k}[c_i], with a similar formula for \lambda. r indicates a partially pooled parameter (i.e., a random effect) for one of the grouping factors.
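In Stan, the predictor for one transition rate might look like the sketch below; `aa`, `bb`, and `cc` are the group-index arrays from the `r_sum` expression above, and the other names are illustrative:

```stan
// Hypothetical transformed-parameters fragment: linear predictor for
// transition rate k, vectorized over all N observations.
vector[N] eta_k = mu[k] + X * beta[k]
                  + r_1[k][aa] + r_2[k][bb] + r_3[k][cc];
vector[N] p_k = inv_logit(eta_k);
```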
Profiling
I used Stan’s profiler to benchmark this with a relatively small subset of the data; the two parts of the model that accounted for the bulk of the runtime were building the linear predictor (\vec\theta as a function of p) and the Poisson likelihood. The likelihood was implemented with `reduce_sum()` on a 48-core machine, and took about 5 times longer to run than the linear predictor. Within each `reduce_sum()` thread, I’m using a single vectorized `poisson_log_lupmf()`; switching to `_lpmf` made little difference to performance.
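For context, that likelihood follows the standard partial-sum pattern; a minimal sketch, assuming a precomputed vector `eta` of log means (\log \lambda_i + \log \theta_{k,i}) and flattened counts `y`:

```stan
functions {
  // Hypothetical partial-sum function: one vectorized Poisson per slice.
  real partial_ll_lpmf(array[] int y_slice, int start, int end,
                       vector eta) {
    return poisson_log_lupmf(y_slice | eta[start:end]);
  }
}
model {
  // grainsize = 1 lets the scheduler choose the slice sizes
  target += reduce_sum(partial_ll_lupmf, y, 1, eta);
}
```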