Hello,
I am hoping to get some advice on speeding up a dynamic latent factor model, which I am fitting using cmdstanr. The major speed bottleneck seems to be the construction of the linear predictor `nu_metric`. Because parameters must be assigned by time period `t`, unit `j`, and item `i`, my basic approach has been to loop over every observation (the data are in “long” form, with each row a period-unit-item triad). My attempts to vectorize the construction of `nu_metric`, as well as to parallelize within chains, have not sped up sampling (if anything, they have slowed it down); one representative vectorization attempt is sketched after the model below.
Here is a highly simplified version of the model that illustrates the essence of the problem:
```stan
data {
  int<lower=1> D;                          // number of latent dimensions
  int<lower=1> J;                          // number of units
  int<lower=1> T;                          // number of time periods
  int<lower=0> N_metric;                   // number of observations
  int<lower=0> I_metric;                   // number of items
  vector[N_metric] yy_metric;              // outcomes
  array[N_metric] int<lower=1> ii_metric;  // item indicator
  array[N_metric] int<lower=1> jj_metric;  // unit indicator
  array[N_metric] int<lower=1> tt_metric;  // period indicator
}
parameters {
  array[T, I_metric] real alpha_metric;    // metric intercepts
  array[I_metric, D] real lambda_metric;   // metric loadings
  array[T, J, D] real eta;                 // latent factors
  vector<lower=0>[I_metric] sigma_metric;  // residual sd
}
model {
  /* Priors */
  profile("priors") {
    to_array_1d(alpha_metric) ~ std_normal();
    to_array_1d(lambda_metric) ~ std_normal();
    to_array_1d(eta) ~ std_normal();
    sigma_metric ~ student_t(4, 0.5, 0.5);
  }
  /* Linear predictor */
  vector[N_metric] nu_metric;
  profile("linear_predictor") {
    for (n in 1:N_metric) {
      nu_metric[n] = alpha_metric[tt_metric[n], ii_metric[n]]
        + to_row_vector(lambda_metric[ii_metric[n], 1:D])
          * to_vector(eta[tt_metric[n], jj_metric[n], 1:D]);
    }
  }
  /* Likelihood */
  profile("likelihood") {
    target += normal_lupdf(yy_metric | nu_metric, sigma_metric[ii_metric]);
  }
}
```
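For reference, here is a sketch of the kind of vectorization I have tried, in case it helps diagnose why it does not beat the loop. It stores `lambda_metric` as a matrix, flattens `eta` into a `(T * J) x D` matrix and `alpha_metric` into a vector, and then gathers each observation's terms with multi-indexing and `rows_dot_product`. The names `eta_mat`, `ti_metric`, and `tj_metric` are just illustrative, and the flat index arrays are precomputed in `transformed data`:

```stan
transformed data {
  // Flat indices, precomputed once (names are illustrative)
  array[N_metric] int tj_metric;  // row of eta_mat holding eta[t, j, ]
  array[N_metric] int ti_metric;  // position of alpha[t, i] in to_vector()
  for (n in 1:N_metric) {
    tj_metric[n] = (tt_metric[n] - 1) * J + jj_metric[n];
    ti_metric[n] = (ii_metric[n] - 1) * T + tt_metric[n];  // column-major
  }
}
parameters {
  matrix[T, I_metric] alpha_metric;        // metric intercepts
  matrix[I_metric, D] lambda_metric;       // metric loadings
  matrix[T * J, D] eta_mat;                // latent factors, stacked over t
  vector<lower=0>[I_metric] sigma_metric;  // residual sd
}
model {
  to_vector(alpha_metric) ~ std_normal();
  to_vector(lambda_metric) ~ std_normal();
  to_vector(eta_mat) ~ std_normal();
  sigma_metric ~ student_t(4, 0.5, 0.5);
  vector[N_metric] nu_metric;
  profile("linear_predictor") {
    vector[T * I_metric] alpha_vec = to_vector(alpha_metric);
    // Gather the relevant rows and take row-wise dot products in one call
    nu_metric = alpha_vec[ti_metric]
      + rows_dot_product(lambda_metric[ii_metric], eta_mat[tj_metric]);
  }
  target += normal_lupdf(yy_metric | nu_metric, sigma_metric[ii_metric]);
}
```

Rewrites along these lines have not been faster for me than the loop above, which is part of the puzzle.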
Here is a glimpse at the toy data I’ve been using:
```
$ D        : num 2
$ J        : int 50
$ T        : int 1
$ N_metric : int 498
$ I_metric : int 10
$ yy_metric: num [1:498] -1.5271 -0.2829 -0.4005 -0.0445 2.816 ...
$ ii_metric: int [1:498] 1 1 1 1 1 1 1 1 1 1 ...
$ jj_metric: int [1:498] 1 2 3 4 5 6 7 8 9 10 ...
$ tt_metric: int [1:498] 1 1 1 1 1 1 1 1 1 1 ...
```
Fitting the model with 1000 warmup and 1000 sampling iterations on the toy data takes about two minutes per chain. The output of `profiles()` for a typical chain is:
| name             | thread_id   | total_time | forward_time | reverse_time | chain_stack | no_chain_stack | autodiff_calls | no_autodiff_calls |
|------------------|-------------|------------|--------------|--------------|-------------|----------------|----------------|-------------------|
| likelihood       | 0x115491600 | 5.18406    | 4.50953      | 0.674531     | 298901      | 0              | 298901         | 1                 |
| linear_predictor | 0x115491600 | 77.50600   | 73.98060     | 3.525410     | 297706392   | 148853196      | 298901         | 1                 |
| priors           | 0x115491600 | 2.28105    | 2.15572      | 0.125335     | 1195608     | 0              | 298901         | 1                 |
which makes it clear that constructing `nu_metric` dominates the computation time. If anyone has bright ideas for speeding that step up, I would love to hear them.
Many thanks!
Devin