# Trying to find a better way to speed up a hierarchical logistic model — reduce_sum only helped a little

Hello everyone,

I have two groups of subjects who participated in a binary-choice experiment. The number of observations per subject varies from ~300 to ~1700. I've modeled each subject's choice behavior with a straightforward logistic model and compared the effects of the decision variables (i.e., the independent variables in the regression) at the group level. So every subject gets their own set of parameters, constrained by group-level priors.

The model seems to work fine, and the results are what I expect and not too different from the MLE estimates. I'm wondering whether there is a better way to speed it up. I'd like to vectorize as much as possible, but it's a little tricky: each subject has their own beta and block of the design matrix, so I can't see a way to do just one matrix multiplication in total rather than one per subject.
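For what it's worth, the per-subject block multiplications can be collapsed into a single row-wise product by expanding beta to one row per observation. This is only a toy numpy sketch of the idea (all sizes and data are made up, not from my actual dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sub, n_dv = 3, 4
n_obs_per_sub = [5, 7, 6]                         # varying observations per subject
n_obs_total = sum(n_obs_per_sub)                  # 18

dv = rng.normal(size=(n_obs_total, n_dv))         # stacked design matrix
beta = rng.normal(size=(n_sub, n_dv))             # one coefficient vector per subject
sub = np.repeat(np.arange(n_sub), n_obs_per_sub)  # subject id of each row

# Loop version: one matrix-vector multiply per subject block
loop_result = np.empty(n_obs_total)
start = 0
for s in range(n_sub):
    end = start + n_obs_per_sub[s]
    loop_result[start:end] = dv[start:end] @ beta[s]
    start = end

# Vectorized version: expand beta by subject index, then row-wise dot products
vec_result = (dv * beta[sub]).sum(axis=1)

assert np.allclose(loop_result, vec_result)
```

In Stan the analogous operation would be `rows_dot_product` applied to `dv` and a matrix whose n-th row is the coefficient vector of observation n's subject; whether that beats the blocked multiplies likely depends on the sizes involved.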

Here is my code:

```stan
data {
  int<lower=1, upper=2> n_group;        // number of groups
  int<lower=1> n_sub;                   // number of subjects
  int<lower=1> n_dv;                    // number of decision variables (independent variables)
  int<lower=1> obs_start[n_sub];        // row index of the first observation of each subject
  int<lower=1> obs_end[n_sub];          // row index of the last observation of each subject
  int<lower=1> n_obs_total;             // total number of observations across all subjects
  int<lower=1, upper=2> group[n_sub];   // group id of each subject
  int<lower=0, upper=1> y[n_obs_total]; // binary response
  matrix[n_obs_total, n_dv] dv;         // design matrix
}

parameters {
  vector[n_dv] mu_grp[n_group];
  vector<lower=0>[n_dv] sigma_grp[n_group];
  vector[n_dv] beta[n_sub];
}

model {
  vector[n_obs_total] dv_x_beta;

  for (g in 1:n_group) {
    mu_grp[g] ~ std_normal();
    sigma_grp[g] ~ gamma(2, 0.1);
  }

  for (s in 1:n_sub) {
    beta[s] ~ student_t(3, mu_grp[group[s]], sigma_grp[group[s]]);
    dv_x_beta[obs_start[s]:obs_end[s]] = dv[obs_start[s]:obs_end[s]] * beta[s];
  }

  y ~ bernoulli_logit(dv_x_beta);
}

generated quantities {
  vector[n_dv] mu_grp_diff;
  vector[n_dv] sigma_grp_diff;

  // compare group-level parameter estimates
  mu_grp_diff = mu_grp[1] - mu_grp[2];
  sigma_grp_diff = sigma_grp[1] - sigma_grp[2];
}
```

With 53 subjects and 48,503 observations in total, running this model with 4 parallel chains and 500 iterations each of warmup and sampling (1000 in total) took ~450 s.

I also tried a model using reduce_sum, but it didn't improve things as much as I expected. My computer has 16 physical cores, and the model took ~420 s with 4 parallel chains and 4 threads per chain. I suspect the culprit is the for loop in the partial_sum function, which computes the matrix product one observation at a time and is not very efficient.

The reduce_sum version is as follows:

```stan
functions {
  real partial_sum_lpmf(int[] y, int start, int end, int[] sub, matrix dv,
                        vector[] beta) {
    int N = end - start + 1;
    vector[N] dv_x_beta;
    for (n in 1:N) {
      int nn = n + start - 1;
      dv_x_beta[n] = dv[nn] * beta[sub[nn]];
    }
    return bernoulli_logit_lupmf(y | dv_x_beta);
  }
}

data {
  int<lower=1, upper=2> n_group;
  int<lower=1> n_sub;
  int<lower=1> n_dv;
  int<lower=1> obs_start[n_sub];
  int<lower=1> obs_end[n_sub];
  int<lower=1> n_obs_total;
  int<lower=1, upper=2> group[n_sub];
  int<lower=1> sub[n_obs_total];        // subject id of each observation
  int<lower=0, upper=1> y[n_obs_total];
  matrix[n_obs_total, n_dv] dv;
  int<lower=1> grainsize;               // passed in as data (set to 1)
}

parameters {
  vector[n_dv] mu_grp[n_group];
  vector<lower=0>[n_dv] sigma_grp[n_group];
  vector[n_dv] beta[n_sub];
}

model {
  for (g in 1:n_group) {
    mu_grp[g] ~ std_normal();
    sigma_grp[g] ~ gamma(2, 0.1);
  }

  for (s in 1:n_sub) {
    beta[s] ~ student_t(3, mu_grp[group[s]], sigma_grp[group[s]]);
  }

  target += reduce_sum(partial_sum_lupmf, y, grainsize, sub, dv, beta);
}

generated quantities {
  vector[n_dv] mu_grp_diff;
  vector[n_dv] sigma_grp_diff;

  mu_grp_diff = mu_grp[1] - mu_grp[2];
  sigma_grp_diff = sigma_grp[1] - sigma_grp[2];
}
```

Any suggestions or good ideas?

Thanks!

Presumably this is something you need to run many times for many datasets?

Take a look at the tricks I used here; you'll have to add the group structure, but otherwise I think the reduced-redundant-computation and sufficient-statistics tricks should give you substantial speed-ups.

Oh, I just noticed that that demo doesn't include use of rows_dot_product, which I recently learned provides an additional speed-up. Lemme edit it…

Oh, I should ask: is there much redundancy in the contents of the dv matrix? For example, if it encodes contrasts for a categorical predictor, there will be few unique entries, hence high redundancy, and the tricks I mention apply. If it's a numeric variable with completely unique values for each entry, the tricks won't work (and will probably slow things down).
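One quick way to gauge that redundancy is to count the unique rows of the design matrix. A toy Python sketch (the columns here are invented stand-ins for intercept, contrast, and small-integer variables, not the real data):

```python
import numpy as np

# Toy design matrix built from the kinds of columns that produce many
# repeated rows: an intercept, two 0/1 contrast columns, and an integer
# variable with 21 levels
rng = np.random.default_rng(1)
n_obs = 1000
intercept = np.ones(n_obs)
contrast1 = rng.choice([0.0, 1.0], size=n_obs)
contrast2 = rng.choice([0.0, 1.0], size=n_obs)
draw = rng.integers(0, 21, size=n_obs).astype(float)
dv = np.column_stack([intercept, contrast1, contrast2, draw])

# Count distinct rows: at most 2 * 2 * 21 = 84 are possible here
unique_rows, counts = np.unique(dv, axis=0, return_counts=True)
print(f"{n_obs} rows compress to {len(unique_rows)} unique rows")
```

With few unique rows, the likelihood only needs one linear-predictor evaluation per unique row (per subject, in a hierarchical model), which is where the savings come from.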

This model actually runs just once on a single dataset (the observations of 53 participants), though I have other expanded models built on this simplest one.

There is definitely redundancy in the dv matrix, just as you pointed out: there are contrasts for a categorical predictor. There are some numeric variables as well, but their values are not completely unique; they fall in a limited range. Here's part of the data; maybe it can help the discussion.
data.csv (19.9 KB)

The dv matrix contains the variables from `kConst` to `online_cost`; `stop` is the binary dependent variable. `kConst` is the intercept, and `cost0_1` and `cost0_4` are the contrasts encoding a three-level categorical variable (0.0, 0.1, and 0.4). `draw` is an integer variable ranging from 0 to 20 (so only 21 levels), and `online_draw` is the interaction (product) of the three-level categorical variable and `draw`, so it also takes on a limited range of values.

So for the current data, I guess these tricks would apply? But if there were a continuous variable with essentially unique values (as is the case in my other, more complex models), they wouldn't work?

I took a look at the sufficient-statistics trick and turned the logistic regression into a binomial regression. It's about 9 times faster! This is what I did (and I hope I did it right): for each participant, I counted the unique combinations of the decision-variable values and the number of times `stop == 1` per combination, which reduced the total number of observations to only 2491 entries. Here is the example data:
data_binom.csv (1.8 KB), where `n_stopped` is the number of times the subject chose to stop and `n_total` is the total number of times the subject had to choose whether to stop.
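For anyone following along, the aggregation step (and the fact that it leaves the log-likelihood unchanged up to a constant) can be sketched in plain Python. Everything here is a made-up toy example, not the real data or column names:

```python
import math
from collections import defaultdict

# Toy long-format data: (subject, dv row, binary outcome)
rows = [
    (1, (1.0, 0.0), 1), (1, (1.0, 0.0), 0), (1, (1.0, 0.0), 1),
    (1, (1.0, 1.0), 0), (2, (1.0, 0.0), 1), (2, (1.0, 0.0), 1),
]
beta = {1: (0.2, -0.5), 2: (-0.1, 0.4)}   # per-subject coefficients

def eta(sub, dv_row):                     # linear predictor
    return sum(b * x for b, x in zip(beta[sub], dv_row))

def log_inv_logit(x):                     # log of the logistic function
    return -math.log1p(math.exp(-x))

# Bernoulli log-likelihood over the original long data
ll_bern = sum(
    y * log_inv_logit(eta(s, d)) + (1 - y) * log_inv_logit(-eta(s, d))
    for s, d, y in rows
)

# Aggregate to sufficient statistics: (subject, dv combination) -> [k, n]
agg = defaultdict(lambda: [0, 0])
for s, d, y in rows:
    agg[(s, d)][0] += y                   # k = successes (n_stopped)
    agg[(s, d)][1] += 1                   # n = trials (n_total)

# Binomial log-likelihood over the aggregated data, dropping the constant
# log-binomial-coefficient term (Stan's `~` statements drop it anyway)
ll_binom = sum(
    k * log_inv_logit(eta(s, d)) + (n - k) * log_inv_logit(-eta(s, d))
    for (s, d), (k, n) in agg.items()
)

assert math.isclose(ll_bern, ll_binom)
```

Since rows sharing a subject and dv combination share the same success probability, the two likelihoods differ only by the binomial coefficients, which are constant in the parameters, so the posterior is identical.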

And this is the revised model:

```stan
data {
  int<lower=1, upper=2> n_group;
  int<lower=1> n_sub;
  int<lower=1> n_dv;
  int<lower=1> obs_start[n_sub];
  int<lower=1> obs_end[n_sub];
  int<lower=1> n_obs_total;
  int<lower=1, upper=2> group[n_sub];
  int<lower=1> sub[n_obs_total];
  int<lower=0> n_stopped[n_obs_total]; // number of times the event occurred
  int<lower=0> n_total[n_obs_total];   // total number of trials
  matrix[n_obs_total, n_dv] dv;
}

parameters {
  vector[n_dv] mu_grp[n_group];
  vector<lower=0>[n_dv] sigma_grp[n_group];
  vector[n_dv] beta[n_sub];
}

model {
  vector[n_obs_total] dv_x_beta;

  for (g in 1:n_group) {
    mu_grp[g] ~ std_normal();
    sigma_grp[g] ~ gamma(2, 0.1);
  }

  for (s in 1:n_sub) {
    beta[s] ~ student_t(3, mu_grp[group[s]], sigma_grp[group[s]]);
    dv_x_beta[obs_start[s]:obs_end[s]] = dv[obs_start[s]:obs_end[s]] * beta[s];
  }

  n_stopped ~ binomial_logit(n_total, dv_x_beta);
}

generated quantities {
  vector[n_dv] mu_grp_diff;
  vector[n_dv] sigma_grp_diff;

  mu_grp_diff = mu_grp[1] - mu_grp[2];
  sigma_grp_diff = sigma_grp[1] - sigma_grp[2];
}
```