I have posted a few times here about this model and received great support to get it to this point. Unfortunately, I'm still having trouble with it. **R/data files included at the bottom.** I have estimated such models with Stan before (in fact, I have a publication on it), but the model here is slightly different and the dataset much larger.

I started from the following **brms model**:

```
brm(data = mod[1:1000, ],
    family = categorical(link = logit, refcat = NA),
    bf(trptrans ~ 1,
       nlf(mu1 ~ btc * (b1tt * travelTimeDrive + travelCostDrive)),
       nlf(mu2 ~ btc * (a2 + b2tt * travelTimeWalk)),
       nlf(mu3 ~ btc * (a3 + b3tt * travelTimeBike)),
       nlf(mu4 ~ btc * (a4 + b4tt * travelTimeTransit)),
       mvbind(a2 + a3 + a4 + btc + b1tt + b2tt + b3tt + b4tt) ~
         1 + (1 + race_2 + race_3 + race_4 + race_5 + hhfaminc | personid) + (1 | trippurp),
       nl = TRUE),
    prior = c(prior(normal(0, 2.5), class = b, nlpar = a2),
              prior(normal(0, 2.5), class = b, nlpar = a3),
              prior(normal(0, 2.5), class = b, nlpar = a4),
              prior(normal(0, 2.5), class = b, nlpar = b1tt),
              prior(normal(0, 2.5), class = b, nlpar = btc),
              prior(normal(0, 2.5), class = b, nlpar = b2tt),
              prior(normal(0, 2.5), class = b, nlpar = b3tt),
              prior(normal(0, 2.5), class = b, nlpar = b4tt)),
    algorithm = "meanfield")
```
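In equation form, the `nlf` terms above specify the following systematic utilities (with `refcat = NA`, all four modes get a linear predictor and the choice probabilities are a softmax over them):

$$
\begin{aligned}
\mu_1 &= \beta_{tc}\left(\beta_{1tt}\, t_{\mathrm{drive}} + c_{\mathrm{drive}}\right) \\
\mu_k &= \beta_{tc}\left(\alpha_k + \beta_{ktt}\, t_k\right), \quad k \in \{2, 3, 4\} \;(\text{walk, bike, transit}) \\
\Pr(y = j) &= \frac{e^{\mu_j}}{\sum_{k=1}^{4} e^{\mu_k}}
\end{aligned}
$$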

My dataset has about **178,000 rows**, which I subset to the first 1,000 (`mod[1:1000,]`) in the code above for testing. I adapted the Stan code generated by brms to be more vectorized, and changed the specification of `btc`, `b2tt`, `b3tt`, and `b4tt` so that they are exponentiated (i.e., lognormally distributed). It was recommended that I run the model on a GPU because of the large dataset, but that doesn't give a huge improvement: `categorical_logit_glm_lupmf` requires me to loop over the N rows, so the GPU isn't highly utilized.
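Concretely, the exponentiation amounts to the following reparameterization (a minimal sketch; `b_raw` and `b` are illustrative names, not the ones in my code):

```
// Minimal sketch of the exponentiated-coefficient idea:
// a normal prior on the log scale implies a lognormal coefficient.
parameters {
  real b_raw;                    // coefficient on the log scale
}
transformed parameters {
  real<lower=0> b = exp(b_raw);  // lognormal(0, 2.5) a priori
}
model {
  b_raw ~ normal(0, 2.5);
  // Implied prior on b: median exp(0) = 1, but the 97.5th percentile
  // is exp(1.96 * 2.5) ~ 134, so extreme draws can blow up the linear
  // predictor. Shrinking the sd to 0.5 (tried below) brings that
  // percentile down to exp(1.96 * 0.5) ~ 2.7.
}
```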

As a next step, I simplified the model to remove the correlations and the contextual variables (race_2, race_3, race_4, race_5, and hhfaminc), and ran it with VI on the full dataset. On the CPU, gradient evaluation took 1.08 seconds; on the GPU, 0.92 seconds. I get very **large ELBO values** (≈ -2.8 × 10^69; specifically, -2783111885420583704669702329161101097016063682459208926108713371041792.000) and a divergence message. When I run the brms code above with VI, it **runs fine** (gradient evaluation takes 0.0088 seconds for 1,000 records, or 1.52 seconds with the full dataset) and stochastic gradient ascent proceeds with reasonable ELBO values.

I thought part of the problem could be the priors combined with the `exp()` blowing up the initial values (see the quantiles in the sketch above). I reduced the prior standard deviation from **2.5 to 0.5**, but it doesn't have an appreciable effect on the ELBO.

If I can get this model to give reasonable results (in a reasonable runtime), then I can add back the contextual variables and some of the correlations. **I have two questions/focuses**:

- Any suggestions on how to **address the divergence/large ELBO**?
- The **GPU is underutilized**, so it is not being used optimally (I'm on a Tesla K20 but have access to a V100). I read it might be possible to **combine** `reduce_sum` with the GPU (Any intuitions/results on when GPU > reduce_sum? - #5 by saudiwin). I thought this might increase GPU utilization because it would pass multiple rows at once. Is this a possibility? I have versions of the code written using `reduce_sum` and using the GPU; a sketch of the combined structure follows the Stan program below.

```
data {
int<lower=1> N; // total number of observations
int<lower=2> ncat; // number of categories
int<lower=2> nvar; // number of variables for each alternative
int<lower=2> ntotvar; // number of total variables
array[N] int Y; // response variable
// covariate matrix for alt1
array[N] row_vector[nvar] x1; // matrix[N,nvar] x1;
// covariate matrix for alt2
array[N] row_vector[nvar] x2; // matrix[N,nvar] x2;
// covariate matrix for alt3
array[N] row_vector[nvar] x3; // matrix[N,nvar] x3;
// covariate matrix for alt4
array[N] row_vector[nvar] x4; // matrix[N,nvar] x4;
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
// matrix[N, M_1] Z_1;
vector[N] Z_1;
int<lower=1> NC_1; // number of group-level correlations
// data for group-level effects of ID 2
int<lower=1> N_2; // number of grouping levels
int<lower=1> M_2; // number of coefficients per level
array[N] int<lower=1> J_2; // grouping indicator per observation
int prior_only; // should the likelihood be ignored?
}
transformed data {
int<lower=1> M_3 = (M_1 + M_2) * (ncat * 2); // total number of group-level standard deviations
}
parameters {
real b_btc; // population-level effects
real b_a2; // population-level effects
real b_a3; // population-level effects
real b_a4; // population-level effects
real b_b1tt; // population-level effects
real b_b2tt; // population-level effects
real b_b3tt; // population-level effects
real b_b4tt; // population-level effects
vector<lower=0>[M_3] sd; // group-level standard deviations
array[M_1] vector[N_1] z_1; // standardized group-level effects
array[M_2] vector[N_2] z_2; // standardized group-level effects
array[M_1] vector[N_1] z_3; // standardized group-level effects
array[M_2] vector[N_2] z_4; // standardized group-level effects
array[M_1] vector[N_1] z_5; // standardized group-level effects
array[M_2] vector[N_2] z_6; // standardized group-level effects
array[M_1] vector[N_1] z_7; // standardized group-level effects
array[M_2] vector[N_2] z_8; // standardized group-level effects
array[M_1] vector[N_1] z_9; // standardized group-level effects
array[M_2] vector[N_2] z_10; // standardized group-level effects
array[M_1] vector[N_1] z_11; // standardized group-level effects
array[M_2] vector[N_2] z_12; // standardized group-level effects
array[M_1] vector[N_1] z_13; // standardized group-level effects
array[M_2] vector[N_2] z_14; // standardized group-level effects
array[M_1] vector[N_1] z_15; // standardized group-level effects
array[M_2] vector[N_2] z_16; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1; // actual group-level effects
vector[N_2] r_2; // actual group-level effects
vector[N_1] r_3; // actual group-level effects
vector[N_2] r_4; // actual group-level effects
vector[N_1] r_5; // actual group-level effects
vector[N_2] r_6; // actual group-level effects
vector[N_1] r_7; // actual group-level effects
vector[N_2] r_8; // actual group-level effects
vector[N_1] r_9; // actual group-level effects
vector[N_2] r_10; // actual group-level effects
vector[N_1] r_11; // actual group-level effects
vector[N_2] r_12; // actual group-level effects
vector[N_1] r_13; // actual group-level effects
vector[N_2] r_14; // actual group-level effects
vector[N_1] r_15; // actual group-level effects
vector[N_2] r_16; // actual group-level effects
// compute actual group-level effects; odd indices vary by person (J_1),
// even indices by trip purpose (J_2)
r_1 = sd[1] * z_1[1]; // btc
r_2 = sd[2] * z_2[1]; // btc
r_3 = sd[3] * z_3[1]; // a2
r_4 = sd[4] * z_4[1]; // a2
r_5 = sd[5] * z_5[1]; // a3
r_6 = sd[6] * z_6[1]; // a3
r_7 = sd[7] * z_7[1]; // a4
r_8 = sd[8] * z_8[1]; // a4
r_9 = sd[9] * z_9[1]; // b1tt
r_10 = sd[10] * z_10[1]; // b1tt
r_11 = sd[11] * z_11[1]; // b2tt
r_12 = sd[12] * z_12[1]; // b2tt
r_13 = sd[13] * z_13[1]; // b3tt
r_14 = sd[14] * z_14[1]; // b3tt
r_15 = sd[15] * z_15[1]; // b4tt
r_16 = sd[16] * z_16[1]; // b4tt
}
model {
// likelihood including constants
if (!prior_only) {
// Define matrices/vector for x, alpha, beta
matrix[ncat,nvar] x;
matrix[nvar,ncat] beta;
vector[ncat] alpha;
array[N] row_vector[ncat] a = rep_array([0,b_a2,b_a3,b_a4], N);
// initialize linear predictor term
// Terms are btc, b1tt, b2tt, b3tt, b4tt
array[N] row_vector[ntotvar] b = rep_array([b_btc,b_b1tt,b_b2tt,b_b3tt,b_b4tt], N); // matrix[N,ntotvar] b = rep_matrix(b_btc,N,ntotvar);
for (n in 1:N) {
// add to linear predictor term
a[n] += [0,r_3[J_1[n]] + r_4[J_2[n]], r_5[J_1[n]] + r_6[J_2[n]], r_7[J_1[n]] + r_8[J_2[n]]];
// add to linear predictor term
// Terms are btc, b1tt, b2tt, b3tt, b4tt
b[n] += [r_1[J_1[n]] + r_2[J_2[n]], r_9[J_1[n]] + r_10[J_2[n]], r_11[J_1[n]] + r_12[J_2[n]],r_13[J_1[n]] + r_14[J_2[n]],r_15[J_1[n]] + r_16[J_2[n]]];
b[n] = exp(b[n]);
// x is an ncat x nvar matrix stacking the alternatives' covariates
// (travel time and cost) for observation n
x = to_matrix([x1[n], x2[n], x3[n], x4[n]]);
// beta (nvar x ncat): row 1 holds btc times the alternative-specific
// time coefficients, row 2 the common cost coefficient btc
beta = [b[n,1] * b[n,2:5], rep_row_vector(b[n,1], ncat)];
// alpha: intercepts, with the non-reference intercepts scaled by btc
alpha = [a[n,1], b[n,1] * a[n,2], b[n,1] * a[n,3], b[n,1] * a[n,4]]';
target += categorical_logit_glm_lupmf(Y[n] | x, alpha, beta);
}
}
// priors
target += normal_lpdf(b_btc | 0, 0.5);
target += normal_lpdf(b_a2 | 0, 2.5);
target += normal_lpdf(b_a3 | 0, 2.5);
target += normal_lpdf(b_a4 | 0, 2.5);
target += normal_lpdf(b_b1tt | 0, 0.5);
target += normal_lpdf(b_b2tt | 0, 0.5);
target += normal_lpdf(b_b3tt | 0, 0.5);
target += normal_lpdf(b_b4tt | 0, 0.5);
target += student_t_lpdf(sd | 3, 0, 2.5);
target += std_normal_lpdf(z_1[1]);
target += std_normal_lpdf(z_2[1]);
target += std_normal_lpdf(z_3[1]);
target += std_normal_lpdf(z_4[1]);
target += std_normal_lpdf(z_5[1]);
target += std_normal_lpdf(z_6[1]);
target += std_normal_lpdf(z_7[1]);
target += std_normal_lpdf(z_8[1]);
target += std_normal_lpdf(z_9[1]);
target += std_normal_lpdf(z_10[1]);
target += std_normal_lpdf(z_11[1]);
target += std_normal_lpdf(z_12[1]);
target += std_normal_lpdf(z_13[1]);
target += std_normal_lpdf(z_14[1]);
target += std_normal_lpdf(z_15[1]);
target += std_normal_lpdf(z_16[1]);
}
}
```
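On the `reduce_sum` question, here is a minimal sketch of the structure I have in mind, with the hierarchical construction of `alpha`/`beta` collapsed into plain parameters for brevity (in the full model, the `r_*` effects would be passed through as extra arguments and `alpha`/`beta` rebuilt per row inside the partial-sum function; whether the OpenCL-backed `categorical_logit_glm_lupmf` actually engages the GPU from inside `reduce_sum` threads is exactly what I'm unsure about):

```
functions {
  // Partial log-likelihood over a slice of observations. Rows within a
  // slice are still evaluated one at a time, but slices run in parallel
  // across threads.
  real partial_cat_lpmf(array[] int y_slice, int start, int end,
                        array[] row_vector x1, array[] row_vector x2,
                        array[] row_vector x3, array[] row_vector x4,
                        vector alpha, matrix beta) {
    real lp = 0;
    for (i in 1:size(y_slice)) {
      int n = start + i - 1;
      lp += categorical_logit_glm_lupmf(y_slice[i] |
              to_matrix([x1[n], x2[n], x3[n], x4[n]]), alpha, beta);
    }
    return lp;
  }
}
data {
  int<lower=1> N;
  int<lower=2> ncat;
  int<lower=1> nvar;
  array[N] int Y;
  array[N] row_vector[nvar] x1;
  array[N] row_vector[nvar] x2;
  array[N] row_vector[nvar] x3;
  array[N] row_vector[nvar] x4;
}
parameters {
  vector[ncat] alpha;
  matrix[nvar, ncat] beta;
}
model {
  alpha ~ normal(0, 1);
  to_vector(beta) ~ normal(0, 1);
  // grainsize = 1 lets the scheduler pick the slice sizes
  target += reduce_sum(partial_cat_lpmf, Y, 1, x1, x2, x3, x4, alpha, beta);
}
```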

Attached files:

- time_cost_calculated.CSV (7.3 MB)
- mtc_equity.R (2.0 KB)