# Successfully implemented model, wondering if it can be optimized

I just implemented the following mixture model:

(small typo: there’s a logit link for the mu)

From the manual, I know a QR decomposition of the linear predictors can make the code more efficient (see the sketch after the model below). I’d appreciate it if someone could take a look at my model specification and tell me whether there’s additional room for improvement.

```
data {
  int<lower=0> N;       // number of data points
  int<lower=1> I;       // number of individual covariates
  int<lower=1> G;       // number of group covariates

  int<lower=1> A[N];    // number of attempts
  int<lower=0> y[N];    // number of successes

  vector[I] xi[N];      // individual covariates
  vector[G] xg[N];      // group covariates
}
transformed data {
  real nil[N];
  real z1[N];
  real z3[N];

  nil = rep_array(0.5,N);
  for (n in 1:N) {
    z1[n] = (1-2*min([nil[n],1.0*y[n]]));
    z3[n] = (1-2*min([nil[n],1.0*(A[n]-y[n])]));
  }
}
parameters {
  real<lower=0> phi;  // dispersion betabin

  vector[I] beta;
  matrix[2,G] gamma;
}
transformed parameters {
  real<lower=0> rho;

  rho = 1/phi;
}
model {
  real mu;
  vector[3] pz;

  phi ~ normal(0, 10);
  beta ~ normal(0, 3);
  gamma[1] ~ normal(0, 3);
  gamma[2] ~ normal(0, 3);

  for (n in 1:N) {
    mu = inv_logit(beta'*xi[n]);
    pz = softmax([gamma[1]*xg[n],0,gamma[2]*xg[n]]');

    target += log(
      z1[n]*pz[1] + z3[n]*pz[3] +
      pz[2]*exp(beta_binomial_lpmf(y[n]|A[n],mu*rho,(1-mu)*rho)));
  }
}
```
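
For reference, this is the QR reparameterization I mean, roughly as it appears in the manual; just a sketch assuming `xi` were stored as an `N x I` matrix rather than an array of vectors:

```
transformed data {
  // thin, scaled QR decomposition of the design matrix, as in the manual
  matrix[N, I] Q_ast = qr_thin_Q(xi) * sqrt(N - 1);
  matrix[I, I] R_ast_inverse = inverse(qr_thin_R(xi) / sqrt(N - 1));
}
parameters {
  vector[I] theta;  // coefficients on Q_ast
}
model {
  // ... use Q_ast * theta wherever xi * beta would appear ...
}
generated quantities {
  vector[I] beta = R_ast_inverse * theta;  // recover the coefficients on xi
}
```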
• Don’t ever multiply by 1.0 — it doesn’t do anything
• No tabs, only spaces (otherwise it won’t display correctly, as you see in the original post)

I reformatted and rolled up a bit of the code so I could read it.

```
data {
  int<lower=0> N;       // number of data points
  int<lower=1> I;       // number of individual covariates
  int<lower=1> G;       // number of group covariates
  int<lower=1> A[N];    // number of attempts
  int<lower=0> y[N];    // number of successes
  vector[I] xi[N];      // individual covariates
  vector[G] xg[N];      // group covariates
}
transformed data {
  real nil[N] = rep_array(0.5, N);
  real z1[N];
  real z3[N];
  for (n in 1:N) {
    z1[n] = 1 - 2 * min([ nil[n], y[n] ]);
    z3[n] = 1 - 2 * min([ nil[n], A[n] - y[n] ]);
  }
}
parameters {
  real<lower = 0> phi;
  vector[I] beta;
  matrix[2, G] gamma;
}
transformed parameters {
  real<lower = 0> rho = inv(phi);
}
model {
  phi ~ normal(0, 10);
  beta ~ normal(0, 3);
  gamma[1] ~ normal(0, 3);
  gamma[2] ~ normal(0, 3);
  for (n in 1:N) {
    real mu = inv_logit(beta' * xi[n]);
    vector[3] pz = softmax([gamma[1] * xg[n], 0, gamma[2] * xg[n]]');
    target += log(pz[1] * z1[n]
                  + pz[2] * exp(beta_binomial_lpmf(y[n] | A[n], mu * rho, (1 - mu) * rho))
                  + pz[3] * z3[n]);
  }
}
```

The main thing you want to do for efficiency is vectorize operations to reduce the burden on autodiff. For example, make `xi` into an `N x I` matrix and calculate everything you need up front with vectorized operations.

```
vector[I] beta;
matrix[N, I] xi;                                // one row per data point
vector[N] inv_logit_mu = inv_logit(xi * beta);  // all N means in one matrix-vector multiply
vector[N] y_alpha = rho * inv_logit_mu;         // first beta-binomial shape parameter
vector[N] y_beta = (1 - inv_logit_mu) * rho;    // second beta-binomial shape parameter
// the z1[n] * pz[1] (and z3[n] * pz[3]) products can be precomputed the same
// way once the mixture weights are available for all n
```

and then use `inv_logit_mu[n]` in the loop. This is faster because a matrix multiply can be done with better memory locality and less copying than repeated extraction, transposition, and vector multiplication.
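
Concretely, the loop might then look something like this (just a sketch using the precomputed `y_alpha` and `y_beta` from above; the mixture weights still get computed per observation):

```
for (n in 1:N) {
  vector[3] pz = softmax([gamma[1] * xg[n], 0, gamma[2] * xg[n]]');
  target += log(pz[1] * z1[n]
                + pz[2] * exp(beta_binomial_lpmf(y[n] | A[n], y_alpha[n], y_beta[n]))
                + pz[3] * z3[n]);
}
```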

You might also want to put that last target increment on the log-sum-exp scale, where it’d be

```
target += log_sum_exp({ log(pz[1]) + log(z1[n]),
                        log(pz[2]) + beta_binomial_lpmf(y[n] | A[n], mu * rho, (1 - mu) * rho),
                        log(pz[3]) + log(z3[n]) });
```

That might be more stable in cases where the lpmf would otherwise round to zero or lose a lot of precision.

Then, there’s a more efficient and stable built-in `log_softmax` function you can use to calculate the `log(pz)` terms.
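
Putting that together with the log-sum-exp version, the loop body would be something like this (again a sketch, reusing the precomputed `y_alpha` and `y_beta`):

```
for (n in 1:N) {
  vector[3] log_pz = log_softmax([gamma[1] * xg[n], 0, gamma[2] * xg[n]]');
  // z1[n] and z3[n] are 0 or 1, so their logs are 0 or negative
  // infinity, which log_sum_exp handles correctly
  target += log_sum_exp({ log_pz[1] + log(z1[n]),
                          log_pz[2] + beta_binomial_lpmf(y[n] | A[n], y_alpha[n], y_beta[n]),
                          log_pz[3] + log(z3[n]) });
}
```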

Finally, I’m not sure why you want to put a half-normal prior on the precision; that’s a zero-avoiding prior because of the inversion (not that there’s anything intrinsically wrong with this, but we like to put priors on the scale because they’re easy to understand, and then use priors with support at zero because we often use them in hierarchical settings and want to know whether the data are consistent with complete pooling).
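
If you did want the prior directly on the quantity that enters the likelihood, a minimal sketch would be to declare `rho` as the parameter and drop `phi` entirely:

```
parameters {
  real<lower=0> rho;    // precision, declared directly
}
model {
  rho ~ normal(0, 10);  // half-normal, with support at zero
}
```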
