This model fits well for smaller datasets, but as the number of clusters and observations grow, the time required to fit is growing non-linearly. **Is this a fault of my model?**

I am trying to fit count data for N subjects, where subjects are broken into C

distinct clusters (C << N). My model is a multi-level, zero-inflated mixture of two negative binomials (i.e., neg_binomial_2_lpmf). Some clusters have very little data, while others have lots, so I want to take advantage of pooling between clusters.

```
data {
int<lower=1> N; //number of observations
int<lower=1> C; //number of groups/clusters
real<lower=0> meanF; //mean freq across all
int<lower=0> Y[N]; //outcome variable
int<lower=1> G[N];
}
parameters {
real<lower=0,upper=1> zinf[C]; // Zero inflation probability
real<lower=0,upper=1> theta[C]; // Mix weight for 2x Neg. Binom. mixture
positive_ordered[2] phi[C]; // Size (phi) parameter for Neg. Binom.
positive_ordered[2] mu[C]; // Mean (mu) parameter for Neg. Binom.
}
model {
for (i in 1:C) {
phi[i,] ~ exponential(1) ; // Phi prior - positive, but close to 0
mu[i,] ~ lognormal(meanF, 100) ; // Mu prior - lognormal captures both modes
zinf[i] ~ beta(2,2) ; // Prob that Y[n] is inflated 0
theta[i] ~ beta(2,2) ; // Prob that Y[n] comes from 1st NegBinomial
}
for (n in 1:N) {
int d = G[n] ;
if ( Y[n] > 0 ) target += log1m(zinf[d]) +
log_mix ( theta[d],
neg_binomial_2_lpmf( Y[n] | mu[d,1] , phi[d,1] ),
neg_binomial_2_lpmf( Y[n] | mu[d,2] , phi[d,2] )
);
if ( Y[n] == 0 ) target += log_mix( zinf[d], 0,
log_mix ( theta[d],
neg_binomial_2_lpmf( 0 | mu[d,1] , phi[d,1] ),
neg_binomial_2_lpmf( 0 | mu[d,2] , phi[d,2] )
)
);
};
}
```

Called with:

options(mc.cores = parallel::detectCores())

ZInbn2fit ← stan( model_code = ZInbn2Model,

data = ZInbn2Data,

iter = 3000, warmup=1000, chains = 2,

control=list(adapt_delta=0.95))

Both ‘mu’ and ‘phi’ are positive_ordered so the model will identify.

Stan identifies all parameters well and without errors, but it takes forever in practice. In tests with 2 clusters and 2000 observations (split in various proportions between clusters) the model fits in under a minute (after compilation). But actual data with 15 clusters and about 15,000 observations takes ~4 hours (e.g., 1000 transitions using 10 leapfrog steps per transition would take 139.58 seconds.).

Some datasets have over 200 clusters and 200K observations. Since I cannot control the number of clusters, and I’d prefer not to lose any data by sampling observations, I’m hoping either my model code or stan(…) parameters may be optimized somehow. Any ideas are appreciated.

MacBook Pro 2019, 2.3GHz 8-Core Intel Core i9, 16GB RAM

RStudio v.1.4.1106

R version 3.6.2 (2019-12-12) – “Dark and Stormy Night”

rstan (Version 2.21.3, GitRev: 2e1f913d3ca3)

Thanks in advance.