I was playing around with a simple binomial regression model, verifying parameter recovery on simulated data at different sample sizes, and there's a clear relationship between sample size and fit wall time. I haven't examined it thoroughly, but I would assume it's O(N)?

If so, can it reasonably be said that using too large a sample could be overkill even when the data are available? Would taking a subsample be a recommended approach when it yields estimates that are accurate enough?

This is the model I'm using:

```
data {
  int<lower=1> N;          // sample size (data.frame rows)
  int<lower=1> Kx;         // number of covariates for the mean
  array[N] int<lower=1> n; // number of attempts (binomial size)
  array[N] int<lower=0> y; // number of successes (outcome)
  matrix[N, Kx] x;         // covariate matrix for the mean
}
parameters {
  vector[Kx] bx; // coefficients for the mean (logit scale)
}
model {
  real mu_beta;
  for (i in 1:N) {
    mu_beta = inv_logit(x[i] * bx);
    // log(exp(.)) kept to mirror the mixture form below
    target += log(exp(binomial_lpmf(y[i] | n[i], mu_beta)));
  }
}
```
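For the toy model itself, the loop (and the `log(exp(...))` round trip, which can underflow when the lpmf is very negative) can be replaced with a single vectorized sampling statement. A minimal sketch, assuming the same data declarations as above:

```
model {
  // vectorized binomial with logit link: mathematically equivalent to
  // the loop, but one target increment and no exp()/log() cancellation
  y ~ binomial_logit(n, x * bx);
}
```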

I realize the loop where the log likelihood is being incremented is very inefficient for this toy model, but in my real application I'm using a mixture to introduce zero and n inflation in the binomial, which looks like this:

```
target += log(ymin[i] * p[1] + ymax[i] * p[3]
              + p[2] * exp(beta_binomial_lpmf(y[i] | n[i],
                                              mu_beta / rho,
                                              (1 - mu_beta) / rho)));
```

So the issue would remain down the line, unless there's a better way to specify the above.
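For the mixture, the usual way to avoid `exp()` underflow is to keep everything on the log scale with `log_sum_exp`. A sketch of the per-observation increment, assuming `ymin[i]` and `ymax[i]` are 0/1 indicators for `y[i] == 0` and `y[i] == n[i]`, and `p` is a simplex of mixture weights:

```
// beta-binomial component, on the log scale
real lp = log(p[2]) + beta_binomial_lpmf(y[i] | n[i],
                                         mu_beta / rho,
                                         (1 - mu_beta) / rho);
// add the zero-inflation component only where it applies
if (ymin[i] == 1)
  lp = log_sum_exp(log(p[1]), lp);
// likewise for the n-inflation component
if (ymax[i] == 1)
  lp = log_sum_exp(log(p[3]), lp);
target += lp;
```

This computes the same quantity as `log(ymin[i]*p[1] + ymax[i]*p[3] + p[2]*exp(...))` but stays numerically stable when the beta-binomial term is far out in the tail; precomputing `log(p)` once outside the loop saves a little more.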