# Diminishing returns on wall time vs sample size / inefficient way of writing mixture model?

I was playing around with a simple binomial regression model, verifying parameter recovery on simulated data at different sample sizes, and there's a clear relationship between sample size and fit wall time. I haven't examined it thoroughly, but I would assume it's O(N), since the likelihood is a sum over N terms?

If so, can it reasonably be said that using too large a sample is overkill even when one is available? Would taking a subsample be a recommended approach when it yields estimates that are accurate enough?

This is the model I'm using:

```stan
data {
  int<lower=1> N;     // sample size (data.frame rows)
  int<lower=1> Kx;    // number of covariates for mean

  int<lower=1> n[N];  // # of attempts (binomial parameter)
  int<lower=0> y[N];  // # of successes (outcome)

  matrix[N, Kx] x;    // covariate matrix for mean
}

parameters {
  vector[Kx] bx;      // coeffs for beta mean
}

model {
  real mu_beta;

  for (i in 1:N) {
    mu_beta = inv_logit(x[i] * bx);

    target += log(exp(binomial_lpmf(y[i] | n[i], mu_beta)));
  }
}
```
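For reference, the loop and the `log(exp(...))` round trip in the toy model can be collapsed into a single vectorized statement on the logit scale; a minimal sketch:

```stan
model {
  // binomial_logit takes the linear predictor directly, so no explicit
  // inv_logit call or log(exp(...)) round trip is needed, and the whole
  // likelihood is incremented in one vectorized statement
  target += binomial_logit_lpmf(y | n, x * bx);
}
```

This is both faster and more numerically stable, though as noted below the real question is about the mixture case.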

I realize the `log(exp(...))` wrapping in the loop where the log-likelihood is incremented is redundant and inefficient for this toy model, but in my real application I'm using a mixture to introduce 0- and n-inflation on the binomial, which looks like this:

```stan
target += log(
  ymin[i] * p + ymax[i] * p
  + p * exp(beta_binomial_lpmf(y[i] | n[i], mu_beta / rho, (1 - mu_beta) / rho))
);
```

So it would remain an issue down the line, unless there’s a better way to specify the above.
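One common way to keep a mixture like this efficient and stable is to stay on the log scale with `log_sum_exp` instead of exponentiating densities and re-logging their sum. A sketch, assuming `ymin[i]` and `ymax[i]` are 0/1 indicators for `y[i] == 0` and `y[i] == n[i]`, and using hypothetical separate inflation weights `p0` and `pn` (so the implied weight vector `[p0, pn, 1 - p0 - pn]` is a simplex) in place of the single `p` above:

```stan
// Log-scale mixture: 0-inflation, n-inflation, and the beta-binomial body.
// p0 and pn are assumed mixture weights (not in the original model); the
// exp()/log() round trip is replaced by log_sum_exp for numerical stability.
for (i in 1:N) {
  real lp_body = log1m(p0 + pn)
                 + beta_binomial_lpmf(y[i] | n[i], mu_beta / rho, (1 - mu_beta) / rho);
  if (ymin[i] == 1)
    target += log_sum_exp(log(p0), lp_body);  // inflated zero or the body
  else if (ymax[i] == 1)
    target += log_sum_exp(log(pn), lp_body);  // inflated n or the body
  else
    target += lp_body;                        // interior counts: body only
}
```

Because only the boundary observations pass through `log_sum_exp`, the interior counts cost a single `_lpmf` evaluation each, so the mixture need not be much slower than the plain model.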