My language was really confusing here.
I think the main advantage of gradient-free methods will be precisely in integrating packages without derivatives that are way too big to code in Stan itself. The biggest drawback is that the derivative-free methods I know don’t scale well with dimension.
I meant “black box” only in the sense that we don’t know what the algorithm does, because someone external hooks it up as a component of a model’s log density. Gradient-free MCMC packages like emcee (Python) let you define arbitrary log density functions, so those can include calls to external packages that don’t supply gradients. A lot of big physical models have this character. For instance, HVAC and energy modeling, which we were talking to Phil Price about, has a lot of big packages out there without derivatives.
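To make the black-box point concrete, here’s a minimal stdlib-only Python sketch. It is not emcee’s API and not its algorithm (emcee uses an affine-invariant ensemble sampler); I’ve swapped in plain random-walk Metropolis because it has the same key property: the sampler only ever asks for log density values, never gradients. `black_box_simulator` is a hypothetical stand-in for an external package such as an energy model, and the priors, `observed`, `sigma`, and `step` values are made up for illustration.

```python
import math
import random


def black_box_simulator(theta):
    # Hypothetical stand-in for an opaque external package (e.g., an energy
    # model): it returns a prediction but exposes no derivatives.
    return 2.0 * theta[0] + theta[1] ** 2


def log_density(theta, observed=3.0, sigma=0.5):
    # Unnormalized log posterior: standard normal priors on theta plus a
    # normal likelihood whose mean comes from the black-box call.
    log_prior = -0.5 * sum(t * t for t in theta)
    prediction = black_box_simulator(theta)
    log_lik = -0.5 * ((observed - prediction) / sigma) ** 2
    return log_prior + log_lik


def random_walk_metropolis(log_p, init, n_draws, step=0.3, seed=1234):
    # Gradient-free sampler: it only evaluates log_p at points, so log_p
    # can wrap any external code at all.
    rng = random.Random(seed)
    current = list(init)
    current_lp = log_p(current)
    draws = []
    accepted = 0
    for _ in range(n_draws):
        # Symmetric Gaussian proposal around the current point.
        proposal = [c + rng.gauss(0.0, step) for c in current]
        proposal_lp = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(current));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws.append(list(current))
    return draws, accepted / n_draws


if __name__ == "__main__":
    draws, accept_rate = random_walk_metropolis(log_density, [0.0, 0.0], 5000)
    print(len(draws), round(accept_rate, 2))
```

The catch is the one mentioned above: with a fixed isotropic step like this, random-walk proposals get rejected more and more often as the dimension grows, which is where gradient-based samplers like Stan’s NUTS pull ahead.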
I also think it’s fine to support third-party plug-ins for Stan, commercial or not, as long as we don’t do something ourselves that affects our licensing. That shouldn’t be a problem, as we wouldn’t be distributing the plug-ins, so licensing would be a user issue.