Unfortunately the topics being discussed here are pretty subtle, and I’m not sure that you’ll be able to explain what’s going on without getting into at least some technical detail. In particular we have

**Inference**: quantification of the model configurations that are consistent with the observed data (at least within the scope of the model).

**Calibration**: quantification of how inferences behave as the observation is varied.

Within each of these we have frequentist and Bayesian variants (for more detail see, for example, Sections 2 and 3 of https://betanalpha.github.io/assets/case_studies/modeling_and_inference.html),

**Frequentist Inference**: Deterministic functions of the observed data that identify one or more model configurations. When well-engineered, these model configurations may be related to model-consistency with the observed data. Otherwise known as *estimators*.
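To make the "deterministic function of the data" idea concrete, here is a minimal sketch, assuming a normal observational model with unknown location and known scale (numbers chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def location_estimator(y):
    """A frequentist estimator: a deterministic function of the
    observed data that picks out a single model configuration.
    Here the sample mean estimates the normal location parameter."""
    return np.mean(y)

# One observed data set -> one point in the model configuration space.
y_obs = rng.normal(loc=2.0, scale=1.0, size=50)
mu_hat = location_estimator(y_obs)
print(mu_hat)
```

The key property is determinism: rerunning the estimator on the same data always returns the same configuration; any notion of "good behavior" comes only from the calibration, not from the estimator itself.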

**Frequentist Calibration**: The worst case performance of a frequentist estimator over possible observations within the context of a given model. More formally, the *worst case expected loss* of a given model-based loss function.
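A rough Monte Carlo sketch of that worst case expected loss, continuing the hypothetical normal-location example with a squared-error loss (the grid and simulation sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def expected_loss(mu, n_sims=2000, n_obs=20):
    """Monte Carlo estimate of the expected squared-error loss of the
    sample-mean estimator when the true location is mu (known scale 1)."""
    y = rng.normal(loc=mu, scale=1.0, size=(n_sims, n_obs))
    return np.mean((y.mean(axis=1) - mu) ** 2)

# Frequentist calibration reports the worst case over the model
# configurations; for the sample mean the expected loss happens to be
# flat in mu, equal to sigma^2 / n_obs = 0.05 here.
mu_grid = np.linspace(-3.0, 3.0, 7)
worst_case = max(expected_loss(mu) for mu in mu_grid)
print(worst_case)
```

Note that the expectation is taken over possible observations while the worst case is taken over model configurations; both steps are needed before any performance claim can be made.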

**Bayesian Inference**: A probability distribution, as summarized through posterior expectation values, that weights the model configurations by how consistent they are with both the observed data, as defined by the observational model, and domain expertise, as defined in part by the prior model. Otherwise known as a *posterior distribution* and *posterior expectation values*.
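As a concrete illustration of a posterior distribution and its expectation values, here is a conjugate Beta-Binomial sketch (the prior and data values are hypothetical, chosen so everything is available in closed form):

```python
# Observational model: y ~ Binomial(N, theta).
# Prior model: theta ~ Beta(a_prior, b_prior), encoding mild domain expertise.
a_prior, b_prior = 2.0, 2.0
N, y_obs = 30, 21  # observed data

# Conjugacy gives the posterior in closed form: Beta(a + y, b + N - y).
a_post = a_prior + y_obs
b_post = b_prior + (N - y_obs)

# Posterior expectation values summarize the posterior distribution.
post_mean = a_post / (a_post + b_post)
post_var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
print(post_mean, post_var)
```

Unlike an estimator, the inferential output here is an entire distribution over model configurations, weighted by consistency with both the observed data and the prior model; the mean and variance are just two convenient summaries of it.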

**Bayesian Calibration**: A distribution of possible posterior distribution behaviors over possible observations within the context of a given model. If a model-based utility function is defined then this often takes the form of the distribution of possible utility outcomes or summaries thereof such as average utilities.
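One common instance of this, sketched here under the same hypothetical Beta-Binomial model: simulate ground truths and observations from the full model, and look at the distribution of a posterior behavior, in this case whether a central 90% posterior credible interval contains the simulated truth. (The interval level, prior, and simulation sizes are all illustrative assumptions.)

```python
import numpy as np

rng = np.random.default_rng(3)
a0, b0, N = 2.0, 2.0, 30  # prior Beta(a0, b0), binomial trials N

covered = []
for _ in range(1000):
    # Sample a ground truth from the prior and data from the
    # observational model, i.e. a draw from the full Bayesian model.
    theta = rng.binomial(1, 0) * 0 + rng.beta(a0, b0)
    y = rng.binomial(N, theta)
    # Conjugate posterior, summarized by a central 90% credible interval.
    post = rng.beta(a0 + y, b0 + N - y, size=4000)
    lo, hi = np.quantile(post, [0.05, 0.95])
    covered.append(lo <= theta <= hi)

# Averaged over the full model, the inclusion outcomes concentrate
# around the nominal 90% -- a distributional statement, not a worst case.
print(np.mean(covered))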
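One common instance of this, sketched here under a hypothetical Beta-Binomial model: simulate ground truths and observations from the full model, and look at the distribution of a posterior behavior, in this case whether a central 90% posterior credible interval contains the simulated truth. (The interval level, prior, and simulation sizes are all illustrative assumptions.)

```python
import numpy as np

rng = np.random.default_rng(3)
a0, b0, N = 2.0, 2.0, 30  # prior Beta(a0, b0), binomial trials N

covered = []
for _ in range(1000):
    # Sample a ground truth from the prior and data from the
    # observational model, i.e. a draw from the full Bayesian model.
    theta = rng.beta(a0, b0)
    y = rng.binomial(N, theta)
    # Conjugate posterior, summarized by a central 90% credible interval.
    post = rng.beta(a0 + y, b0 + N - y, size=4000)
    lo, hi = np.quantile(post, [0.05, 0.95])
    covered.append(lo <= theta <= hi)

# Averaged over the full model the inclusion outcomes concentrate
# around the nominal 90% -- a distributional statement, not a worst case.
print(np.mean(covered))
```

The output here is one summary (the average) of an entire distribution of posterior behaviors; other summaries, or the distribution itself, are equally valid calibration targets.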

Asymptotics, and calibrations defined only in the asymptotic limit, just confuse the matter further!

The problem with comparing a confidence interval and a credible interval is that the former is defined with respect to its calibration (frequentist coverage is the worst case loss of interval estimators under an inclusion loss function) whereas the latter is a pure inference. We can’t compare calibration to calibration because the credible interval often hasn’t been calibrated, and even if it had been, frequentist and Bayesian calibrations are fundamentally different (one is a worst case over the model configurations, the other distributional over the model configurations).
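A small sketch of why the two calibrations don't line up, again under the hypothetical Beta-Binomial setup: if we subject a 90% credible interval to a *frequentist* calibration, we compute its coverage pointwise at fixed model configurations and then take the worst case, and that pointwise coverage generally varies with the configuration.

```python
import numpy as np

rng = np.random.default_rng(4)
a0, b0, N = 2.0, 2.0, 30  # hypothetical prior and binomial trials

def credible_coverage(theta, n_sims=1000):
    """Frequentist coverage of a central 90% Beta-Binomial credible
    interval at one fixed model configuration theta."""
    hits = 0
    for _ in range(n_sims):
        y = rng.binomial(N, theta)
        post = rng.beta(a0 + y, b0 + N - y, size=2000)
        lo, hi = np.quantile(post, [0.05, 0.95])
        hits += (lo <= theta <= hi)
    return hits / n_sims

# Pointwise coverage varies across configurations; a frequentist
# calibration keeps only the worst case, while a Bayesian calibration
# would instead average these behaviors over the prior model.
coverages = [credible_coverage(t) for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
print(min(coverages), max(coverages))
```

This is only a sketch of the structural difference: the same interval-valued object is being scored, but against two different summaries over the model configuration space.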

I think one can communicate the different goals – calibration versus inference – in a relatively terse, straightforward way, but to go any further you either have to get superficial and abstract away important details or spend some additional time laying out those details in preparation.