The issue here is that the algebraic structure of covariance kernels lets you define the “basis kernels” in a number of ways, including with or without a multiplicative constant. For most of these kernels that constant has the interpretation of a marginal variance or a marginal standard deviation, depending on how you parameterize it.

The additive \sigma nominally comes from considering a GP prior and a Gaussian observation model with measurement variability \sigma, in which case the covariance of the observations, once the latent GP is marginalized out, becomes the sum of the prior kernel and the additive diagonal term. Consequently people often ignore it when talking about the possible kernels themselves. That said, in practice a small diagonal term helps stabilize calculations even when the GP is latent in the generative model, so it should definitely be added. In this case the additive term is known as a “nugget”.
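Written out under the usual assumptions (zero-mean prior, independent Gaussian noise), the sum looks roughly like this. The symbols f, y, \epsilon, k, and \delta_{ij} are my own notation, not something fixed elsewhere in this thread:

$$
\begin{aligned}
f &\sim \mathrm{GP}(0, k), \\
y_i &= f(x_i) + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2), \\
\mathrm{Cov}(y_i, y_j) &= k(x_i, x_j) + \sigma^2 \delta_{ij}.
\end{aligned}
$$

So the additive piece enters as \sigma^2 on the diagonal only. A nugget has the same form, with a small fixed constant in place of \sigma^2.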

I would recommend following the convention established with the `exp_quad` kernel, where we include a nugget and a marginal standard deviation in addition to the kernel's length scale. In particular, it should be very easy to switch between these two kernels without changing the hyperparameters.
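To make the convention concrete, here is a minimal pure-Python sketch (not Stan code, and not the actual implementation). The function names, the argument order, and the default nugget value are all hypothetical, and the Matérn 3/2 kernel is only a stand-in for the second kernel, since I am assuming rather than quoting which kernel is being added. The point is just that both kernels take the same `(sigma, length_scale)` pair and the nugget is handled in one place.

```python
import math

def exp_quad(x1, x2, sigma, length_scale):
    """Exponentiated quadratic kernel; sigma is the marginal standard deviation."""
    r = abs(x1 - x2)
    return sigma**2 * math.exp(-0.5 * (r / length_scale) ** 2)

def matern32(x1, x2, sigma, length_scale):
    """Matern 3/2 kernel (stand-in example) with the same hyperparameters."""
    a = math.sqrt(3.0) * abs(x1 - x2) / length_scale
    return sigma**2 * (1.0 + a) * math.exp(-a)

def cov_matrix(kernel, xs, sigma, length_scale, nugget=1e-9):
    """Dense covariance matrix for 1-D inputs, with the nugget added on the diagonal.

    The default nugget is an arbitrary illustrative value, not a recommendation.
    """
    n = len(xs)
    K = [[kernel(xs[i], xs[j], sigma, length_scale) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        K[i][i] += nugget
    return K

# Swapping kernels changes only the first argument; the hyperparameters stay put.
xs = [0.0, 0.5, 1.0]
K_eq = cov_matrix(exp_quad, xs, sigma=2.0, length_scale=1.0)
K_m32 = cov_matrix(matern32, xs, sigma=2.0, length_scale=1.0)
```

With this parameterization both matrices have `sigma**2 + nugget` on the diagonal, so `sigma` keeps its marginal standard deviation interpretation whichever kernel is plugged in.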

Might also want to get @Daniel_Simpson’s opinion.