I’m not sure I could visualize your matrix structure correctly from your example, but I did something that may have been similar, with a MN \times MN matrix with N \times N blocks where the within-block correlation was given by a kernel with M block-specific parameters (or rather a combination of all possible pairs of parameters for its corresponding block column/row) and between-block variance parameters were unconstrained – so there were \binom M 2 there plus the diagonal elements.

For M=2, N=2 it would look something like this:

K = \begin{bmatrix} \sigma^2_{11} \begin{bmatrix} k_{11}(x_{11},x_{11}) & k_{11}(x_{11},x_{12}) \\ k_{11}(x_{12},x_{11}) & k_{11}(x_{12},x_{12}) \end{bmatrix} \sigma^2_{12} \begin{bmatrix} k_{12}(x_{11},x_{21}) & k_{12}(x_{11},x_{22}) \\ k_{12}(x_{12},x_{21}) & k_{12}(x_{12},x_{22}) \end{bmatrix} \\ \sigma^2_{21} \begin{bmatrix} k_{21}(x_{21},x_{11}) & k_{21}(x_{21},x_{12}) \\ k_{21}(x_{22},x_{11}) & k_{21}(x_{22},x_{12}) \end{bmatrix} \sigma^2_{22} \begin{bmatrix} k_{22}(x_{21},x_{21}) & k_{22}(x_{21},x_{22}) \\ k_{22}(x_{22},x_{21}) & k_{22}(x_{22},x_{22}) \end{bmatrix} \end{bmatrix}

I’d say possibly, but that is dependent on the size/difficulty of the problem, and unless you can impose a parametric structure (like the identity, or kernel for within-blocks) or something else that constrains that in a way that makes sense you are left with estimating each parameter separately – this is what I did, although it becomes a quite large number even for moderate number of blocks.

That’s a great question I could never figure out completely. Because you need covariance matrices to be positive definite to be able to compute the likelihood if the proposed values are not it would result in an invalid likelihood value for the proposal and it would be rejected. For Metropolis-Hastings that would probably just slow down things when the random proposals ended up in that region, but for HMC where the proposal relies on navigating the geometry of the likelihood surface its combination with positive definiteness criterion is harder to visualize (to me at least); if it turns out to be a sharp boundary may the positive definiteness would just fence off a region of parameter space and I guess HMC trajectories would navigate within that boundary, but if it happens to be a series of pits with likelihood zero it could cut short otherwise valid trajectories and make it very inefficient.

Maybe others here will have a better intuition for what this kind of constraint entails for the HMC potential and the actual performance. In practice I would just run it and see what the samples look like and go from there, all of that may be trying to solve a problem that is not there.