The dimensionality of \theta in prior p(\theta) and likelihood p(y | \theta) should match.

The answer is almost always “no” for an adaptation scheme unless it is done very carefully (and the careful version almost never matches intuitions about what a good method would look like). Specifically, you need to prove that any MCMC algorithm you devise preserves the correct stationary distribution (usually the posterior, but for Stan, always the distribution whose log density is defined by the Stan program). Fair warning: it’s not easy, which is why NUTS was such a breakthrough.

The easiest way to do that is through detailed balance. Guessing isn’t a good strategy in this business. The usual approach is to start with Metropolis and learn why it satisfies detailed balance, then go on to Metropolis-Hastings and Gibbs. Basic HMC is then just an instance of Metropolis-Hastings. You can then look at some of the adaptive Metropolis algorithms, which go in the direction you’re asking about. For NUTS, the Hoffman and Gelman paper does a good job explaining all the steps required for maintaining detailed balance.
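
To make the detailed balance check concrete, here is a minimal sketch in Python on a toy problem. It is not anything Stan does internally. Detailed balance for a target \pi and transition matrix P means \pi_i P_{ij} = \pi_j P_{ji} for every pair of states, and that implies \pi P = \pi. The five-state ring, the target vector, and the names `ring_kernel` and `p_right` are all made up for illustration. With `p_right=0.5` the proposal is symmetric and this is plain Metropolis. With `p_right=0.8` the proposal is lopsided but the acceptance rule is left unchanged (no Hastings correction), which stands in for a tweak that looks harmless but is not.

```python
import numpy as np

def ring_kernel(target, p_right=0.5):
    """Transition matrix on states 0..K-1 arranged in a ring.

    Propose one step right with probability p_right, otherwise one step left,
    and accept with the plain Metropolis ratio min(1, pi_j / pi_i).
    No Hastings correction is applied, so this is only valid for p_right=0.5.
    """
    K = len(target)
    P = np.zeros((K, K))
    for i in range(K):
        proposals = (((i + 1) % K, p_right), ((i - 1) % K, 1.0 - p_right))
        for j, q in proposals:
            P[i, j] += q * min(1.0, target[j] / target[i])
        # Rejected proposals leave the chain where it is.
        P[i, i] = 1.0 - P[i].sum()
    return P

def satisfies_detailed_balance(target, P):
    # flow[i, j] = pi_i * P[i, j]; detailed balance says flow is symmetric.
    flow = target[:, None] * P
    return bool(np.allclose(flow, flow.T))

def is_stationary(target, P):
    # Stationarity: one transition leaves the target unchanged.
    return bool(np.allclose(target @ P, target))

# Hypothetical target distribution over five states.
target = np.array([0.05, 0.15, 0.40, 0.30, 0.10])

symmetric = ring_kernel(target, p_right=0.5)
lopsided = ring_kernel(target, p_right=0.8)

print("symmetric:", satisfies_detailed_balance(target, symmetric),
      is_stationary(target, symmetric))
print("lopsided: ", satisfies_detailed_balance(target, lopsided),
      is_stationary(target, lopsided))
```

The symmetric kernel passes both checks. The lopsided one fails both, so the chain it defines converges to something other than the target. On a continuous space you can’t enumerate states like this, which is why you need the proof.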