Looks tricky! The only thing I feel at least somewhat qualified to comment on is the HMM part, but since nobody else answered, I will give it a try:
That’s actually not a big limitation - you can have time-varying transition probabilities in a HMM, all you AFAIK need for the forward algorithm to work is to be able to construct the transition probabilities matrix at any discrete time point you need (see e.g. Are time-varying transition probabilities supported by the new hmm_* functions?).
So I think a non-terrible HMM approximation to the process would be constructed by discretizing H into K bins with bounds b_0, ... b_K, while keeping C continuous. This would turn the deterministic continous dynamics of H into stochastic discrete dynamics over \hat H and V \in \{ \mathrm{on, off}\} - for each bin of H values at discrete time i you would have transition probabilities P(\hat H_i = r, V_i = u | \hat H_{i - 1} = s, V_{i-1} = v) equal / approximate the proportion of points in the cube i - 0.5 < t_1 < i + 0.5, i + 0.5 < t_2 < i + 1.5, b_{r - 1} < h < b_{r} such that the continous dynamics starting at time t_1 with H(t_1) = h will lead to b_{s - 1} < H(t_2) < b_{s}. I.e. we assume that within each bin of time and \hat H we believe in uniform distribution of the continuous values. Even if the integrals would be analytically tractable (which they may not be), one would still probably want to allow some wiggle room so that whenever exact simulation of the dynamics give non-zero probability to a bin, you also give small non-zero probability to its neighbours to allow the HMM to not be completely thrown off the rails by any artifacts of the simulation.
One could probably achieve a good approximation by just simulating the trajectory from the midpoint of each bin and then look at the probabilities that the result of this exact simulation + some noise term falls into each bin. (one could then estimate the noise term as part of the fitting process).
You could then play with the number of bins K to tradeoff (in)efficiency for fidelity. But I admit that solving the dynamics K^2 times for each timestep might be very costly even for very low K, especially if analytical formulations for the crossing points of H and C are not available and you would have to use numerical solvers.
Hope that helps at least a bit!