Efficient orthogonal matrix parameterization

Thanks, Ben.

When I run with warmup=0, it takes very little time.
However, when warmup>=1, the run time increases super-linearly:

warmup=10, iter=20, adapt_delta=0.99, 4 chains in parallel: 11.587s
warmup=100, iter=200, adapt_delta=0.99, 4 chains in parallel: 1022.823s. Many parameters have Rhat>10.
warmup=1000, iter=2000, adapt_delta=0.99, 4 chains in parallel: 7860.795s (about 131 minutes). All parameters have Rhat within (0.9992, 1.0068).
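
To make the scaling easier to see, here is a quick sketch that puts the three timings on a per-iteration basis. The numbers are copied from the runs above; the helper names are my own, and dividing wall time by iter assumes the 4 chains really ran fully in parallel.

```python
# (warmup, iter, wall-clock seconds) copied from the three runs above
runs = [
    (10, 20, 11.587),
    (100, 200, 1022.823),
    (1000, 2000, 7860.795),
]


def seconds_per_iteration(runs):
    """Wall time divided by iter (assumes the chains ran fully in parallel)."""
    return [secs / n_iter for _, n_iter, secs in runs]


def growth_ratios(runs):
    """Wall-time ratio between consecutive runs; each run has 10x the iterations."""
    return [later[2] / earlier[2] for earlier, later in zip(runs, runs[1:])]


for (warmup, n_iter, secs), per_iter in zip(runs, seconds_per_iteration(runs)):
    print(f"warmup={warmup:4d} iter={n_iter:4d} {secs:9.3f}s {per_iter:.3f} s/iter")

# A ratio above 10 means faster-than-linear growth for that step.
print([round(r, 1) for r in growth_ratios(runs)])
```

By this measure the run time grows by much more than 10x between the first and second runs, but by less than 10x between the second and third.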

Does this indicate that my code is computationally fine but the model’s statistical efficiency could be improved?

For the final case (warmup=1000, iter=2000, adapt_delta=0.99, 4 chains in parallel), I would like to shorten the run time while keeping Rhat below 1.05 or 1.1. Do you think it is a good idea to use fewer iterations (e.g. warmup=500, iter=1000) or a smaller adapt_delta?

Thank you.