Code takes too long to run, despite small dataset size

That’s more iterations than you should need for most well-behaved problems. When every transition exceeds the maximum tree depth, running longer won’t fix it; that warning usually points to difficult posterior geometry rather than too few iterations.

With a dataset this small, the real problem is probably that you’re using a centered parameterization, which tends to sample poorly when the data only weakly inform the hierarchical parameters. You can change that to non-centered this way:

real<offset=gamma, multiplier=tau> alpha;
vector<offset=gamma, multiplier=tau>[K] beta;
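
For context, here is a minimal sketch of where those two declarations would sit. I haven't seen the rest of your program, so the data block, the hyperpriors on gamma and tau, and the omitted likelihood are all assumptions for illustration, not your actual model.

```stan
data {
  int<lower=1> K;              // number of slopes (assumed name)
}
parameters {
  real gamma;                  // shared location; must be declared before use below
  real<lower=0> tau;           // shared scale; must be declared before use below
  real<offset=gamma, multiplier=tau> alpha;
  vector<offset=gamma, multiplier=tau>[K] beta;
}
model {
  gamma ~ normal(0, 1);        // hypothetical hyperprior
  tau ~ normal(0, 1);          // hypothetical half-normal (tau has lower=0)
  alpha ~ normal(gamma, tau);  // priors are still written on the original scale
  beta ~ normal(gamma, tau);
  // likelihood from the original model goes here
}
```

Note that the sampling statements for alpha and beta stay exactly as they would in the centered version. Only the declarations change.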

Usually the intercept gets its own, wider prior that is unrelated to the prior on the slopes, but here I just rewrote your model as it stood. The offset and multiplier don’t change the target density, only the geometry over which sampling takes place.
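
To make the "same density, different geometry" point concrete: as far as I understand it, the offset/multiplier declaration is shorthand for the hand-written non-centered version sketched below. The names alpha_raw and beta_raw and the hyperpriors are hypothetical, and the likelihood is again left out.

```stan
data {
  int<lower=1> K;
}
parameters {
  real gamma;
  real<lower=0> tau;
  real alpha_raw;              // standardized version of alpha
  vector[K] beta_raw;          // standardized version of beta
}
transformed parameters {
  // shift by the offset and scale by the multiplier
  real alpha = gamma + tau * alpha_raw;
  vector[K] beta = gamma + tau * beta_raw;
}
model {
  gamma ~ normal(0, 1);        // hypothetical hyperprior
  tau ~ normal(0, 1);          // hypothetical half-normal
  alpha_raw ~ std_normal();    // implies alpha ~ normal(gamma, tau)
  beta_raw ~ std_normal();     // implies beta ~ normal(gamma, tau)
  // likelihood from the original model goes here, written in terms of alpha and beta
}
```

In both versions the sampler moves over the standardized quantities, which are a priori independent of gamma and tau. That is what removes the funnel shape, while the distribution implied for alpha and beta is unchanged.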