I’m not sure if this is really the best way to do it but I went back to the model without uncertainties and just changed the bits down below. I think this is essentially what the R code does and I get results that are almost exactly the same as what is reported in the paper. I had to put in initial values though, and from comments on the forum it seems like that isn’t really the best way to go. I’ll try to spend some more time figuring out how to improve this.
Thank you so much @nhuurre for all the help and for being so patient with this newbie. :)
parameters {
    real<lower=0., upper=1.> p_gamma; 
    real<lower=1., upper=200.> p_phi0;
    real<lower=-0.5, upper=1.> p_beta;
    real<lower=max(to_vector([3., p_beta * (2. - p_gamma) + p_gamma / 2.])), upper=10.> p_alpha; 
    vector<lower=min(pmra_true) - max(pmra_err), upper=max(pmra_true) + max(pmra_err)>[N] pmra;
    vector<lower=min(pmdec_true) - max(pmra_err), upper=max(pmdec_true) + max(pmdec_err)>[N] pmdec;
    vector<lower=-300, upper=300>[N] vlos;
    vector<lower=0, upper=max(r_true) + max(r_err)>[N] r;
} 
transformed parameters {
    real<lower=0> y[N, 3]; 
    for (i in 1:N) {
        vector[3] vels_sph = transform_vels(ra[i], dec[i], pmra[i], pmdec[i], vlos[i], plx[i], sintheta[i], costheta[i], sinphi[i], cosphi[i]); 
        y[i, 2] = sqrt(square(vels_sph[2]) + square(vels_sph[3]));
        y[i, 1] = sqrt(square(vels_sph[1]) + square(y[i, 2]));
        y[i, 3] = r[i];
    }
}
model {
    for (i in 1:N) {
        pmra_true[i] ~ normal(pmra[i], pmra_err[i]);
        pmdec_true[i] ~ normal(pmdec[i], pmdec_err[i]);
        vlos_true[i] ~ normal(vlos[i], vlos_err[i]);
        r_true[i] ~ normal(r[i], r_err[i]);
    }
    p_gamma ~ normal(0.5, 0.06);
    p_phi0 ~ uniform(1., 200);
    p_beta ~ uniform(-0.5, 1.);
    p_alpha ~ shifted_gamma(2.99321604, 2.82409927, 3.);
    
    for (i in 1:N) {
        y[i] ~ df(p_phi0, p_gamma, p_beta, p_alpha);
    }
    
}
This takes ~45 minutes to run on 12 cores with a maximum treedepth of 12 and adapt_delta=0.99. I get <1% divergent iterations, ~33% iterations exceeding treedepth, and n_eff of a few thousand out of 12000 on all parameters.