# Constraints on a transformation of multiple parameters

Hi,

I’m trying to fit a model but I can’t figure out how to implement a constraint on a variable which is some function of some parameters. Right at the top of the function `df_lpdf` I have this `E` variable, and I want it to be positive. I have tried defining it inside the transformed parameters block instead but the constraints there are only for checks, and not actual constraints. Do I need to reparameterize this, and if so, how can I do so? Right now the model sometimes runs and sometimes doesn’t because of bad starting values. If it runs then I’m getting `n_eff` < 100 from 1000 draws.

Also, is the way I currently have it set up the proper way to pass multiple inputs to a custom distribution function (the `real[] y` part)? It seems a little clunky but I couldn’t find any examples in the documentation.

Sorry if this is an obvious problem, this is my first time using Stan so I am unfamiliar with it.

Thanks! :)

``````functions {

real df_lpdf(real[] y, real phi0, real g, real b, real a) {

// E should be > 0
real E = -square(y[1]) / 2. + phi0 / pow(y[3], g);
real L = y[3] * y[2];

real num_t1 = -2 * b * log(L);
real num_t2 = (b * (g - 2.) / g + a / g - 3. / 2.) * log(E);
real num_t3 = lgamma(a / g - 2. * b / g + 1.);

real denom_t1 = 0.5 * log(8. * pow(pi(), 3.) * pow(2., -2. * b));
real denom_t2 = (-2. * b / g + a / g) * log(phi0);
real denom_t3 = lgamma(b * (g - 2.) / g + a / g - 1. / 2.);

real numerator = num_t1 + num_t2 + num_t3;
real denominator = denom_t1 + denom_t2 + denom_t3;

// this ~should~ never happen but it does.
if (E < 0.) {
return negative_infinity();
}

return numerator - denominator;
}

vector get_angles(real x, real y, real z) {
real projR = sqrt(square(x) + square(y));
real R = sqrt(square(x) + square(y) + square(z));

real sintheta = y / projR;
real costheta = x / projR;

real sinphi = projR / R;
real cosphi = z / R;

}

vector transform_vels(real ra, real dec, real pmra, real pmdec, real vlos, real plx, real sintheta, real costheta, real sinphi, real cosphi) {
matrix[3,3] T = [
[-0.06699, -0.87276, -0.48354],
[ 0.49273, -0.45035,  0.74458],
[-0.86760, -0.18837,  0.46020]
];
real deg2rad = pi() / 180.;
real sinra = sin(ra * deg2rad);
real cosra = cos(ra * deg2rad);
real sindec = sin(dec * deg2rad);
real cosdec = cos(dec * deg2rad);
matrix[3,3] A = [
[cosra * cosdec, -sinra, -cosra * sindec],
[sinra * cosdec, cosra , -sinra * sindec],
[sindec,              0,          cosdec]
];
matrix[3, 3] B = T * A;
real k = 4.74057;
vector[3] solarmotion = to_vector([11.1, 232.24, 7.25]);

vector[3] dat = to_vector([vlos, k * pmra / plx, k * pmdec / plx]);
vector[3] uvw = B * dat + solarmotion;

matrix[3, 3] ptz_mat = [
[costheta, sintheta, 0],
[-sintheta, costheta, 0],
[0, 0, 1]
];
vector[3] ptz = ptz_mat * uvw;

matrix[3, 3] rtp_mat = [
[sinphi, 0, cosphi],
[0, 1, 0],
[cosphi, 0, -sinphi]
];
vector[3] rtp = rtp_mat * ptz;

return rtp / 100;
}

}

data {

int<lower=1> N;

vector[N] ra;
vector[N] dec;
vector[N] plx;

vector[N] Xgc;
vector[N] Ygc;
vector[N] Zgc;

// measurements
vector[N] pmra_true;
vector[N] pmdec_true;
vector[N] vlos_true;
vector[N] r_true;

}

transformed data {
vector[N] sintheta;
vector[N] costheta;
vector[N] sinphi;
vector[N] cosphi;

real<lower=0> y[N, 3];

for (i in 1:N) {
vector[4] angles = get_angles(Xgc[i], Ygc[i], Zgc[i]);
sintheta[i] = angles[1];
costheta[i] = angles[2];
sinphi[i] = angles[3];
cosphi[i] = angles[4];
}

{
for (i in 1:N) {
vector[3] vels_sph = transform_vels(ra[i], dec[i], pmra_true[i], pmdec_true[i], vlos_true[i], plx[i], sintheta[i], costheta[i], sinphi[i], cosphi[i]);
y[i, 2] = sqrt(square(vels_sph[2]) + square(vels_sph[3]));
y[i, 1] = sqrt(square(vels_sph[1]) + square(y[i, 2]));
y[i, 3] = r_true[i];
}
}
}

parameters {
real<lower=0> p_phi0;
real<lower=0> p_gamma;
real<upper=1> p_beta;
real<lower=(p_beta * (2. - p_gamma) + p_gamma / 2.)> p_alpha;
}

model {

// priors ----------------

p_phi0 ~ uniform(1., 200);
p_gamma ~ normal(0.5, 0.06);
p_beta ~ uniform(-0.5, 1.);
p_alpha ~ gamma(2.99321604, 2.82409927);

// likelihood ---------------------

for (i in 1:N) {
y[i] ~ df(p_phi0, p_gamma, p_beta, p_alpha);
}

}
``````

Yes, that’s the correct way to pass multiple inputs.

We don’t recommend uniform priors. What’s the motivation for 1 and 200 in `p_phi0~uniform(1,200)`? Would `p_phi~normal(0,200)T[0,]` work?

Anyway, looks to me that the constraint you need is

``````functions {
...
real phi_lower(real[,] y, real g) {
real p[size(y)];
for (i in 1:size(y)) {
p[i] = 0.5 * square(y[i,1]) * pow(y[i,3], g);
}
return max(p);
}
}
...
parameters {
real<lower=0> p_gamma;
real<lower=phi_lower(y, p_gamma)> p_phi0;
real<lower=0.5,upper=1> p_beta;
real<lower=(p_beta * (2. - p_gamma) + p_gamma / 2.)> p_alpha;
}
``````

Thanks for the help! That constraint makes a lot of sense, and it works.

I’m trying to reproduce the results from a paper and `uniform(1, 200)` is what they use. I’ll try out your suggestion though and see what difference that makes.

@nhuurre I’m trying to add another layer to the model, and `y` has been moved to the `transformed_parameters` block. I think this means that `p_phi0` has to be moved there as well, because it doesn’t know about `y` when it’s in the `parameters` block. Now my problem is that the `transformed_parameters` block doesn’t enforce the constraints. How would I go about solving this? Thanks!

So, if I’m reading this right you transform the observed proper motion to galactic coordinates and `y[1]` is the speed in the galactic plane, `y[2]` is the the total speed and `y[3]` is some distance (from the galactic center?)
You didn’t explain what change requires `y` to be a transformed parameter but my first guess is that you’re modeling the uncertainty in proper motion by making `pmra_true` a parameter. If that’s the case then I think it would be easier to parametrize the velocity in galactic coordinates and calculate the proper motion as a transformed parameter. That way the constraints are more natural.
Secondly, what is the physical mechanism that enforces the constraint `E > 0`?
Is it an inviolable law with no exceptions, or a probabilistic one where exceptions are possible but very unlikely? If it’s the latter then consider omitting the constraint and modeling the mechanism directly.

The observed proper motions are transformed into spherical velocities in galactic coordinates (output of the `transform_vels` function) and then retransformed into `y` because the distribution function takes in `y`. `y[1]` is the total speed, `y[2]` is the tangential speed relative to the galactic center, and `y[3]` is the distance from the galactic center.

Yes, you’re totally right—I’m trying to incorporate uncertainties. I’ve included some code below, that probably explains things better than I can. The main change is that I have some `pmra_err` data coming in, and instead of transforming the `pmra_true` parameters to get the velocities, I use some new `pmra` parameter which is normally distributed around `pmra_true`. Then these velocities are put into `y` in the same way as before.

I don’t have a great understanding of this either, but `E` is the relative energy, and for `E<=0` (sorry this should be `<=` and not `<`) the distribution function should be 0. From what I can tell, there is an assumption that `p_phi0` is chosen such that the DF is 0 when `E<=0`, so the range of support of the DF is only for `E>0`. This is what I was trying to do with the `negative_infinity()` thing, which I guess is supposed to be `0`. I just tried it again with 0, and this also does not work (“Stan can’t start sampling from this initial value”).

``````functions {

real df_lpdf(real[] y, real phi0, real g, real b, real a) {

// y[1] = v, y[2] = v_t, y[3] = r;
real E = -square(y[1]) / 2. + phi0 / pow(y[3], g);
real L = y[3] * y[2];

// terms in numerator
real num_t1 = -2 * b * log(L);
real num_t2 = (b * (g - 2.) / g + a / g - 3. / 2.) * log(E);
real num_t3 = lgamma(a / g - 2. * b / g + 1.);

// terms in denominator
real denom_t1 = 0.5 * log(8. * pow(pi(), 3.) * pow(2., -2. * b));
real denom_t2 = (-2. * b / g + a / g) * log(phi0);
real denom_t3 = lgamma(b * (g - 2.) / g + a / g - 1. / 2.);

real numerator = num_t1 + num_t2 + num_t3;
real denominator = denom_t1 + denom_t2 + denom_t3;

// this ~should~ never happen
if (E < 0.) {
return 0.;
}

return numerator - denominator;
}

real phi_lower(real[,] y, real g) {
real p[size(y)];
for (i in 1:size(y)) {
p[i] = 0.5 * square(y[i,1]) * pow(y[i,3], g);
}
return max(p);
}

real shifted_gamma_lpdf(real y, real a, real b, real shift) {
real shifted_y = y - shift;
return a * log(b) - lgamma(a) + (a - 1) * log(shifted_y) - b * shifted_y;
}

vector get_angles(real x, real y, real z) {
real projR = sqrt(square(x) + square(y));
real R = sqrt(square(x) + square(y) + square(z));

real sintheta = y / projR;
real costheta = x / projR;

real sinphi = projR / R;
real cosphi = z / R;

}

vector transform_vels(real ra, real dec, real pmra, real pmdec, real vlos, real plx, real sintheta, real costheta, real sinphi, real cosphi) {
matrix[3,3] T = [
[-0.06699, -0.87276, -0.48354],
[ 0.49273, -0.45035,  0.74458],
[-0.86760, -0.18837,  0.46020]
];
real deg2rad = pi() / 180.;
real sinra = sin(ra * deg2rad);
real cosra = cos(ra * deg2rad);
real sindec = sin(dec * deg2rad);
real cosdec = cos(dec * deg2rad);
matrix[3,3] A = [
[cosra * cosdec, -sinra, -cosra * sindec],
[sinra * cosdec,  cosra, -sinra * sindec],
[        sindec,      0,          cosdec]
];
matrix[3, 3] B = T * A;
real k = 4.74057;
vector[3] solarmotion = to_vector([11.1, 232.24, 7.25]); // including rotation

vector[3] dat = to_vector([vlos, k * pmra / plx, k * pmdec / plx]);
vector[3] uvw = B * dat + solarmotion;

matrix[3, 3] ptz_mat = [
[ costheta, sintheta, 0],
[-sintheta, costheta, 0],
[        0,        0, 1]
];
vector[3] ptz = ptz_mat * uvw;

matrix[3, 3] rtp_mat = [
[sinphi, 0,  cosphi],
[     0, 1,       0],
[cosphi, 0, -sinphi]
];
vector[3] rtp = rtp_mat * ptz;

return rtp / 100;
}

}

data {

int<lower=1> N; // total length of data

vector[N] ra;
vector[N] dec;
vector[N] plx;

vector[N] Xgc;
vector[N] Ygc;
vector[N] Zgc;

// measurements
vector[N] pmra_true;
vector[N] pmdec_true;
vector[N] vlos_true;
vector[N] r_true;

// measurement uncertainties
vector[N] pmra_err;
vector[N] pmdec_err;
vector[N] vlos_err;
vector[N] r_err;

}

transformed data {
vector[N] sintheta;
vector[N] costheta;
vector[N] sinphi;
vector[N] cosphi;

for (i in 1:N) {
vector[4] angles = get_angles(Xgc[i], Ygc[i], Zgc[i]);
sintheta[i] = angles[1];
costheta[i] = angles[2];
sinphi[i] = angles[3];
cosphi[i] = angles[4];
}
}

parameters {
real<lower=0> p_gamma;
real<upper=1> p_beta
real<lower=(p_beta * (2. - p_gamma) + p_gamma / 2.)> p_alpha;

vector<lower=min(pmra_true) - 5 * max(pmra_err), upper=max(pmra_true) + 5 * max(pmra_err)>[N] pmra;
vector<lower=min(pmdec_true) - 5 * max(pmdec_err), upper=max(pmdec_true) + 5 * max(pmdec_err)>[N] pmdec;
vector<lower=min(vlos_true) - 5 * max(vlos_err), upper=max(vlos_true) + 5 * max(vlos_err)>[N] vlos;
vector<lower=min(r_true) - 5 * max(r_err), upper=max(r_true) + 5 * max(r_err)>[N] r;
}

transformed parameters {

real y[N, 3]; // wrap up input data as (v, v_t, r)

// the constraint doesn't work here.
real<lower=phi_lower(y, p_gamma)> p_phi0; // scale factor for gravitational potential

{
for (i in 1:N) {
vector[3] vels_sph = transform_vels(ra[i], dec[i], pmra[i], pmdec[i], vlos[i], plx[i], sintheta[i], costheta[i], sinphi[i], cosphi[i]);
y[i, 2] = sqrt(square(vels_sph[2]) + square(vels_sph[3]));
y[i, 1] = sqrt(square(vels_sph[1]) + square(y[i, 2]));
y[i, 3] = r[i];

}
}

}

model {

for (i in 1:N) {
pmra[i] ~ normal(pmra_true[i], pmra_err[i]);
pmdec[i] ~ normal(pmdec_true[i], pmdec_err[i]);
vlos[i] ~ normal(vlos_true[i], vlos_err[i]);
r[i] ~ normal(r_true[i], r_err[i]);
}

p_gamma ~ normal(0.5, 0.06);
p_phi0 ~ normal(0., 150.)T[0, 200];
p_beta ~ uniform(-0.5, 1.);
p_alpha ~ shifted_gamma(2.99321604, 2.82409927, 3.);

for (i in 1:N) {
y[i] ~ df(p_phi0, p_gamma, p_beta, p_alpha);
}

}

generated quantities {
}
``````

Ah, makes sense. I mixed up `y[1]` and `y[2]` and didn’t follow details of the transform.

The correct representation of zero probability is `negative_infinity()` as `lpdf` is on the log-scale.

That’s what I thought.
But note that the “data generating process” goes in the other direction. The observed star has some physical velocity and when viewed from Earth the projection of that vector appears as proper motion. The proper motion is then measured with some error so that the recorded value is random number drawn from distribution `normal(pmra, pmra_err)`.

Stan allows you to model the observation process directly.

``````data {
// measurements
vector[N] pmra_true;
vector[N] pmdec_true;
vector[N] vlos_true;
// uncertainties
vector[N] pmra_err;
vector[N] pmdec_err;
vector[N] vlos_err;
...
}
parameters {
vector[3] vels_sph[N];
}
transformed parameters {
vector[N] pmra;
vector[N] pmdec;
vector[N] vlos;
for (i in 1:N) {
// the coordinate transform, basically transform_vels() backwards
vector[3] pm = inv_transform_vels(vels_sph, ...);
pmra[i] = pm[1];
pmdec[i] = pm[2];
vlos[i] = pm[3];
}
}
model {
for (i in 1:N) {
pmra_true[i] ~ normal(pmra[i], pmra_err[i]);
pmdec_true[i] ~ normal(pmdec[i], pmdec_err[i]);
vlos_true[i] ~ normal(vlos[i], vlos_err[i]);
}
}
``````

The advantage of this formulation is that now the tricky constraint is simply an upper bound on the length of the vector `vels_sph`.
Such a constraint can be implemented by a custom transform, for example

``````parameters {
vector[3] raw_vels[N];
}
transformed parameters {
vector[3] vels_sph[N];
for (i in 1:N) {
real max_vel = sqrt(2*p_phi0*pow(r[i], -g));
vels_sph[i] = max_vel * raw_vels[i]/sqrt(1+dot_self(raw_vels[i]));
}
}
model {
for (i in 1:N) {
// implies uniform density for vels_sph
real max_v2 = 2*p_phi0*pow(r[i], -g);
target += 1.5*log(max_v2) - 2.5*log1p(dot_self(raw_vels[i]));
}
}
``````

And finally you have

``````y[1] = sqrt(dot_self(vels_sph[i]));
y[2] = sqrt(dot_self(vels_sph[i,1:2]));
y[3] = r[i];
target += log(fabs(vels_sph[i,3])) - log(y[1]) - log(y[2]);
y ~ df(p_phi0, p_gamma, p_beta, p_alpha);
``````

The `target+=` statement is a Jacobian adjustment and it’s there because `df()` computes the density of `y` but we need to compute the density for the parameter `vels_sph`.

2 Likes

But note that the “data generating process” goes in the other direction.

Ok, this makes sense.

Sorry, I took a while to try to go through your solution but I’m having a hard time following. Here are a bunch of questions:

• We can’t put the constraint on `p_phi0`, so we instead put an equivalent constraint on the velocity (just rearrange the inequality). That’s where `max_vel` comes from. Why can we not use that as the upper bound for `raw_vels` instead of doing the whole `raw_vels` to `vels_sph` thing?

• I don’t understand the `raw_vels` thing and where the following two lines in your second code block came from:

`````` vels_sph[i] = max_vel * raw_vels[i]/sqrt(1+dot_self(raw_vels[i]));
``````
`````` target += 1.5*log(max_v2) - 2.5*log1p(dot_self(raw_vels[i]));
``````

I tried un-logging the second one and got `pow(max_vel, 3) * 1. / pow(sqrt(1. + dot_self(raw_vels[i])), 5)` which is not quite the first one cubed… ?

• Why can’t `pmra` just be a free parameter instead of being defined by the inverse transform?

• I don’t understand where the idea for the inverse transform came from. Is this because now that the constraint goes the other way, the transform also has to go the other way?

I know that you came up with this pretty quickly so it’s probably obvious to you, but if you could walk through your thought process in coming to this solution that would be really really helpful. Thanks.

Stan interpretes upper bound on a vector to mean upper bound on each component independently. So `vector<lower=-1,upper=1>[3]` would constrain the parameter to be inside a cube but for a length constraint it should be inside a ball. A custom constraint transform is needed.

The first line is some transform that restricts the length of the vector. The choice is somewhat arbitrary but the important thing is that any value of `raw_vels` gives a `vels_sph` whose length less than `max_vel`.

The second line is the log Jacobian determinant of the first line. The change of variables page I linked in the last post should explain the math behind it.
Now, Jacobians are tricky and I never get them right on the first try so it’s good idea to test it. Here’s what I did. The transform-only model is

``````parameters {
vector[3] raw;
}
transformed parameters {
vector[3] vel = raw / sqrt( 1 + dot_self(raw) );
}
model {
target += -2.5 * log1p(dot_self(raw));
}
``````

This model can be sampled from and if the transform is correct every `vel` vector must have length < 1. Also, because there is no prior or likelihood the samples must be distributed uniformly in the allowed region.

The upper plot shows samples in 3d coordinate system. The lower plot compares the number of samples within r of the center to the volume of r-radius ball. Because the number of samples is proportional to volume the distribution is uniform as expected.

If the Jacobian is wrong then either the distribution will be non-uniform or sampling will just fail.

It can be but it’s easier to express the constraints on `vels_sph`. `pmra` and `vels_sph` cannot both be parameters in the same model because that would imply independence; but they are coordinate transforms of each other.

Yes. In the original model `transform_vels()` takes (the known) proper motion and calculates the corresponding velocity in the spherical coordinates. In this model you hypothesize a value for the velocity and need calculate the predicted proper motion (which is then compared with the observed proper motion).

Feel free to ask more questions if anything is unclear.

1 Like

Ok, assuming that I did everything correctly (code posted below), this unfortunately doesn’t seem to work that well. Most of the iterations are divergent and I also end up with a lot of exceeded tree depths even with adapt_delta=0.999. The traceplots are flat. From what I’ve read, a reparameterization seems to be the way to go? I don’t know enough to be comfortable doing this so I’d like to avoid this option if possible.

Is there a way to have the constraint in the probability distribution function itself? As in keep trying new parameters if the function returns negative infinity? Doing this with the previous model (without uncertainties, and taking out the constraint) didn’t seem to work too well when I tried it though, even with “safe” initial values (big `p_phi0`).

What would you suggest?

Thanks.

``````functions {

real df_lpdf(real[] y, real phi0, real g, real b, real a) {

// y[1] = v, y[2] = v_t, y[3] = r;
real E = -square(y[1]) / 2. + phi0 / pow(y[3], g);
real L = y[3] * y[2];

// terms in numerator
real num_t1 = -2 * b * log(L);
real num_t2 = (b * (g - 2.) / g + a / g - 3. / 2.) * log(E);
real num_t3 = lgamma(a / g - 2. * b / g + 1.);

// terms in denominator
real denom_t1 = 0.5 * log(8. * pow(pi(), 3.) * pow(2., -2. * b));
real denom_t2 = (-2. * b / g + a / g) * log(phi0);
real denom_t3 = lgamma(b * (g - 2.) / g + a / g - 1. / 2.);

real numerator = num_t1 + num_t2 + num_t3;
real denominator = denom_t1 + denom_t2 + denom_t3;

// this ~should~ never happen
if (E < 0.) {
print("E: ", E, " v: ", y);
return negative_infinity();
}

return numerator - denominator;
}

real shifted_gamma_lpdf(real y, real a, real b, real shift) {
real shifted_y = y - shift;
return a * log(b) - lgamma(a) + (a - 1) * log(shifted_y) - b * shifted_y;
}

vector get_angles(real x, real y, real z) {
real projR = sqrt(square(x) + square(y));
real R = sqrt(square(x) + square(y) + square(z));

real sintheta = y / projR;
real costheta = x / projR;

real sinphi = projR / R;
real cosphi = z / R;

}

vector inv_transform_vels(vector vels_sph, real ra, real dec, real plx, real sintheta, real costheta, real sinphi, real cosphi) {
matrix[3,3] T = [
[-0.06699, -0.87276, -0.48354],
[ 0.49273, -0.45035,  0.74458],
[-0.86760, -0.18837,  0.46020]
];
real deg2rad = pi() / 180.;
real sinra = sin(ra * deg2rad);
real cosra = cos(ra * deg2rad);
real sindec = sin(dec * deg2rad);
real cosdec = cos(dec * deg2rad);
matrix[3,3] A = [
[cosra * cosdec, -sinra, -cosra * sindec],
[sinra * cosdec,  cosra, -sinra * sindec],
[        sindec,      0,          cosdec]
];
matrix[3, 3] B = T * A;
real k = 4.74057;
vector[3] solarmotion = to_vector([11.1, 232.24, 7.25]);

matrix[3, 3] ptz_mat = [
[ costheta, sintheta, 0],
[-sintheta, costheta, 0],
[        0,        0, 1]
];

matrix[3, 3] rtp_mat = [
[sinphi, 0,  cosphi],
[     0, 1,       0],
[cosphi, 0, -sinphi]
];

vector[3] rtp = vels_sph * 100.;
vector[3] ptz = rtp_mat \ rtp;
vector[3] uvw = ptz_mat \ ptz;
vector[3] dat = B \ (uvw - solarmotion);

real vlos = dat[1];
real pmra = dat[2] * plx / k;
real pmdec = dat[3] * plx / k;

}
}

data {

int<lower=1> N; // total length of data

vector[N] ra;
vector[N] dec;
vector[N] plx;

vector[N] Xgc;
vector[N] Ygc;
vector[N] Zgc;

// measurements
vector[N] pmra_true;
vector[N] pmdec_true;
vector[N] vlos_true;
vector[N] r_true;

// measurement uncertainties
vector[N] pmra_err;
vector[N] pmdec_err;
vector[N] vlos_err;
vector[N] r_err;

}

transformed data {
vector[N] sintheta;
vector[N] costheta;
vector[N] sinphi;
vector[N] cosphi;

for (i in 1:N) {
vector[4] angles = get_angles(Xgc[i], Ygc[i], Zgc[i]);
sintheta[i] = angles[1];
costheta[i] = angles[2];
sinphi[i] = angles[3];
cosphi[i] = angles[4];
}
}

parameters {
real<lower=0> p_gamma;
real<lower=0> p_phi0;
real<upper=1> p_beta;
real<lower=(p_beta * (2. - p_gamma) + p_gamma / 2.)> p_alpha;

vector[3] raw_vels[N];
}

transformed parameters {
vector[N] pmra;
vector[N] pmdec;
vector[N] vlos;

vector[3] vels_sph[N];

for (i in 1:N) {
real max_vel = sqrt(2. * p_phi0 * pow(r_true[i], -p_gamma));
vels_sph[i] = max_vel * raw_vels[i] / sqrt(1. + dot_self(raw_vels[i]));
}

for (i in 1:N) {
vector[3] pm = inv_transform_vels(vels_sph[i], ra[i], dec[i], plx[i], sintheta[i], costheta[i], sinphi[i], cosphi[i]);
vlos[i] = pm[1];
pmra[i] = pm[2];
pmdec[i] = pm[3];
}

}

model {

for (i in 1:N) {
real max_v2 = 2. * p_phi0 * pow(r_true[i], -p_gamma);
target += 1.5 * log(max_v2) - 2.5 * log1p(dot_self(raw_vels[i]));
}

for (i in 1:N) {
pmra_true[i] ~ normal(pmra[i], pmra_err[i]);
pmdec_true[i] ~ normal(pmdec[i], pmdec_err[i]);
vlos_true[i] ~ normal(vlos[i], vlos_err[i]);
}

p_gamma ~ normal(0.5, 0.06);
p_phi0 ~ uniform(1., 200.);
p_beta ~ uniform(-0.5, 1.);
p_alpha ~ shifted_gamma(2.99321604, 2.82409927, 3.);

for (i in 1:N) {
real y[3];
y[1] = sqrt(dot_self(vels_sph[i]));
y[2] = sqrt(dot_self(vels_sph[i,1:2]));
y[3] = r_true[i];
target += log(fabs(vels_sph[i, 3])) - log(y[1]) - log(y[2]);
y ~ df(p_phi0, p_gamma, p_beta, p_alpha);
}

}

``````

You need parameter constraints that match the models support. Missing constraints can cause divergencies. Since you have

``````model {
p_phi0 ~ uniform(1., 200.);
p_beta ~ uniform(-0.5, 1.);
p_alpha ~ shifted_gamma(2.99321604, 2.82409927, 3.);
}
``````

then you need

``````parameters {
real<lower=1, upper=200> p_phi0;
real<lower=-0.5, upper=1> p_beta;
real<lower=max(3., p_beta * (2. - p_gamma) + p_gamma / 2.)> p_alpha;
}
``````

Not sure what’s up with that `p_alpha` lower bound but `shifted_gamma` requires `p_alpha > 3.`

Looks right to me. Note that the inverse of a rotation matrix is also its transpose so these could be

``````      vector[3] uvw = ptz_mat' * ptz;
vector[3] dat = B' * (uvw - solarmotion);
``````

which may be slightly faster.
The speed is not that important but I’m concerned that `rtp_mat` is not a rotation matrix:

``````      matrix[3, 3] rtp_mat = [
[sinphi, 0,  cosphi],
[     0, 1,       0],
[cosphi, 0, -sinphi]
];
``````

Can you double check? I think the cosines are usually on the diagonal and sines in the corners.

1 Like

Hmm, that seems a little strange to me. If I tell it to only sample between, for example, 1 and 200, why is the constraint also necessary?

Edit: The priors are defined over R, but just have probability 0 outside their range of support, so the sampling statement isn’t telling it to sample between 1 and 200. So that’s what the constraints are there to do. ?

Yeah the matrix also threw me off a bit. In the paper I am trying to replicate the author has

``````        matrix[3, 3] rtp_mat = [
[sinphi, 0, cosphi],
[0, 1, 0],
[cosphi, 0, -sinphi]
];
``````

but her code on Github has the matrix with the sines on the diagonal, and I figured that I’d follow the code. I’ll run it both ways anyways just to check.

It can be rather confusing. The so-called sampling statements such as `phi0 ~ uniform(1, 200);` do not in fact sample anything. The `model` block only computes the probability of a sample; the samples are generated by the HMC algorithm, based on the observed probabilities of the previous samples and the bounds in the `parameters` block.
A “divergent transition” occurs when the HMC algorithm fails to follow the contours of the probability density and generates a really bad proposal for the next sample.
If you don’t declare a lower bound for `phi0` and the algorithm finds significant posterior probability of `phi0` being somewhat near 1 then it will likely also try a value that is slightly below 1 (where probability drops zero).

So I added the constraints and the model still isn’t very good. With defaults most of the iterations exceed the maximum tree depth, and even increasing this to 15, there are a lot (75%~) of iterations exceeding tree depth. This is also a lot slower (>2 hours). There are also some divergences.

For the first model without uncertainties, adding the constraints made it work even without the `phi_lower` constraint. It seemed to go a little faster with the `phi_lower`, but either way it only took a few seconds.

The model without the uncertainties has four parameter, the model with uncertainties has 3N+ parameters so it’s going to be slower but not by that much and the treedepth problems indicate the posterior geometry is complicated. There might still be some bug in the jacobians?

I’m curious, what software did the original paper use? Can you link to the GitHub repo?

They used R. It’s honestly pretty hard to follow (all over the place). The code is in the RScripts directory. The paper is also a little confusing because there are bits that are taken from previous papers so you kind of have to jump back and forth.

• The transformation with the matrices is in `functions_conversions.r`.
• The distribution function is in `functions_deason.r`. It’s the `logDF.Deason` function, which actually isn’t the log of the DF they give in their papers (one of the older papers) but it’s pretty close. I didn’t follow this in my code.
• The MCMC code is in `GME-hierarchical.r`.
• The priors are taken from the paper (Section 2, paragraph 6), and I couldn’t figure out what they used for `p_alpha` apart from it being a gamma distribution, so I just ran a curve fit to the picture (Fig. 2) they had in the paper and that’s why it’s a little weird.

Earlier I said that

but now I’m pretty sure that was wrong; `df` is the phase-space density so it does give the density of `vels_sph` without modification. You should remove the nonsensical jacobian from the code:

``````    target += log(fabs(vels_sph[i, 3])) - log(y[1]) - log(y[2]);
``````

Maybe that’s what’s messing up the model.

The other jacobian with `raw_vels` is still necessary.

Unfortunately removing that doesn’t seem to help. Most of iterations still exceed a maximum treedepth of 14, the model is pretty slow, and the effective sample sizes are maybe 20 even with 32000 draws (32 chains). The r-hat values are also not close to 1.

It also looks like increasing the tree depth decreases the number of saturated iterations, but increases the number of divergences.

I’m not sure if this is really the best way to do it but I went back to the model without uncertainties and just changed the bits down below. I think this is essentially what the R code does and I get results that are almost exactly the same as what is reported in the paper. I had to put in initial values though, and from comments on the forum it seems like that isn’t really the best way to go. I’ll try to spend some more time figuring out how to improve this.

Thank you so much @nhuurre for all the help and for being so patient with this newbie. :)

``````
parameters {
real<lower=0., upper=1.> p_gamma;
real<lower=1., upper=200.> p_phi0;
real<lower=-0.5, upper=1.> p_beta;
real<lower=max(to_vector([3., p_beta * (2. - p_gamma) + p_gamma / 2.])), upper=10.> p_alpha;

vector<lower=min(pmra_true) - max(pmra_err), upper=max(pmra_true) + max(pmra_err)>[N] pmra;
vector<lower=min(pmdec_true) - max(pmra_err), upper=max(pmdec_true) + max(pmdec_err)>[N] pmdec;
vector<lower=-300, upper=300>[N] vlos;
vector<lower=0, upper=max(r_true) + max(r_err)>[N] r;
}

transformed parameters {

real<lower=0> y[N, 3];

for (i in 1:N) {
vector[3] vels_sph = transform_vels(ra[i], dec[i], pmra[i], pmdec[i], vlos[i], plx[i], sintheta[i], costheta[i], sinphi[i], cosphi[i]);
y[i, 2] = sqrt(square(vels_sph[2]) + square(vels_sph[3]));
y[i, 1] = sqrt(square(vels_sph[1]) + square(y[i, 2]));
y[i, 3] = r[i];
}

}

model {

for (i in 1:N) {
pmra_true[i] ~ normal(pmra[i], pmra_err[i]);
pmdec_true[i] ~ normal(pmdec[i], pmdec_err[i]);
vlos_true[i] ~ normal(vlos[i], vlos_err[i]);
r_true[i] ~ normal(r[i], r_err[i]);
}

p_gamma ~ normal(0.5, 0.06);
p_phi0 ~ uniform(1., 200);
p_beta ~ uniform(-0.5, 1.);
p_alpha ~ shifted_gamma(2.99321604, 2.82409927, 3.);

for (i in 1:N) {
y[i] ~ df(p_phi0, p_gamma, p_beta, p_alpha);
}

}

``````

This takes ~45 minutes to run on 12 cores with a maximum treedepth of 12 and adapt_delta=0.99. I get <1% divergent iterations, ~33% iterations exceeding treedepth, and n_eff of a few thousand out of 12000 on all parameters.

Good to hear it’s (sort of) working.

1. My concern about `rtp_mat` was probably unfounded. Looking at it again I see it is an orthogonal matrix, just one whose determinant is negative.
2. The intended bounds on `pmra` are probably closer to
``````vector<lower=min(pmra_true - pmra_err), upper=max(pmra_true + pmra_err)>[N] pmra;
``````

which combines `pmra` value with its corresponding error before taking the minimum/maximum.
Also, CmdStan supports vectorized bound so if you’re working with CmdStanPy or CmdStanR you can also do

``````vector<lower= pmra_true - pmra_err, upper= pmra_true + pmra_err>[N] pmra;
``````

and it sets different bounds for each element of `pmra`.

1. Since you have reasonable parameter estimates you could try fixing some of the parameters to the “known” value and see if the other parameters can be inferred without divergence or tree depth problems. (Not sure if knowing which ones cause problems helps, though.)