Constraints on a transformation of multiple parameters

Hi,

I’m trying to fit a model but I can’t figure out how to implement a constraint on a variable which is some function of some parameters. Right at the top of the function df_lpdf I have this E variable, and I want it to be positive. I have tried defining it inside the transformed parameters block instead but the constraints there are only for checks, and not actual constraints. Do I need to reparameterize this, and if so, how can I do so? Right now the model sometimes runs and sometimes doesn’t because of bad starting values. If it runs then I’m getting n_eff < 100 from 1000 draws.

Also, is the way I currently have it set up the proper way to pass multiple inputs to a custom distribution function (the real[] y part)? It seems a little clunky but I couldn’t find any examples in the documentation.

Sorry if this is an obvious problem, this is my first time using Stan so I am unfamiliar with it.

Thanks! :)

functions {

    real df_lpdf(real[] y, real phi0, real g, real b, real a) {

        // E should be > 0
        real E = -square(y[1]) / 2. + phi0 / pow(y[3], g); 
        real L = y[3] * y[2];

        real num_t1 = -2 * b * log(L);
        real num_t2 = (b * (g - 2.) / g + a / g - 3. / 2.) * log(E);
        real num_t3 = lgamma(a / g - 2. * b / g + 1.);

        real denom_t1 = 0.5 * log(8. * pow(pi(), 3.) * pow(2., -2. * b));
        real denom_t2 = (-2. * b / g + a / g) * log(phi0);
        real denom_t3 = lgamma(b * (g - 2.) / g + a / g - 1. / 2.);

        real numerator = num_t1 + num_t2 + num_t3;
        real denominator = denom_t1 + denom_t2 + denom_t3;

        // this ~should~ never happen but it does.
        if (E < 0.) {
            return negative_infinity();
        }

        return numerator - denominator;
    }

    vector get_angles(real x, real y, real z) {
        real projR = sqrt(square(x) + square(y));
        real R = sqrt(square(x) + square(y) + square(z));

        real sintheta = y / projR;
        real costheta = x / projR;

        real sinphi = projR / R;
        real cosphi = z / R;

        return to_vector([sintheta, costheta, sinphi, cosphi]);
    }

    vector transform_vels(real ra, real dec, real pmra, real pmdec, real vlos, real plx, real sintheta, real costheta, real sinphi, real cosphi) {
        matrix[3,3] T = [
            [-0.06699, -0.87276, -0.48354],
            [ 0.49273, -0.45035,  0.74458],
            [-0.86760, -0.18837,  0.46020]
        ];
        real deg2rad = pi() / 180.;
        real sinra = sin(ra * deg2rad);
        real cosra = cos(ra * deg2rad);
        real sindec = sin(dec * deg2rad);
        real cosdec = cos(dec * deg2rad);
        matrix[3,3] A = [
            [cosra * cosdec, -sinra, -cosra * sindec],
            [sinra * cosdec, cosra , -sinra * sindec],
            [sindec,              0,          cosdec]
        ];
        matrix[3, 3] B = T * A;
        real k = 4.74057;
        vector[3] solarmotion = to_vector([11.1, 232.24, 7.25]);

        vector[3] dat = to_vector([vlos, k * pmra / plx, k * pmdec / plx]);
        vector[3] uvw = B * dat + solarmotion;

        matrix[3, 3] ptz_mat = [
            [costheta, sintheta, 0],
            [-sintheta, costheta, 0],
            [0, 0, 1]
        ];
        vector[3] ptz = ptz_mat * uvw;

        matrix[3, 3] rtp_mat = [
            [sinphi, 0, cosphi],
            [0, 1, 0],
            [cosphi, 0, -sinphi]
        ];
        vector[3] rtp = rtp_mat * ptz;

        return rtp / 100; 
    }

}

data {

    int<lower=1> N;

    vector[N] ra;
    vector[N] dec;
    vector[N] plx;

    vector[N] Xgc;
    vector[N] Ygc;
    vector[N] Zgc;

    // measurements
    vector[N] pmra_true;
    vector[N] pmdec_true;
    vector[N] vlos_true;
    vector[N] r_true;

}

transformed data {
    vector[N] sintheta;
    vector[N] costheta;
    vector[N] sinphi;
    vector[N] cosphi;

    real<lower=0> y[N, 3];

    for (i in 1:N) {
        vector[4] angles = get_angles(Xgc[i], Ygc[i], Zgc[i]);
        sintheta[i] = angles[1];
        costheta[i] = angles[2];
        sinphi[i] = angles[3];
        cosphi[i] = angles[4];
    }

    {
        for (i in 1:N) {
            vector[3] vels_sph = transform_vels(ra[i], dec[i], pmra_true[i], pmdec_true[i], vlos_true[i], plx[i], sintheta[i], costheta[i], sinphi[i], cosphi[i]);
            y[i, 2] = sqrt(square(vels_sph[2]) + square(vels_sph[3]));
            y[i, 1] = sqrt(square(vels_sph[1]) + square(y[i, 2]));
            y[i, 3] = r_true[i];
        }
    }
}

parameters {
    real<lower=0> p_phi0;
    real<lower=0> p_gamma; 
    real<upper=1> p_beta;
    real<lower=(p_beta * (2. - p_gamma) + p_gamma / 2.)> p_alpha; 
}

model {

    // priors ----------------

    p_phi0 ~ uniform(1., 200);
    p_gamma ~ normal(0.5, 0.06);
    p_beta ~ uniform(-0.5, 1.);
    p_alpha ~ gamma(2.99321604, 2.82409927);

     // likelihood --------------------- 

    for (i in 1:N) {
        y[i] ~ df(p_phi0, p_gamma, p_beta, p_alpha);
    }

}

Yes, that’s the correct way to pass multiple inputs.

We don’t recommend uniform priors. What’s the motivation for 1 and 200 in p_phi0~uniform(1,200)? Would p_phi~normal(0,200)T[0,] work?

Anyway, looks to me that the constraint you need is

functions {
  ...
  real phi_lower(real[,] y, real g) {
    real p[size(y)];
    for (i in 1:size(y)) {
      p[i] = 0.5 * square(y[i,1]) * pow(y[i,3], g);
    }
    return max(p);
  }
}
...
parameters {
    real<lower=0> p_gamma; 
    real<lower=phi_lower(y, p_gamma)> p_phi0;
    real<lower=0.5,upper=1> p_beta;
    real<lower=(p_beta * (2. - p_gamma) + p_gamma / 2.)> p_alpha; 
}

Thanks for the help! That constraint makes a lot of sense, and it works.

I’m trying to reproduce the results from a paper and uniform(1, 200) is what they use. I’ll try out your suggestion though and see what difference that makes.

@nhuurre I’m trying to add another layer to the model, and y has been moved to the transformed_parameters block. I think this means that p_phi0 has to be moved there as well, because it doesn’t know about y when it’s in the parameters block. Now my problem is that the transformed_parameters block doesn’t enforce the constraints. How would I go about solving this? Thanks!

So, if I’m reading this right you transform the observed proper motion to galactic coordinates and y[1] is the speed in the galactic plane, y[2] is the the total speed and y[3] is some distance (from the galactic center?)
You didn’t explain what change requires y to be a transformed parameter but my first guess is that you’re modeling the uncertainty in proper motion by making pmra_true a parameter. If that’s the case then I think it would be easier to parametrize the velocity in galactic coordinates and calculate the proper motion as a transformed parameter. That way the constraints are more natural.
Secondly, what is the physical mechanism that enforces the constraint E > 0?
Is it an inviolable law with no exceptions, or a probabilistic one where exceptions are possible but very unlikely? If it’s the latter then consider omitting the constraint and modeling the mechanism directly.

The observed proper motions are transformed into spherical velocities in galactic coordinates (output of the transform_vels function) and then retransformed into y because the distribution function takes in y. y[1] is the total speed, y[2] is the tangential speed relative to the galactic center, and y[3] is the distance from the galactic center.

Yes, you’re totally right—I’m trying to incorporate uncertainties. I’ve included some code below, that probably explains things better than I can. The main change is that I have some pmra_err data coming in, and instead of transforming the pmra_true parameters to get the velocities, I use some new pmra parameter which is normally distributed around pmra_true. Then these velocities are put into y in the same way as before.

I don’t have a great understanding of this either, but E is the relative energy, and for E<=0 (sorry this should be <= and not <) the distribution function should be 0. From what I can tell, there is an assumption that p_phi0 is chosen such that the DF is 0 when E<=0, so the range of support of the DF is only for E>0. This is what I was trying to do with the negative_infinity() thing, which I guess is supposed to be 0. I just tried it again with 0, and this also does not work (“Stan can’t start sampling from this initial value”).

functions {
    
    real df_lpdf(real[] y, real phi0, real g, real b, real a) {

        // y[1] = v, y[2] = v_t, y[3] = r;
        real E = -square(y[1]) / 2. + phi0 / pow(y[3], g);
        real L = y[3] * y[2];

        // terms in numerator
        real num_t1 = -2 * b * log(L);
        real num_t2 = (b * (g - 2.) / g + a / g - 3. / 2.) * log(E);
        real num_t3 = lgamma(a / g - 2. * b / g + 1.);
        
        // terms in denominator
        real denom_t1 = 0.5 * log(8. * pow(pi(), 3.) * pow(2., -2. * b));
        real denom_t2 = (-2. * b / g + a / g) * log(phi0);
        real denom_t3 = lgamma(b * (g - 2.) / g + a / g - 1. / 2.);
        
        real numerator = num_t1 + num_t2 + num_t3;
        real denominator = denom_t1 + denom_t2 + denom_t3;

        // this ~should~ never happen
        if (E < 0.) {
            return 0.;
        }         

        return numerator - denominator;
    }

    real phi_lower(real[,] y, real g) {
      real p[size(y)];
      for (i in 1:size(y)) {
        p[i] = 0.5 * square(y[i,1]) * pow(y[i,3], g);
      }
      return max(p);
    }

    real shifted_gamma_lpdf(real y, real a, real b, real shift) {
        real shifted_y = y - shift;
        return a * log(b) - lgamma(a) + (a - 1) * log(shifted_y) - b * shifted_y;
    }

    vector get_angles(real x, real y, real z) {
        real projR = sqrt(square(x) + square(y));
        real R = sqrt(square(x) + square(y) + square(z));

        real sintheta = y / projR;
        real costheta = x / projR;

        real sinphi = projR / R;
        real cosphi = z / R;

        return to_vector([sintheta, costheta, sinphi, cosphi]);
    }

    vector transform_vels(real ra, real dec, real pmra, real pmdec, real vlos, real plx, real sintheta, real costheta, real sinphi, real cosphi) {
        matrix[3,3] T = [
            [-0.06699, -0.87276, -0.48354],
            [ 0.49273, -0.45035,  0.74458],
            [-0.86760, -0.18837,  0.46020]
        ];
        real deg2rad = pi() / 180.;
        real sinra = sin(ra * deg2rad);
        real cosra = cos(ra * deg2rad);
        real sindec = sin(dec * deg2rad);
        real cosdec = cos(dec * deg2rad);
        matrix[3,3] A = [
            [cosra * cosdec, -sinra, -cosra * sindec],
            [sinra * cosdec,  cosra, -sinra * sindec],
            [        sindec,      0,          cosdec]
        ];
        matrix[3, 3] B = T * A;
        real k = 4.74057;
        vector[3] solarmotion = to_vector([11.1, 232.24, 7.25]); // including rotation

        vector[3] dat = to_vector([vlos, k * pmra / plx, k * pmdec / plx]);
        vector[3] uvw = B * dat + solarmotion;

        matrix[3, 3] ptz_mat = [
            [ costheta, sintheta, 0],
            [-sintheta, costheta, 0],
            [        0,        0, 1]
        ];
        vector[3] ptz = ptz_mat * uvw;

        matrix[3, 3] rtp_mat = [ 
            [sinphi, 0,  cosphi], 
            [     0, 1,       0],
            [cosphi, 0, -sinphi]
        ];
        vector[3] rtp = rtp_mat * ptz;

        return rtp / 100;
    }

}

data {

    int<lower=1> N; // total length of data
    
    vector[N] ra;
    vector[N] dec;
    vector[N] plx;

    vector[N] Xgc;
    vector[N] Ygc;
    vector[N] Zgc;

    // measurements
    vector[N] pmra_true;
    vector[N] pmdec_true;
    vector[N] vlos_true;
    vector[N] r_true;
    
    // measurement uncertainties
    vector[N] pmra_err; 
    vector[N] pmdec_err;
    vector[N] vlos_err;
    vector[N] r_err;

}

transformed data {
    vector[N] sintheta;
    vector[N] costheta;
    vector[N] sinphi;
    vector[N] cosphi;

    for (i in 1:N) {
        vector[4] angles = get_angles(Xgc[i], Ygc[i], Zgc[i]);
        sintheta[i] = angles[1];
        costheta[i] = angles[2];
        sinphi[i] = angles[3];
        cosphi[i] = angles[4];
    }
}

parameters {
    real<lower=0> p_gamma; 
    real<upper=1> p_beta
    real<lower=(p_beta * (2. - p_gamma) + p_gamma / 2.)> p_alpha; 

    vector<lower=min(pmra_true) - 5 * max(pmra_err), upper=max(pmra_true) + 5 * max(pmra_err)>[N] pmra;
    vector<lower=min(pmdec_true) - 5 * max(pmdec_err), upper=max(pmdec_true) + 5 * max(pmdec_err)>[N] pmdec;
    vector<lower=min(vlos_true) - 5 * max(vlos_err), upper=max(vlos_true) + 5 * max(vlos_err)>[N] vlos;
    vector<lower=min(r_true) - 5 * max(r_err), upper=max(r_true) + 5 * max(r_err)>[N] r;
} 

transformed parameters {

    real y[N, 3]; // wrap up input data as (v, v_t, r)

    // the constraint doesn't work here. 
    real<lower=phi_lower(y, p_gamma)> p_phi0; // scale factor for gravitational potential 

    {
        for (i in 1:N) {
            vector[3] vels_sph = transform_vels(ra[i], dec[i], pmra[i], pmdec[i], vlos[i], plx[i], sintheta[i], costheta[i], sinphi[i], cosphi[i]); 
            y[i, 2] = sqrt(square(vels_sph[2]) + square(vels_sph[3]));
            y[i, 1] = sqrt(square(vels_sph[1]) + square(y[i, 2]));
            y[i, 3] = r[i];

        }
    }

   
}


model {

    for (i in 1:N) {
        pmra[i] ~ normal(pmra_true[i], pmra_err[i]); 
        pmdec[i] ~ normal(pmdec_true[i], pmdec_err[i]); 
        vlos[i] ~ normal(vlos_true[i], vlos_err[i]); 
        r[i] ~ normal(r_true[i], r_err[i]); 
    }

    p_gamma ~ normal(0.5, 0.06);
    p_phi0 ~ normal(0., 150.)T[0, 200];
    p_beta ~ uniform(-0.5, 1.);
    p_alpha ~ shifted_gamma(2.99321604, 2.82409927, 3.);

    for (i in 1:N) {
        y[i] ~ df(p_phi0, p_gamma, p_beta, p_alpha);
    }
    
}

generated quantities {
}

Ah, makes sense. I mixed up y[1] and y[2] and didn’t follow details of the transform.

The correct representation of zero probability is negative_infinity() as lpdf is on the log-scale.

That’s what I thought.
But note that the “data generating process” goes in the other direction. The observed star has some physical velocity and when viewed from Earth the projection of that vector appears as proper motion. The proper motion is then measured with some error so that the recorded value is random number drawn from distribution normal(pmra, pmra_err).

Stan allows you to model the observation process directly.

data {
  // measurements
  vector[N] pmra_true;
  vector[N] pmdec_true;
  vector[N] vlos_true;
  // uncertainties
  vector[N] pmra_err;
  vector[N] pmdec_err;
  vector[N] vlos_err;
  ...
}
parameters {
  vector[3] vels_sph[N];
}
transformed parameters {
  vector[N] pmra;
  vector[N] pmdec;
  vector[N] vlos;
  for (i in 1:N) {
    // the coordinate transform, basically transform_vels() backwards
    vector[3] pm = inv_transform_vels(vels_sph, ...);
    pmra[i] = pm[1];
    pmdec[i] = pm[2];
    vlos[i] = pm[3];
  }
}
model {
  for (i in 1:N) {
    pmra_true[i] ~ normal(pmra[i], pmra_err[i]); 
    pmdec_true[i] ~ normal(pmdec[i], pmdec_err[i]); 
    vlos_true[i] ~ normal(vlos[i], vlos_err[i]); 
  }
}

The advantage of this formulation is that now the tricky constraint is simply an upper bound on the length of the vector vels_sph.
Such a constraint can be implemented by a custom transform, for example

parameters {
  vector[3] raw_vels[N];
}
transformed parameters {
  vector[3] vels_sph[N];
  for (i in 1:N) {
    real max_vel = sqrt(2*p_phi0*pow(r[i], -g));
    vels_sph[i] = max_vel * raw_vels[i]/sqrt(1+dot_self(raw_vels[i]));
  }
}
model {
  for (i in 1:N) {
    // implies uniform density for vels_sph
    real max_v2 = 2*p_phi0*pow(r[i], -g);
    target += 1.5*log(max_v2) - 2.5*log1p(dot_self(raw_vels[i]));
  }
}

And finally you have

y[1] = sqrt(dot_self(vels_sph[i]));
y[2] = sqrt(dot_self(vels_sph[i,1:2]));
y[3] = r[i];
target += log(fabs(vels_sph[i,3])) - log(y[1]) - log(y[2]);
y ~ df(p_phi0, p_gamma, p_beta, p_alpha);

The target+= statement is a Jacobian adjustment and it’s there because df() computes the density of y but we need to compute the density for the parameter vels_sph.

1 Like

But note that the “data generating process” goes in the other direction.

Ok, this makes sense.

Sorry, I took a while to try to go through your solution but I’m having a hard time following. Here are a bunch of questions:

  • We can’t put the constraint on p_phi0, so we instead put an equivalent constraint on the velocity (just rearrange the inequality). That’s where max_vel comes from. Why can we not use that as the upper bound for raw_vels instead of doing the whole raw_vels to vels_sph thing?

  • I don’t understand the raw_vels thing and where the following two lines in your second code block came from:

     vels_sph[i] = max_vel * raw_vels[i]/sqrt(1+dot_self(raw_vels[i]));
    
     target += 1.5*log(max_v2) - 2.5*log1p(dot_self(raw_vels[i]));
    

    I tried un-logging the second one and got pow(max_vel, 3) * 1. / pow(sqrt(1. + dot_self(raw_vels[i])), 5) which is not quite the first one cubed… ?

  • Why can’t pmra just be a free parameter instead of being defined by the inverse transform?

  • I don’t understand where the idea for the inverse transform came from. Is this because now that the constraint goes the other way, the transform also has to go the other way?

I know that you came up with this pretty quickly so it’s probably obvious to you, but if you could walk through your thought process in coming to this solution that would be really really helpful. Thanks.

Stan interpretes upper bound on a vector to mean upper bound on each component independently. So vector<lower=-1,upper=1>[3] would constrain the parameter to be inside a cube but for a length constraint it should be inside a ball. A custom constraint transform is needed.

The first line is some transform that restricts the length of the vector. The choice is somewhat arbitrary but the important thing is that any value of raw_vels gives a vels_sph whose length less than max_vel.

The second line is the log Jacobian determinant of the first line. The change of variables page I linked in the last post should explain the math behind it.
Now, Jacobians are tricky and I never get them right on the first try so it’s good idea to test it. Here’s what I did. The transform-only model is

parameters {
    vector[3] raw;
}
transformed parameters {
    vector[3] vel = raw / sqrt( 1 + dot_self(raw) );
}
model {
    target += -2.5 * log1p(dot_self(raw));
}

This model can be sampled from and if the transform is correct every vel vector must have length < 1. Also, because there is no prior or likelihood the samples must be distributed uniformly in the allowed region.

The upper plot shows samples in 3d coordinate system. The lower plot compares the number of samples within r of the center to the volume of r-radius ball. Because the number of samples is proportional to volume the distribution is uniform as expected.

If the Jacobian is wrong then either the distribution will be non-uniform or sampling will just fail.

It can be but it’s easier to express the constraints on vels_sph. pmra and vels_sph cannot both be parameters in the same model because that would imply independence; but they are coordinate transforms of each other.

Yes. In the original model transform_vels() takes (the known) proper motion and calculates the corresponding velocity in the spherical coordinates. In this model you hypothesize a value for the velocity and need calculate the predicted proper motion (which is then compared with the observed proper motion).

Feel free to ask more questions if anything is unclear.

1 Like