Non-centered parameterization with additional constraints

Hi,

I’m trying to sample from a hierarchical model and I’m having issues. I guess they can be alleviated if I transform the hierarchical priors to a non-centered parameterization. In essence, the model is a normal observation of the ellipticity of galaxies, which are interpreted as triaxial structures. The three axes are defined as A>B>C and everything is normalized to A=1, so that 0<C<B<1. I impose the ordering by sampling from a 3-simplex, computing the cumulative sum and taking only the first two elements, as shown in the `transformed parameters’ section below. I want to impose a hierarchical prior on B and C, which is just a normal prior whose mean depends on the luminosity of the galaxy. Fortunately, the transformation that imposes the ordering has constant Jacobian.

This code samples relatively well for some datasets, in those in which the variance of the priors is well above 0 but in other cases I’m having convergence issues. I would like to use a non-centered reparameterization but I’m not sure how to do it and if it is really feasible to have two transformations and impose a prior on the transformed parameters. Any suggestion how to do it in Stan?

The current code is the following. The calculation of the likelihood is mathematically complex but not problematic.


data {
    int<lower=0> N;
    vector[N] q_obs;    
    vector[N] sigmaq_obs;
    vector[N] L;
}

parameters {
   vector<lower=0>[2] sdCB;
   vector[2] alphaCB;
   vector[2] betaCB;
   vector<lower=0,upper=1>[N] mu;
   vector<lower=0,upper=pi()/2>[N] phi;
   simplex[3] CB_[N];
}

transformed parameters {
  vector[2] CB[N];
  for (i in 1:N) {        
        CB[i] = head(cumulative_sum(CB_[i]), 2);        
  }
}

model {
  real q[N];
  real A=1.0;
  real sin_theta;
  real f2;
  real g;
  real h;
  int loop;

  sdCB ~ normal(0.0, 1.0);
  alphaCB ~ normal(0.0, 2);
  betaCB ~ normal(0.0, 0.5);

  for (i in 1:N) {
    CB[i] ~ normal(alphaCB + betaCB * L[i], sdCB);
  }

  for (i in 1:N) {
    sin_theta = sqrt(1.0 - square(mu[i]));
    f2 = square( A*CB[i,1]*sin_theta*cos(phi[i]) ) + square( CB[i,2]*CB[i,1]*sin_theta*sin(phi[i]) ) 
            + square( A*CB[i,2]*mu[i] );
    g = A*A * (square(cos(phi[i])) + square(mu[i]) * square(sin(phi[i])) ) + 
            CB[i,2]*CB[i,2] * (square(sin(phi[i])) + square(mu[i]) * square(cos(phi[i]))) + 
            CB[i,1]*CB[i,1] * square(sin_theta);

    h = sqrt(  (g - 2 * sqrt(f2)) / (g + 2 * sqrt(f2))  );
    q[i] = (1 - h) / (1 + h);

    q_obs[i] ~ normal(q[i], sigmaq_obs[i]);
    }
} 

Andrés

1 Like

Doing non-centerings or hierarchical modeling explicitly on constrained variables is hard.

Probably the easiest way to get this to work is to do a non-centered hierarchical prior for some parameters just using some normal distributions, and get the constraint you need by transforming these variables.

Does that make sense?

Like just do:

\mu \sim N(0, 1); \\ \tau \sim N(0, 1); \\ \theta_i \sim N(\mu, \tau); \\ c = f(\theta)

Where f is some function to constrain things. It’ll change your interpretation of the hierarchical parameters, but this is the same thing you sorta deal with in glms and such.

Thanks for the suggestion. That might probably work but then I’m not putting the hierarchical priors on the variables that I need in the modeling. I’ll probably have to rethink the model again.

Hmm, well you’re putting the priors on a thing that transforms into these variables.

When we do logistic regression we’re dealing with something like 0 < p < 1, and we’re getting p by an inverse logit transform of some unconstrained transformation.

Now you’re doing 0 < C < B < 1, so it seems only a little more complex :D?

Anyway, something like:

vector[3] = softmax([mu_A, mu_B, 0.0]);
real A = vector[1] + vector[2] + vector[3]; // It's always 1.0
                                            // No need to actually compute
real B = vector[1] + vector[2];
real C = vector[1];

So here you get your constraint from the softmax, and then your parameters are transforms of that.

How you build your prior on top of that is another question, but there you can work in an unconstrained space and use parameterizations and such that you know should sample efficiently.