Sampling from redshift

I’m building a hierarchical population model for cosmologically distributed objects (in redshift) where I do not know all of the distances. Thus, I need to sample from some redshift distribution for the objects with unknown distances.

The posterior looks like this:

propto_Pr(_psi_v.pdf (1.4 MB)

The term I need to sample is the dV/dOmega term.

It is obviously unnormalized because it gives a rate density. My naive approach is to design some functions that build this density and then sample from it’s log:

  vector obj_rate(vector z, real rho, real zp, real p1, real p2){
    /*
      The rate in units on N/yr/Gpc^3

     */

    vector[num_elements(z)] val;

    // pow is not vectorized so
    // we need to do a loop

    for (i in 1:num_elements(z)){

      val[i] =  (1. + pow(z[i] / zp, p2));

    }

    // elementwise division

    return rho * (1. + p1 * z) ./ val;
  }

  vector differential_comoving_volume(vector z){
    /*
      the differential comoving volume in units
      of Gpc^3
     */

    vector[num_elements(z)] td;
    real dh;

    dh = 4326.009494949495;
    td = (comoving_transverse_distance(z)/3.086E24);

    return (dh* td .* td ./ a_factor(z)) * 1E-9;  // Gpc^3

  }


  vector C(vector z, real rho, real zp, real p1, real p2){
    /*
      The cosmological number
     */

    return obj_rate(z, rho, zp, p1, p2) .* differential_comoving_volume(z) ./ (1+z);

  }

  real redshift_distribution_lpdf( vector z, real rho, real zp, real p1, real p2){

    vector[num_elements(z)] prob;
    real lprob;

    prob = C(z,rho,zp,p1,p2);
    lprob = sum(log(prob));

    return lprob;

  }

and then in my model, I sample with this statement:

parameters {

  real L0_mu;
  real<lower=log10(2.)> G0_mu;
  real E0_mu;
  real T0_mu;

  real<lower=0> L0_sigma;
  real<lower=0> G0_sigma;
  real<lower=0> E0_sigma;
  real<lower=0> T0_sigma;


  vector<lower=0, upper=10>[N_unknown] z_unknown;

 z_unknown ~ redshift_distribution(rho,zp,p1,p2);

Is it ok to sample from a distribution in this way? I’m worried about constants out from of the posterior, etc. Is my “log prob.” that I define for the redshift distribution really a valid way to think about the probability?

Cheers!

Yes, that’s a valid way to implement a density if C is the function that calculates its components. It’s OK if it’s unnormalized, as long as the normalizing constants that are missing don’t depend on parameters (they can depend on data).

I don’t know anything about physics, so no idea about that part. Maybe @betanalpha can help.

As @Bob_Carpenter notes the big fear here is that the redshift distribution will require a normalization constant that depends on zp, p1, and p2. If those are unknown parameters then that normalization constant cannot be ignored.

It may help to rewrite the redshift distribution to be a little less astro-y and a little more probability-y. In other words, can you decompose this equation into something more generative like pi(z | phi ) * pi(phi)? That will then help identify the actual densities and whether they need to be normalized.

There are some other improvements (compute log densities from the beginning and then use log_sum_exp to add the probabilities together at the end, no hard cutoffs on redshift, etc) but those are much less important initially.

Thanks for the input!

There is still some wiggle room in the definition of the comoving object rate, but I have the keep the Jacobian to account for the cosmological expansion.

We have played with the idea that the object rate is approximately constant at low redshift, and thus could just be a uniform density, but then hitting it with dV/dz means that it is normalized.

One thing that I was curious about: say the object rate ,n(z | rho ) where rho is just the over all normalization of the rate for which I am fitting as well is uniform out to some redshift. This pops up as in the posterior as a multiplicative term. So I just sample from a uniform. But the (dV/dZ)(z) is also multiplied the whole posterior but evaluated at the sampled redshifts. So, if sample from the uniform distribution, and multiply (add the log) of the dV/dZ for each sample onto the posterior as well, will the exploration of the posterior work? I.e. does the penalization factor from adding on the dV/dz account for the right values of redshift or do I need to sample from the whole distribution directly?

Sorry if this is a bit convoluted!