Correct way to reparametrize a model with Cauchy prior

Hi

I wonder what is the correct way to re-parametrize a posterior with a Cauchy prior (and a Gaussian likelihood) in order to help Stan achieve better mixing of the chains. Currently I have a model with the following code (for simplicity, we can assume some_function(w) = Mw, where M is a constant matrix):

data {
  int N;
  vector[N] y;
}
parameters {
  vector[N] w;
}
transformed parameters {
  vector[N] f;
  f = some_function(w);
}

model {
  for (n in 1:N) {
      w[n] ~ cauchy(0, 1);
  }

  for (n in 1:N) {
      target += normal_lpdf(f[n] - y[n] | 0, 0.5);
  }

}

The question is: how do I accomplish the re-parametrization correctly? The Stan documentation suggests using a uniform distribution and applying a tan transformation to each of the Cauchy variables if one wants to generate Cauchy random variables efficiently. But in my case, do I need to add the log absolute Jacobians of the tan transformations to the variable target or not, given that I initially set independent Cauchy distributions on the components of the vector w? I think I have to add them, but which posterior does the following modified model then refer to? In any case, it is a proper distribution:

data {
  int N;
  vector[N] y;
}
parameters {
  vector<lower=-pi()/2, upper=pi()/2>[N] u;
}
transformed parameters {
  vector[N] w;
  vector[N] f;
  for (n in 1:N) {
      w[n] = tan(u[n]);
  }
  f = some_function(w);
}

model {
  for (n in 1:N) {
      u[n] ~ uniform(-pi()/2, pi()/2);
  }

  for (n in 1:N) {
      target += normal_lpdf(f[n] - y[n] | 0, 0.5);
  }

}

You don’t need a Jacobian if you define your model generatively. See for example Section 5 of https://betanalpha.github.io/assets/case_studies/fitting_the_cauchy.html.


Thank you for the reply. What does it mean to define the model generatively? Is there some other way to define models as well?
This may sound like a stupid question, but I like to see these things expressed directly as posteriors.

In the first code of mine, the posterior is
p(w|y) \propto \text{normalpdf}(\text{some\_function}(w) - y \mid 0, 0.5)\, \text{cauchypdf}(w \mid 0, 1). If I re-parametrize the model and use the second code of my first post, I will have
p(u|y) \propto \text{normalpdf}(\text{some\_function}(\tan(u)) - y \mid 0, 0.5)\, \text{uniformpdf}(u \mid -\pi/2, \pi/2), right? Maybe I am overthinking this issue, but because the Jacobian of the transform and the Cauchy prior are not taken into account in the latter distribution, samples from it are not the same as from the first posterior (the curvature difference between the Cauchy and uniform distributions is not considered at all). For instance, if we compare the w-variables sampled from the first distribution and the \tan(u)-variables sampled from the second, they should have the same distribution, but do they?
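Whether tan(u) for u uniform on (-\pi/2, \pi/2) really follows a standard Cauchy can be checked empirically. A quick Monte Carlo sketch in Python (NumPy/SciPy, purely for illustration and outside the Stan models above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# u ~ Uniform(-pi/2, pi/2), then transform through tan
u = rng.uniform(-np.pi / 2, np.pi / 2, size=200_000)
w = np.tan(u)

# If w is standard Cauchy, the empirical CDF should match
# stats.cauchy.cdf at any point, e.g. cauchy.cdf(1) = 0.75
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, np.mean(w <= x), stats.cauchy.cdf(x))
```

The empirical and theoretical CDFs agree to sampling error, which is exactly the "generative" construction: the tan of a uniform variate is Cauchy, no Jacobian bookkeeping required when it is done in this direction.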

Maybe the confusion is arising from what is generative and what the parameters are? Anyway, you may already know this, but no Jacobian is needed on the Cauchy since it was declared in the parameters block. I think the generative part is f - y. It's a transformation of the data, and this will only affect the outcome up to a constant (and Stan can drop those constants). Since f doesn't require a prior, there's no Jacobian adjustment needed. The adjustment would also be unnecessary if f were an affine transformation.

data {
  int N;
  vector[N] y;
}
parameters {
  vector<lower=-pi()/2, upper=pi()/2>[N] u;
}
transformed parameters {
  vector[N] w;
  vector[N] f;

  // tan is vectorized in Stan, so this loop can be
  // replaced with: w = tan(u);
  for (n in 1:N) {
      w[n] = tan(u[n]);
  }
  f = some_function(w);
}

model {
  // all parameters get uniform priors by default
  // and respect the constraints put on them
  // for (n in 1:N) {
  //     u[n] ~ uniform(-pi()/2, pi()/2);
  // }

  // this works without the loop since
  // it's a built-in _lpdf with analytic derivatives
  target += normal_lpdf(f - y | 0, 0.5);

  // but you can avoid an intermediate loop within Stan
  // and drop unnecessary constants by declaring
  // {
  //   // working in a local scope (the curly brackets)
  //   // we can declare variables that are not saved
  //   vector[N] f_minus_y = f - y;
  //   f_minus_y ~ normal(0, 0.5);
  // }
}
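On the affine point: the Jacobian of an affine map f(w) = Mw + b is the constant matrix M, so its log absolute determinant does not depend on w and only shifts the log target by a constant, which Stan can drop. A small numerical illustration in Python (M and b here are arbitrary made-up examples, not anything from the model above):

```python
import numpy as np

# Affine map: its Jacobian is M everywhere
M = np.array([[2.0, 1.0], [0.5, 3.0]])
b = np.array([1.0, -1.0])

def f(w):
    return M @ w + b

def log_abs_det_jacobian(w, eps=1e-6):
    """Log |det| of the finite-difference Jacobian of f at w."""
    n = w.size
    J = np.empty((n, n))
    for j in range(n):
        dw = np.zeros(n)
        dw[j] = eps
        J[:, j] = (f(w + dw) - f(w - dw)) / (2 * eps)
    return np.linalg.slogdet(J)[1]

# The same value at two very different points: the adjustment
# is a constant (equal to log |det M|) and can be dropped
print(log_abs_det_jacobian(np.array([0.0, 0.0])))
print(log_abs_det_jacobian(np.array([5.0, -3.0])))
```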

Maybe the confusion is arising from what is generative and what are parameters?

Yes, this might be the case. I read this blog post: https://www.rpubs.com/kaz_yos/stan_jacobian, and it contained the sentences: "That is, the log prior we have to include in the model {} block is the log prior for the parameters as defined in the parameters {} block, NOT the log prior for the transformed parameters per se. This process can be facilitated by the inclusion of the log prior for the transformed parameters, but that alone is typically NOT enough. This is why and where the Jacobian adjustment comes in."

In other words, I do not need to include the Jacobian of the transform, because we already know that tan-transformed uniform random variables follow a Cauchy distribution? In the blog post, the Jacobian is needed because nothing is known beforehand about how the distributions of the non-transformed and transformed parameters correspond to each other, and yet we want the transformed variable to follow a certain distribution?

In general, let me recommend studying up on probability density functions and their transformations; see for example https://betanalpha.github.io/assets/case_studies/probability_theory.html#42_probability_density_functions.

Jacobians are needed whenever you want to model a variable indirectly, through a transformation, instead of directly.

For example the following Stan program is well-defined because we’re putting a density directly on the parameter x.

parameters {
  real x;
}
model {
  target += normal_lpdf(x | 0, 1);
}

On the other hand, the following Stan program is incorrect because we're trying to put a density on a transformation of the parameter:

parameters {
  real<lower=0> x;
}
transformed parameters {
  real log_x = log(x);
}
model {
  target += normal_lpdf(log_x | 0, 1); // Transformed variable to the left of the bar
}

If we want to do that, then we need to add a log Jacobian determinant to account for how the density transforms from log_x back to x:

parameters {
  real<lower=0> x;
}
transformed parameters {
  real log_x = log(x);
}
model {
  target += normal_lpdf(log_x | 0, 1);
  target += -log_x; // log |d log(x) / dx| = log(1 / x) = -log(x)
}
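As a sanity check on the sign of the adjustment: the induced log density over x is normal_lpdf(log(x) | 0, 1) + log|d log(x)/dx| = normal_lpdf(log(x) | 0, 1) - log(x), which is exactly the standard lognormal log density. A quick SciPy verification (Python, outside Stan):

```python
import numpy as np
from scipy import stats

x = np.array([0.1, 0.5, 1.0, 2.0, 10.0])

# Standard normal density on log(x), plus the log Jacobian
# log |d log(x)/dx| = -log(x)
adjusted = stats.norm.logpdf(np.log(x)) - np.log(x)

# This should equal the standard lognormal log density
lognormal = stats.lognorm.logpdf(x, s=1.0)

print(np.max(np.abs(adjusted - lognormal)))
```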

Alternatively we can just model log_x directly by making it the parameter,

parameters {
  real log_x;
}
transformed parameters {
  real x = exp(log_x);
}
model {
  target += normal_lpdf(log_x | 0, 1); // Now log_x is no longer a transformed variable but a parameter variable
}

As I discuss in the link, which I encourage you to read carefully, there are multiple ways to construct variables with Cauchy densities generatively. For example, we can transform a uniform variable: here x_tilde will follow a uniform density but x will follow a Cauchy density.

parameters {
  real<lower=0, upper=1> x_tilde;
}

transformed parameters {
  real x = tan(pi() * (x_tilde - 0.5));
}

model {
  // Implicit uniform prior on x_tilde
}

There’s no need to add a Jacobian because we’re not modeling x.

Perhaps another way of thinking about it is going backwards. Start with

parameters {
  real x;
}
model {
   x ~ cauchy(0, 1);
}

and ask what happens when we apply the transformation

\tilde{x} = \frac{ \arctan(x) }{ \pi } + \frac{1}{2}.

We’ll need to compute the log Jacobian determinant which, it turns out, exactly cancels the Cauchy density such that the transformed density for x_tilde is exactly uniform within its bounded domain.
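The cancellation can also be seen numerically: the inverse map is x = tan(\pi(\tilde{x} - 1/2)) with dx/d\tilde{x} = \pi(1 + x^2), which is exactly the reciprocal of the Cauchy(0, 1) density 1/(\pi(1 + x^2)). A short Python check (SciPy used only for the Cauchy pdf):

```python
import numpy as np
from scipy import stats

# A few arbitrary points on the Cauchy scale
x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

# Jacobian of the inverse map x = tan(pi * (x_tilde - 1/2)):
# dx/dx_tilde = pi * (1 + x^2)
jacobian = np.pi * (1 + x**2)

# Transformed density: cauchy.pdf(x) * |dx/dx_tilde| is
# identically 1, i.e. Uniform(0, 1) on the bounded domain
density = stats.cauchy.pdf(x) * jacobian
print(density)
```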
