Reparameterizing multiple Cauchy parameters

If I have a few Cauchy scale parameters that I want to reparameterize following the Stan User's Guide, should I declare a separate real<lower=-pi()/2, upper=pi()/2> beta_unif; for each scale parameter, or can beta_unif be the same for all of them?

Option 1

parameters {
  real<lower=-pi()/2, upper=pi()/2> beta_unif;
}
transformed parameters {
  real beta1;
  real beta2;

  beta1 = tan(beta_unif);
  beta2 = tan(beta_unif);
}

Option 2

parameters {
  real<lower=-pi()/2, upper=pi()/2> beta_unif1;
  real<lower=-pi()/2, upper=pi()/2> beta_unif2;
}
transformed parameters {
  real beta1;
  real beta2;

  beta1 = tan(beta_unif1);
  beta2 = tan(beta_unif2);
}

Which one is the correct way? Thanks!

You can do either, but they’re not quite the same. In option 1, beta1 and beta2 are identical by construction, so you induce complete dependence between them. This may or may not be appropriate.

Let’s say I simulate 1000 observations from y \sim \mathcal{N}(3, 1) and use the Cauchy prior from option 1. This is the Stan model:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real<lower=-pi()/2, upper=pi()/2> beta_unif;
}
transformed parameters {
  real beta1 = tan(beta_unif);
  real beta2 = tan(beta_unif);
}
model {
  y ~ normal(beta1, 1);
}

and the R code with corresponding output

library(rstan)
options(mc.cores = parallel::detectCores())
set.seed(689934)

N <- 1000
y <- rnorm(N, 3, 1)

stan_rdump(c("N", "y"), file = "noniid_data.R")
input_data <- read_rdump("noniid_data.R")
fit <- stan(file='noniid_prior.stan', data=input_data, seed=483892929)
print(fit)

You can see that although the prior was specified as a cauchy(0, 1), beta_unif takes on a posterior mean of 1.25, and beta1 and beta2 are exactly equal. That 1.25 is no accident: beta1 = tan(beta_unif) has to track the data mean of 3, and atan(3) ≈ 1.25 (see the quick check after the output below).

Inference for Stan model: noniid_prior.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta_unif    1.25    0.00 0.00    1.24    1.25    1.25    1.25    1.25  1477    1
beta1        3.00    0.00 0.03    2.94    2.98    3.00    3.02    3.06  1477    1
beta2        3.00    0.00 0.03    2.94    2.98    3.00    3.02    3.06  1477    1
lp__      -522.40    0.02 0.67 -524.39 -522.52 -522.14 -521.98 -521.93  1744    1

Samples were drawn using NUTS(diag_e) at Thu Dec 31 07:09:32 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
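
As a quick sanity check in R: if U ~ Uniform(-pi/2, pi/2), then tan(U) has a standard Cauchy distribution, which is exactly the identity this reparameterization exploits, and atan(3) explains the posterior mean of beta_unif:

# tan of a Uniform(-pi/2, pi/2) draw is a standard Cauchy draw
u <- runif(1e5, -pi/2, pi/2)
quantile(tan(u), c(0.25, 0.5, 0.75))  # roughly -1, 0, 1
qcauchy(c(0.25, 0.5, 0.75))           # exactly -1, 0, 1

# and the posterior mean of beta_unif is just atan of the data mean
atan(3)  # 1.249, i.e. the 1.25 reported above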

If instead you keep separate priors on beta1 and beta2 by declaring beta_unif as a size-2 array of reals,

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real<lower=-pi()/2, upper=pi()/2> beta_unif[2];
}
transformed parameters {
  real beta1 = tan(beta_unif[1]);
  real beta2 = tan(beta_unif[2]);
}
model {
  y ~ normal(beta1, 1);
}
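
Assuming the same simulated data and driver script as before, only the file name changes (here hypothetically saved as iid_prior.stan, matching the model name in the output below):

fit2 <- stan(file='iid_prior.stan', data=input_data, seed=483892929)
print(fit2)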

You can see that beta1 and beta2 are now independent parameters. beta_unif[1] is the same as in the previous model, since it induces the prior on the parameter tied to the data y. However, beta_unif[2], since it isn’t tied to any data, takes on a mean value around 0, corresponding to the cauchy(0, 1) prior; the enormous posterior sd of beta2 below is the Cauchy’s heavy tails showing through.

Inference for Stan model: iid_prior.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta_unif[1]    1.25    0.00  0.00    1.24    1.25    1.25    1.25    1.25  3355    1
beta_unif[2]    0.03    0.02  0.91   -1.49   -0.75    0.06    0.80    1.50  3393    1
beta1           3.00    0.00  0.03    2.94    2.98    3.00    3.02    3.06  3353    1
beta2           1.63    0.85 44.71  -12.70   -0.93    0.06    1.04   15.16  2749    1
lp__         -523.29    0.03  1.10 -526.27 -523.72 -522.97 -522.50 -522.20  1396    1

Samples were drawn using NUTS(diag_e) at Thu Dec 31 07:11:45 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

I see. Thanks so much! I have a follow-up question: if I want to reparameterize a half-Cauchy, i.e. real<lower=0> beta, what should the declaration for beta_unif be? Should it be real<lower=0, upper=pi()/2> beta_unif?

Thanks!

You got it.
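
For concreteness, a minimal sketch of the half-Cauchy version, assuming a known scale tau passed in as data:

data {
  real<lower=0> tau;  // scale of the half-Cauchy (assumed given)
}
parameters {
  real<lower=0, upper=pi()/2> beta_unif;
}
transformed parameters {
  // tan maps Uniform(0, pi/2) to a standard Cauchy restricted to (0, inf),
  // so beta ~ half-Cauchy(0, tau)
  real<lower=0> beta = tau * tan(beta_unif);
}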


Thanks! I was actually hoping I was doing something wrong, because my model still performs horribly!

Presumably it’s a more complicated model than the simple one you posted above?


Yes it is. My predictive checks look good, but my chain diagnostics are terrible. I have a few more things to try, and if they don’t work I will open a new thread to seek help. Thanks!

My understanding is that the heavy tails of the Cauchy can be expected to cause sampling problems even when it’s been reparameterized as you’ve done. Do you get the same issues when a simple normal distribution is used?

You mean even if I use a very narrow Cauchy, for example with a sigma of 0.5?
I hadn’t thought of using a normal distribution as a prior on my scaling parameters. Is that acceptable? I saw that the half-Cauchy is what’s usually recommended.

You should think of the Cauchy as really a three-parameter distribution with a modifiable location and scale, but a tail-heaviness parameter fixed at the arguably extreme value of 1. Indeed, this is just the student-t distribution (i.e. cauchy(0,1) == student_t(1,0,1)). And now that you’re thinking of the student-t, consider that as the tail-heaviness parameter (the df parameter in this case) increases, it approaches the normal distribution (i.e. student_t(30,0,1) is approximately equal to normal(0,1)). This highlights that the only thing differentiating the Cauchy from a normal is how heavy-tailed the prior is.

If you don’t want to go all the way to a normal, you could try a student-t with a lower value for the df parameter. While the early examples (and consequent legacy code) opted for Cauchy priors for things like variance components, I believe folks have realized that, consistent with samplers encountering issues with them, prior predictive checks show the Cauchy yielding rather unreasonable model predictions. More recent default recommendations (e.g. in brms, I believe) opt for the student-t, and I personally go for a normal and simply ensure the scale parameter is set such that prior predictive checks yield reasonable-looking data.
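
In Stan terms, sketching that progression on a hypothetical scale parameter sigma (the lower=0 constraint makes each of these a half-distribution):

parameters {
  real<lower=0> sigma;
}
model {
  // heavy tails:     sigma ~ cauchy(0, 1);       // == student_t(1, 0, 1)
  // middle ground:   sigma ~ student_t(3, 0, 1);
  // light (normal) tails:
  sigma ~ normal(0, 1);
}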


I will definitely try this, thanks so much!! My problem is that my predictive checks look great, but my model doesn’t converge at all (huge R-hat, low E-BFMI, 90% of iterations hitting max treedepth). And happy new year!


For some discussion of the dangers of Cauchy prior densities see https://betanalpha.github.io/assets/case_studies/weakly_informative_shapes.html.

A prior is acceptable only if it conforms to your domain expertise, which means that no one can accept a prior but yourself. Cauchy and half-Cauchy densities may be okay in some applications and problematic in others and so ideally you would verify each time using prior checks – see for example https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html#11_domain_expertise_consistency.
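
As a minimal sketch of such a check in R (the names and scales here are purely illustrative), draw from the candidate prior and look at the data it implies:

# scales drawn from a candidate half-Cauchy(0, 1) prior
sigma_prior <- abs(rcauchy(1e4, 0, 1))
# one simulated observation per prior draw
y_prior <- rnorm(1e4, 0, sigma_prior)
quantile(y_prior, c(0.025, 0.975))  # often implausibly wide for real data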

That said, I have rarely found Cauchy densities to be useful prior models, as they allow far more probability at extreme model configurations than most people intuit. At the same time, I personally have yet to find an application where normal prior densities, with carefully chosen scales, have proven problematic. But again this isn’t a rule, and ideally one would check the consequences of a prior model in every application.
