Reparameterizing multiple Cauchy parameters

If I have a few Cauchy scale parameters that I want to reparameterize following the Stan User's Guide, should I declare a separate real<lower=-pi()/2, upper=pi()/2> beta_unif; for each scale parameter, or can beta_unif be the same for all of them?

Option 1

parameters {
  real<lower=-pi()/2, upper=pi()/2> beta_unif;
}
transformed parameters {
  real beta1;
  real beta2;

  beta1 = tan(beta_unif);
  beta2 = tan(beta_unif);
}

Option 2

parameters {
  real<lower=-pi()/2, upper=pi()/2> beta_unif1;
  real<lower=-pi()/2, upper=pi()/2> beta_unif2;
}
transformed parameters {
  real beta1;
  real beta2;

  beta1 = tan(beta_unif1);
  beta2 = tan(beta_unif2);
}

Which one is the correct way? Thanks!

You can do either, but they’re not quite the same. In option 1, beta1 and beta2 are identical by construction, so you induce complete dependence between them. This may or may not be appropriate.

Let’s say I simulate 1000 observations from y \sim \mathcal{N}(3, 1) and use the Cauchy prior from option 1. This is the Stan model:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real<lower=-pi()/2, upper=pi()/2> beta_unif;
}
transformed parameters {
  real beta1 = tan(beta_unif);
  real beta2 = tan(beta_unif);
}
model {
  y ~ normal(beta1, 1);
}

and the R code with corresponding output

library(rstan)
options(mc.cores = parallel::detectCores())
set.seed(689934)

N <- 1000
y <- rnorm(N, 3, 1)

stan_rdump(c("N", "y"), file = "noniid_data.R")
input_data <- read_rdump("noniid_data.R")
fit <- stan(file='noniid_prior.stan', data=input_data, seed=483892929)
print(fit)

You can see that although the prior was specified as a cauchy(0, 1), beta_unif takes on a posterior mean of 1.25, and beta1 and beta2 are exactly equal. That 1.25 is no accident: beta1 = tan(beta_unif) has to track the data mean of 3, and atan(3) ≈ 1.25 (see the quick check after the output below).

Inference for Stan model: noniid_prior.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta_unif    1.25    0.00 0.00    1.24    1.25    1.25    1.25    1.25  1477    1
beta1        3.00    0.00 0.03    2.94    2.98    3.00    3.02    3.06  1477    1
beta2        3.00    0.00 0.03    2.94    2.98    3.00    3.02    3.06  1477    1
lp__      -522.40    0.02 0.67 -524.39 -522.52 -522.14 -521.98 -521.93  1744    1

Samples were drawn using NUTS(diag_e) at Thu Dec 31 07:09:32 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
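
As a quick sanity check in R: if U ~ Uniform(-pi/2, pi/2), then tan(U) has a standard Cauchy distribution, which is exactly the identity this reparameterization exploits, and atan(3) explains the posterior mean of beta_unif:

# tan of a Uniform(-pi/2, pi/2) draw is a standard Cauchy draw
u <- runif(1e5, -pi/2, pi/2)
quantile(tan(u), c(0.25, 0.5, 0.75))  # roughly -1, 0, 1
qcauchy(c(0.25, 0.5, 0.75))           # exactly -1, 0, 1

# and the posterior mean of beta_unif is just atan of the data mean
atan(3)  # 1.249, i.e. the 1.25 reported above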

If instead you keep separate priors on beta1 and beta2 by declaring beta_unif as a size-2 array of reals,

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real<lower=-pi()/2, upper=pi()/2> beta_unif[2];
}
transformed parameters {
  real beta1 = tan(beta_unif[1]);
  real beta2 = tan(beta_unif[2]);
}
model {
  y ~ normal(beta1, 1);
}
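
Assuming the same simulated data and driver script as before, only the file name changes (here hypothetically saved as iid_prior.stan, matching the model name in the output below):

fit2 <- stan(file='iid_prior.stan', data=input_data, seed=483892929)
print(fit2)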

You can see that beta1 and beta2 are now independent parameters. beta_unif[1] is the same as in the previous model, since it induces the prior on the parameter tied to the data y. However, beta_unif[2], since it isn’t tied to any data, takes on a mean value around 0, corresponding to the cauchy(0, 1) prior; the enormous posterior sd of beta2 below is the Cauchy’s heavy tails showing through.

Inference for Stan model: iid_prior.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta_unif[1]    1.25    0.00  0.00    1.24    1.25    1.25    1.25    1.25  3355    1
beta_unif[2]    0.03    0.02  0.91   -1.49   -0.75    0.06    0.80    1.50  3393    1
beta1           3.00    0.00  0.03    2.94    2.98    3.00    3.02    3.06  3353    1
beta2           1.63    0.85 44.71  -12.70   -0.93    0.06    1.04   15.16  2749    1
lp__         -523.29    0.03  1.10 -526.27 -523.72 -522.97 -522.50 -522.20  1396    1

Samples were drawn using NUTS(diag_e) at Thu Dec 31 07:11:45 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

I see. Thanks so much! I have a follow-up question: if I want to reparameterize a half-Cauchy, i.e. real<lower=0> beta, what should the declaration for beta_unif be? Should it be real<lower=0, upper=pi()/2> beta_unif?

Thanks!

You got it.
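
For concreteness, a minimal sketch of the half-Cauchy version, assuming a known scale tau passed in as data:

data {
  real<lower=0> tau;  // scale of the half-Cauchy (assumed given)
}
parameters {
  real<lower=0, upper=pi()/2> beta_unif;
}
transformed parameters {
  // tan maps Uniform(0, pi/2) to a standard Cauchy restricted to (0, inf),
  // so beta ~ half-Cauchy(0, tau)
  real<lower=0> beta = tau * tan(beta_unif);
}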


Thanks! I was actually hoping I was doing something wrong, because my model still performs horribly!

Presumably it’s a more complicated model than the simple one you posted above?


Yes it is. My predictive checks look good, but my chain diagnostics are terrible. I have a few more things to try, and if they don’t work I will open a new thread to seek help. Thanks!

My understanding is that the heavy tails of the Cauchy can be expected to cause sampling problems even when it’s been reparameterized as you’ve done. Do you get the same issues when a simple normal distribution is used?

You mean even if I use a very narrow Cauchy, for example with a sigma of 0.5?
I hadn’t thought of using a normal distribution as a prior on my scaling parameters. Is that acceptable? I saw that the half-Cauchy is what’s usually recommended.

You should think of the Cauchy as really a three-parameter distribution with a modifiable location and scale, but a tail-heaviness parameter fixed at the arguably extreme value of 1. Indeed, this is just the student-t distribution (i.e. cauchy(0,1) == student_t(1,0,1)). And now that you’re thinking of the student-t, consider that as the tail-heaviness parameter (the df parameter in this case) increases, it approaches the normal distribution (i.e. student_t(30,0,1) is approximately equal to normal(0,1)). This highlights that the only thing differentiating the Cauchy from a normal is how heavy-tailed the prior is.

If you don’t want to go all the way to a normal, you could try a student-t with a lower value for the df parameter. While the early examples (and consequent legacy code) opted for Cauchy priors for things like variance components, I believe folks have realized that, consistent with samplers encountering issues with them, prior predictive checks show the Cauchy yielding rather unreasonable model predictions. More recent default recommendations (e.g. in brms, I believe) opt for the student-t, and I personally go for a normal and simply ensure the scale parameter is set such that prior predictive checks yield reasonable-looking data.
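
In Stan terms, sketching that progression on a hypothetical scale parameter sigma (the lower=0 constraint makes each of these a half-distribution):

parameters {
  real<lower=0> sigma;
}
model {
  // heavy tails:     sigma ~ cauchy(0, 1);       // == student_t(1, 0, 1)
  // middle ground:   sigma ~ student_t(3, 0, 1);
  // light (normal) tails:
  sigma ~ normal(0, 1);
}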


I will definitely try this, thanks so much!! My problem is that my predictive checks look great, but my model doesn’t converge at all (huge R-hat, low E-BFMI, 90% of iterations hitting max treedepth). And happy new year!


For some discussion of the dangers of Cauchy prior densities see https://betanalpha.github.io/assets/case_studies/weakly_informative_shapes.html.

A prior is acceptable only if it conforms to your domain expertise, which means that no one can accept a prior but yourself. Cauchy and half-Cauchy densities may be okay in some applications and problematic in others and so ideally you would verify each time using prior checks – see for example https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html#11_domain_expertise_consistency.
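
As a minimal sketch of such a check in R (the names and scales here are purely illustrative), draw from the candidate prior and look at the data it implies:

# scales drawn from a candidate half-Cauchy(0, 1) prior
sigma_prior <- abs(rcauchy(1e4, 0, 1))
# one simulated observation per prior draw
y_prior <- rnorm(1e4, 0, sigma_prior)
quantile(y_prior, c(0.025, 0.975))  # often implausibly wide for real data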

That said, I have rarely found Cauchy densities to be useful prior models, as they allow far more probability at extreme model configurations than most people intuit. At the same time, I personally have yet to find an application where normal prior densities, with carefully chosen scales, have proven problematic. But again this isn’t a rule, and ideally one would check the consequences of a prior model in every application.
