Hello all. I am very familiar with the non-centered reparameterization for hierarchical parameters aka the classic “Matt trick”; I use this all the time. Less commonly, I’ve also seen non-centering used in a non-hierarchical context. My conceptual understanding of this second usage is that it involves shifting and scaling a primitive parameter by the (known) approximate posterior mean and SD of its unscaled analog to make it closer to N(0,1) and improve adaptation of the mass matrix. I’ve never done this before, but now I have a use case with a few additional wrinkles and I’d appreciate some advice on whether I’m thinking about this right or (more likely) not.
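To make sure I'm describing that second usage correctly, here's a made-up toy (all names invented purely for illustration): instead of sampling theta directly, sample a standardized theta_z and shift/scale it by a rough external guess at theta's posterior mean and SD.

data {
  int<lower=1> K;
  vector[K] y;
  real m_guess;           // rough external guess at the posterior mean of theta
  real<lower=0> s_guess;  // rough external guess at the posterior SD of theta
}
parameters {
  real theta_z;           // standardized parameter; should end up roughly N(0,1)
}
transformed parameters {
  real theta = m_guess + s_guess * theta_z;  // shift/scale back to the original scale
}
model {
  theta ~ normal(0, 10);  // prior stated on theta; the affine Jacobian is a constant
  y ~ normal(theta, 1);
}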
A cartoon of the relevant portions of the model I’m fitting looks something like this (backstory here if anyone’s really interested, but it’s not necessary):
data {
  int<lower=1> N;               // sample size
  vector<lower=0>[N] S;         // observed spawners
  vector<lower=0>[N] C;         // known catch
  real<lower=0> sigma;          // penalty scale
}
parameters {
  vector<lower=0,upper=1>[N] h; // catch rate
}
transformed parameters {
  vector<lower=0>[N] S_hat;     // predicted spawners
  // do some stuff that includes using h and calculating S_hat
}
model {
  // likelihood (penalty)
  C ~ lognormal(log(S_hat) + logit(h), sigma);
}
The likelihood statement is really just a “penalty” to force \log(\widehat{C}) = \log(\widehat{S}) + \text{logit}(h) to be as close to \log(C) as possible. (I should note that \widehat{S} is constrained by a hierarchical state-space model as well as its own observation model, so this is not nonidentifiable the way it would be if \widehat{S} and h were completely free.) The problem is that as I make \sigma smaller, especially below 0.05 or so, sampling slows down dramatically, presumably because the mass matrix fails to adapt to the correspondingly narrow posterior contours of h. This seems like a good case for non-centering, since I know the posterior should be close to \text{logit}(h) \sim N(\log(C) - \log(\widehat{S}), \sigma), and I can approximate \widehat{S} with the observed S. So is the following what I want to do? (BTW, I’m not using the <offset, multiplier> construct because vector arguments aren’t in the current rstan release yet.)
data {
  int<lower=1> N;               // sample size
  vector<lower=0>[N] S;         // observed spawners
  vector<lower=0>[N] C;         // known catch
  real<lower=0> sigma;          // penalty scale
}
transformed data {
  vector[N] mu = log(C) - log(S);  // approximate posterior mean of logit(h)
}
parameters {
  vector[N] h_z;                // Z-scored logit(h)
}
transformed parameters {
  vector<lower=0,upper=1>[N] h = inv_logit(mu + sigma * h_z);
  vector<lower=0>[N] S_hat;     // predicted spawners
  // do some stuff that includes using h and calculating S_hat
}
model {
  // prior
  // implies logit(h) ~ logistic(0,1) <=> h ~ uniform(0,1)
  h_z ~ logistic(-mu/sigma, 1/sigma);
  // likelihood (penalty)
  C ~ lognormal(log(S_hat) + logit(h), sigma);
}
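(As an aside, here is roughly what I think the <offset, multiplier> version would look like once vector arguments land in rstan, keeping the data and transformed data blocks above; I can't actually run this yet, so treat it as a sketch.)

parameters {
  // declared on the logit scale with the affine transform built in,
  // so Stan would handle the shift/scale and its Jacobian for me
  vector<offset=mu, multiplier=sigma>[N] logit_h;
}
transformed parameters {
  vector<lower=0,upper=1>[N] h = inv_logit(logit_h);
  vector<lower=0>[N] S_hat;     // predicted spawners
  // do some stuff that includes using h and calculating S_hat
}
model {
  logit_h ~ logistic(0, 1);     // should imply h ~ uniform(0,1)
  C ~ lognormal(log(S_hat) + logit_h, sigma);
}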
The prior on the primitive h_z seems wrong, but I’m not sure how else to ensure h ~ uniform(0,1) as in the original model. I could just specify that directly, or do logit(h) ~ logistic(0,1) (or compute logit_h in transformed parameters), but then I’d need a Jacobian. In any case, putting the usual unit prior on the primitive parameters like h_z ~ logistic(0,1) definitely does not give me the intended prior on h. Do I have the affine transformation inverted or something?
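In case it helps, the transformed-parameters variant I have in mind would look something like this (data and transformed data as above); my thinking, which may well be off, is that because the map from h_z to logit_h is affine, the Jacobian term is just the constant N * log(sigma).

transformed parameters {
  vector[N] logit_h = mu + sigma * h_z;          // non-centered affine transform
  vector<lower=0,upper=1>[N] h = inv_logit(logit_h);
  vector<lower=0>[N] S_hat;                      // predicted spawners
  // do some stuff that includes using h and calculating S_hat
}
model {
  // prior stated on logit_h rather than on the primitive h_z
  logit_h ~ logistic(0, 1);
  // Jacobian of the affine map h_z -> logit_h (constant, so it only shifts target)
  target += N * log(sigma);
  // likelihood (penalty)
  C ~ lognormal(log(S_hat) + logit_h, sigma);
}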
Thanks for bearing with me!