Good points all around, thank you Ben! I implemented the above, but changed everything to be in terms of y = log(x), and I also fixed a typo (it should be q = coef2 + log_xmnu_std), but alas we still have the same old sampling problems.
I have now implemented a simpler version of the model, the Pareto-Lognormal, entirely on the log scale: that is, I wrote a Normal-half-Laplace model on the variable y = log(x) from Reed and Jorgensen. I wrote the lpdf function by taking the log of their equation 13. This model actually works in some regions of parameter space! But it does not work when alpha is large. And in all cases there is a persistent weirdness where Stan reports lots of divergent transitions (even when the model is fitting fine); and as before, raising adapt_delta makes the problem much worse!
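For reference, the density I coded up is (my transcription of the relevant one-tailed case; $\phi$ and $\Phi$ are the standard normal pdf and cdf, and $R$ is the Mills ratio):

$$
f(y) = \alpha\,\phi\!\left(\frac{y-\nu}{\tau}\right) R\!\left(\alpha\tau - \frac{y-\nu}{\tau}\right),
\qquad R(z) = \frac{1-\Phi(z)}{\phi(z)},
$$

so with $y_{\text{std}} = (y-\nu)/\tau$ and $p = \alpha\tau - y_{\text{std}}$, the log density is $\log\alpha + \log\phi(y_{\text{std}}) + \log(1-\Phi(p)) - \log\phi(p)$, which is what the function below computes term by term.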
functions {
  // Normal-half-Laplace log density (Reed & Jorgensen): y is distributed as
  // the sum of a Normal(nu, tau) variate and an Exponential(alpha) variate.
  real normalhalflaplace_lpdf(vector y, real alpha, real nu, real tau) {
    int N = rows(y);
    real log_alpha = log(alpha);
    real tau_inv = inv(tau);
    real coef1 = alpha * tau;
    vector[N] lprob;
    for (n in 1:N) {
      real y_std = tau_inv * (y[n] - nu);
      real p = coef1 - y_std;
      // log Mills ratio, log[(1 - Phi(p)) / phi(p)]; std_normal_lccdf is the
      // numerically stable form of log(1 - Phi(p)), which would otherwise
      // underflow to -inf for large p
      real log_Rp = std_normal_lccdf(p) - std_normal_lpdf(p);
      lprob[n] = std_normal_lpdf(y_std) + log_Rp;
    }
    return sum(lprob) + N * log_alpha;
  }
}
data {
  int<lower=0> N;
  vector[N] y; // already on the log scale, y = log(x)
}
parameters {
  real<lower=0> alpha; // rate of the exponential (power-law tail) component
  real nu;             // location of the normal component
  real<lower=0> tau;   // scale of the normal component
}
model {
  // priors
  alpha ~ cauchy(0, 10);
  nu ~ normal(0, 10);
  tau ~ cauchy(0, 10);
  // likelihood
  y ~ normalhalflaplace(alpha, nu, tau);
}
(I didn’t use the log_kernel syntax purely because I’m too afraid I’ll introduce bugs due to my lack of familiarity with that structure.)
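For anyone wanting to reproduce this, here's a minimal simulation sketch (hypothetical file name; run with the fixed_param sampler since there's no parameters block). It uses the construction behind the distribution, namely that y is a Normal(nu, tau) draw plus an Exponential(alpha) draw:

```stan
// simulate_nhl.stan (hypothetical file name): generate normal-half-Laplace
// draws as the sum of a Normal(nu, tau) and an Exponential(alpha) component.
data {
  int<lower=0> N;
  real<lower=0> alpha;
  real nu;
  real<lower=0> tau;
}
generated quantities {
  vector[N] y;
  for (n in 1:N)
    y[n] = normal_rng(nu, tau) + exponential_rng(alpha);
}
```

Fitting the model above to draws like these with known parameters makes it easy to map out which regions of (alpha, nu, tau) sample cleanly.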
I now think this distribution actually has some problems that the authors haven’t fully copped to. Reading the section on estimation closely, I see: “Experience has shown that it is worthwhile examining the data for evidence of power-law behaviour in the tails. If for example there is only power-law behaviour in one tail, attempts to fit the dPlN may result in the non-convergence of the optimization algorithm”. That is to say, in the full model, if either beta or alpha is large, the fitting procedure fails in some unspecified way. Well, by the same token, even in the simpler model here where beta is eliminated, a large alpha would imply that the whole model should really just be Gaussian on y.
I suspect this is due to identification problems: I think the tail of a thin-tailed Laplace is nearly, or indeed completely, indistinguishable from the tail of a Normal (in this paper’s notation, large alpha means a thin tail; alpha is defined as the reciprocal of sigma in the Stan manual’s parameterisation of the double exponential). By the same token, I’ve noticed that thinner-tailed Paretos can look quite similar in practice to the tail of a very platykurtic Lognormal (though I should note, serious econometricians have told me this shouldn’t be the case…).
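To make the identification worry concrete, write $y = \nu + \tau Z + E$ with $Z \sim \text{Normal}(0,1)$ and $E \sim \text{Exponential}(\alpha)$; then

$$
\mathbb{E}[y] = \nu + \frac{1}{\alpha}, \qquad \operatorname{Var}(y) = \tau^2 + \frac{1}{\alpha^2},
$$

so as $\alpha \to \infty$ the exponential component collapses to a point mass at zero and $y$ tends to a plain Normal($\nu$, $\tau^2$): the data then carry essentially no information about $\alpha$ beyond “large”, and the likelihood is nearly flat in that direction.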
I also noticed that the parameters of this model seem able to “disagree” with each other. For example, if alpha is large (thinner tail), the fit gets worse as nu moves away from zero, holding tau constant. In some sense the two parts of the model are working against each other: nu says there should be mass far from zero, alpha says there shouldn’t be.
I hold out some hope that a fancy re-parameterisation using Gaussian and exponential component draws to construct y in the model block might cure some of the divergent transitions, and I will attempt that now. [Edited to add: this was a foolish and also incoherent hope, based on the error of thinking you can reparameterise a data variable. Lol.]