Stochasticity in pareto-k diagnostics over different fitting runs

Here’s what I mean. You need also to have the other functions in gpdfit.R in context and nlist from helpers.R

gpdkpost <- function(x, wip = TRUE, min_grid_pts = 30, sort_x = TRUE) {
  # See section 4 of Zhang and Stephens (2009)
  if (sort_x) {
    x <- sort.int(x)
  }
  N <- length(x)
  prior <- 3
  M <- min_grid_pts + floor(sqrt(N))
  jj <- seq_len(M)
  xstar <- x[floor(N / 4 + 0.5)] # first quartile of sample
  theta <- 1 / x[N] + (1 - sqrt(M / (jj - 0.5))) / prior / xstar
  l_theta <- N * lx(theta, x) # profile log-lik
  ws <- 1 / vapply(jj, FUN.VALUE = numeric(1), FUN = function(j) {
    sum(exp(l_theta - l_theta[j]))
  })
  theta_hat <- sum(theta * ws)
  k <- mean.default(log1p(-theta_hat * x))
  ks <- numeric(length=M)
  for (i in 1:M) {
    ks[i] <- mean.default(log1p(-theta[i] * x))
  }
  sigma <- -k / theta_hat

  if (wip) {
    k <- adjust_k_wip(k, n = N)
  }

  if (is.nan(k)) {
    k <- Inf
  }

  nlist(k, sigma, ks, ws)
}

And demonstration using fake data

x <- sort(exp(rnorm(10000)*4),decreasing=TRUE)[1:100]
(foo <- gpdkpost(x))
qplot(q$ks,q$ws)+geom_line()+labs(x="k",y="unnormalized marginal posterior density")

image

Sorry, I don’t have now time to make it more user friendly.

1 Like