Binomial with over 700 parameters to be estimated

To double-check that the transformations argument is working, can you try this simple example:

library("bayesplot")

x <- example_mcmc_draws(params = 4)

# plot with no transformations
mcmc_pairs(x, pars = c("alpha", "beta[1]", "sigma"))

# specify transformations using all allowed methods (string naming a function, function definition, function object)
mcmc_pairs(x,
  pars = c("alpha", "beta[1]", "sigma"),
  transformations = list(
    "alpha" = "exp",
    "beta[1]" = function(x) exp(x),
    "sigma" = exp
  )
)

When I run that I get two very different looking pairs plots. Do you?

Also, if you want to be really sure that in your case you’re plotting the transformed variables, you can just transform the variables before passing them to bayesplot. That is, add columns to your posterior_cp that contain the transformed versions and then select those when plotting.
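For example, something like this (just a sketch, assuming posterior_cp is a matrix of draws with one column per parameter; "kappa" and "theta[1]" stand in for whichever parameters you want transformed):

library("bayesplot")

# posterior_cp: a matrix of draws, one row per iteration, one column per parameter
posterior_tr <- cbind(
  posterior_cp,
  "log(kappa)"      = log(posterior_cp[, "kappa"]),
  "logit(theta[1])" = qlogis(posterior_cp[, "theta[1]"])
)

# select only the explicitly transformed columns when plotting
mcmc_pairs(posterior_tr, pars = c("log(kappa)", "logit(theta[1])"))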

Thanks @jonah. Very helpful. I had a typo in the previous code. Also, your example resulted in two different looking pairs plots, so the transformation is working.

@avehtari here is the updated pairs plot for the unconstrained space.

If I look at this with my untrained eye, using what I’ve learned from the link @betanalpha posted (I admit it’s a bit over my head), I see some degeneracy coming from the hyperprior (kappa especially). Is that correct? @betanalpha, like @avehtari above, points towards dealing with this first through stronger prior information. I don’t have another dataset to use to build a more informative prior. Should I just use a different prior altogether? My hunch is no, because it fits well here, with many flight counts being small and very few being large. If I keep the Pareto prior for kappa and increase the shape parameter from 1.5 to a larger value, would that pull the Pareto down, especially making its tail thinner? If so, what value should I pick for the shape parameter? Not sure if I am on the right track, and I very much appreciate all your help.
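To check my own intuition about the tail, here is a quick calculation (the minimum of 1 and the shape value of 3 are only illustrative, not the ones in my model):

# Pareto survival function: P(X > x) = (x_min / x)^shape, for x >= x_min
pareto_tail <- function(x, x_min, shape) (x_min / x)^shape

# probability of kappa exceeding a few large values under different shapes
x <- c(10, 100, 1000)
rbind(
  "shape = 1.5" = pareto_tail(x, x_min = 1, shape = 1.5),
  "shape = 3.0" = pareto_tail(x, x_min = 1, shape = 3.0)
)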

When a Markov chain has explored the entire posterior distribution enough, the sequential structure of the samples should wash out, even when the autocorrelations are very large. Here that structure hasn’t washed out, and in many of the plots you can see the curling paths indicative of random walk behavior. In other words, your Markov chains are exploring so slowly that you probably aren’t seeing enough of the posterior to develop an informed hypothesis of what could be going wrong.

You could look at the sampler configuration from get_adaptation_info(fit) to see if the step sizes or inverse metric elements are extreme. One possibility is that the adaptation procedure is being stressed and the adapted sampler configuration is suboptimal. This can happen when the tails in which the Markov chains are initialized are nasty.
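For example (a minimal sketch, assuming fit is a stanfit object created by rstan):

library("rstan")

# one block of adaptation info per chain
adapt_info <- get_adaptation_info(fit)

# print each chain's adapted step size and inverse metric diagonal
for (chain in seq_along(adapt_info)) {
  cat("Chain", chain, "\n")
  cat(adapt_info[[chain]], "\n")
}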

The attached txt file contains the console output for get_adaptation_info(fit).
console.txt (26.5 KB)

What would an extreme sampler configuration look like?

@betanalpha how would I determine whether the step sizes are extreme, etc.? See the attached console output for get_adaptation_info(fit) from above. Thanks so much for your help!

The behaviors to observe here are
1 - the step size is relatively small, although not necessarily disastrously so.
2 - the inverse metric elements, which are estimators of the unconstrained parameter variances, vary by many orders of magnitude, from 1e-3 to 1e2 (a quick way to quantify this spread is sketched below). The adaptation is trying to accommodate that variation, but doing so really stresses the sampler during warmup.
3 - the inverse metric elements also vary a bit between chains, although nothing that’s inconsistent with statistical variation given how autocorrelated your chains are.
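A rough way to quantify the spread in point 2, assuming the usual rstan formatting of the adaptation info (adjust the parsing if your console output looks different):

# pull the inverse metric diagonal out of one chain's adaptation info;
# assumes a header line containing "inverse mass matrix" followed by a
# line of comma-separated values prefixed with "#"
info  <- get_adaptation_info(fit)[[1]]
lines <- strsplit(info, "\n")[[1]]

diag_line  <- lines[grep("inverse mass matrix", lines) + 1]
inv_metric <- as.numeric(strsplit(gsub("#", "", diag_line), ",")[[1]])

# range and orders of magnitude spanned by the inverse metric elements
range(inv_metric)
log10(max(inv_metric) / min(inv_metric))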

The problem with trying to diagnose a model this large is that there’s no easy way to identify the source of the pathology, at least not without knowing more about the model and the data. You might have some success trying some of the strategies I discuss at https://betanalpha.github.io/assets/case_studies/identifiability.html.

There is visible bimodality in qlogis(theta[1]). One thing to consider is whether values of qlogis(theta[1]) < -10 are sensible. I suspect that this region is causing part of the challenge. Even if it would not be a sensible prior, you could check whether the qlogis(theta[1]) < -10 region is problematic by adding a prior that makes those values very unlikely, and then see if the sampling is more efficient in the region of larger qlogis(theta[1]) values.
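To make that threshold concrete on the constrained scale (just a quick check, nothing model-specific):

# qlogis(theta) < -10 corresponds to theta below plogis(-10)
plogis(-10)      # about 4.5e-05
qlogis(4.5e-05)  # back near -10, as a sanity check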