Binomial with over 700 parameters to be estimated

Andrew, I switched the code following your example but the code file shows a warning on the line I inserted it in the generated quantities block (white x on round red background):
“No matches for: Available argument signatures for operator+: Expression is ill formed.”

I’m guessing that the error included a message like 'no signature for real + int[]', or something similar?

Try:

real<lower=0,upper=1> theta[J] = beta_rng(to_vector(y) + alpha,
                                          to_vector(n) - to_vector(y) + beta);
1 Like

Hi Mike, sorry for the very delayed response and thanks so much again for your help.

I get the same errors with a smaller dataset. But I wonder if the error messages are actually not a large validity concern. Note, when I increased iterations from 10,000 to 20,000 after warm up, I had considerable better convergence. With 20,000 I got 200 (28% of the 727 lakes) with Rhat ≤ 1.05 instead of 41 lakes with 10,000 iterations, 457 (63%) instead of 224 with Rhat ≤ 1.1, and 270 (37%) instead of 503 lakes with Rhat > 1.1.

Since there are so many thetas and the data is somewhat sparse for some of these lakes with only a few flights going in, I believe the model may be as good as it gets. Is there reason to throw out the model entirely? Or could we interpret the results by saying the 200 lakes with Rhat ≤ 1.05 are “trust worthy” results, the estimates for the 457 lakes are “more or less trustworthy” and the 270 remaining lakes with Rhat >1.1 are potentially problematic.

Btw, here is the pairsplot with just one of the 727 lakes. Are there any issues you see? I never interpreted these and don’t know what to look for. But, I noticed, with 10,000 iterations that some of the pairs had some bimodality that went away with 20,000 iterations.
pairsplot

Unfortunately I really don’t think it’s safe to ignore the Rhat warnings generally. They indicate that the different chains ended up exploring measurably different areas of the parameter space, so all computations on the resulting samples have no guarantee of accuracy. There’s even been talk of decreasing the Rhat value that triggers the warning to something even smaller like 1.01 to really ensure folks only trust samples from truly converged chains.

Maybe you need to step back and reconsider this model for the data. You should check out @betanalpha’s workflow case study and do those same steps yourself.

There are some visible “loops” indicating very slow mixing. Max tree depth warning also indicates very small step size and thus many many steps are needed before u-turn. The figure you attached is too small to see anything else (like axis labels). Can you post a bigger figure?

1 Like

Thanks so much Aki for looking at this. Hopefully this image will work.

I don’t still see the problem, but this are rare type of example with very slow mixing.
To better see the geometry can you plot the pairs plot in the unconstrained space (where the sampling happens), that is for kappa plot log(kappa-0.1), for theta and lambda plot logit(theta) and logit(lambda)?

Not sure if this is correct to get the transformations you want
pairs_plot2 <- mcmc_pairs(posterior_cp, np = np_cp, pars = c("alpha","beta","logit(lambda)","log(kappa,-0.1)","logit(theta[1])"), off_diag_args = list(size = 0.75))

How does the plot with that call look like? My current guess is that with the current priors you have way too much prior mass on very small theta values, which in logit space means that the tail is thick and it takes forever to explore that tail. We could see that better when looking the posterior in the unconstrained space and the way to fix it would be to use slightly more informative prior.

The pairs plots of theta[1] against the other parameters indicate that something in the model is being terribly identified. See https://betanalpha.github.io/assets/case_studies/identifiability.html for recommendations on how to access the unconstrained values as well as some explanation for why extremely degenerate models like this will appear to be multimodal for if the Markov chains aren’t long enough.

1 Like

Does that give you an error? Unless you’ve created variables with those names, you need pars to be the actual parameter names and to then use the transformations argument to provide the functions. The way you currently have it it will look for a parameter named “logit(lambda)” rather than applying the logit transformation to “lambda”. Something like this should work:

pairs_plot2 <-
  mcmc_pairs(
    posterior_cp,
    np = np_cp,
    pars = c("alpha", "beta", "lambda", "kappa", "theta[1]"),
    transformations = list(
      "lambda" = "qlogis",  # or define a function called logit
      "kappa" = function(x) log(x - 0.1),
      "theta[1]" = "qlogis"
    ),
    off_diag_args = list(size = 0.75)
  )

Alternatively, you can do what @betanalpha suggested and just extract and plot the already unconstrained draws themselves.

The above code for the pairs_plot gave me an error, so I used @jonah’s suggestion specifying the transformation argument. The resulting pairs_plot is exactly the same. Hmm?
If I undersand correctly, following @betanalpha suggestion would mean looking at the diagnostic_file which unfortunately, I did not specify. The model ran for four days on 8 cores. Happy to rerun it though or just use fewer iterations.

If the model takes that long to run then it will be much faster to create the pairs scatter plots one by one, and imposing the unconstraining transformations, by hand.

To double check that the transformations argument is working can you try this simple example:

x <- example_mcmc_draws(params = 4)

# plot with no transformations
mcmc_pairs(x, pars = c("alpha", "beta[1]", "sigma"))

# specify transformations using all allowed methods (string naming function, function definition, function object)
mcmc_pairs(x,
  pars = c("alpha", "beta[1]", "sigma"),
  transformations = list(
    "alpha" = "exp",
    "beta[1]" = function(x) exp(x),
    "sigma" = exp
  )
)

When I run that I get two very different looking pairs plots. Do you?

Also, if you want to be really sure that in. your case you’re plotting the transformed variables you can just transform the variables before passing them to bayesplot. That is, add columns to your posterior_cp that contain the transformed versions and then select those when plotting.

Thanks @jonah. Very helpful. I had a typo in the previous code. Also, your example resulted in two different looking pairs plots, so the transformation is working.

@avehtari here is the updated pairs plot for the unconstrained space.

If I look at this with my untrained eye and learning from @betanalpha posted link (I admit, it’s a bit over my head), I see some degeneracy coming from the hyperprior (kappa especially). Is that correct? @betanalpha like @avehtari above point towards dealing with this first through stronger prior information. I don’t have another dataset to use for a more informative prior. Should I just use a different prior alltogether? My hunch is no, because it fits well here with many flight counts being small and very few being large. If I keep the pareto for kappa and increase the shape parameter from 1.5 to a larger value that would get the pareto down, especially making its tail thinner? If so, what should I pick for shape parameter value? Not sure if I am on the right track and very much appreciate all your help.

When a Markov chain has exploded the entire posterior distribution enough then the sequential structure of the samples should wash out, even when the autocorrelations are very large. Here that structure hasn’t washed out and in many of the plots you can see the curling paths indicative of random walk behavior. In other words your Markov chains are exploring so slowly that you probably aren’t seeing enough of the posterior to develop an informed hypothesis of what could be going wrong.

You could look at the sampler configuration from get_adaptation_info(fit) to see if the step sizes or inverse metric elements are extreme. One possibility is that the adaptation procedure is being stressed and the adapter sampler configuration is suboptimal. This can happen when your the tails in which the Markov chains are initialized are nasty.

The attached txt file contains the console output for get_adaptation_info(fit).
console.txt (26.5 KB)

What would an extreme sampler configuration look like?

@betanalpha how would I determine whether the step sizes are extreme, etc.? See the attached console output for get_adaptation_info(fit) from above. Thanks so much for your help!

The behaviors to observe here are
1 - the step size is relatively small, although not necessarily disastrously slow.
2 - the inverse metric elements, which are estimators of the unconstrained parameter variances, vary by many orders of magnitude from 1e-3 to 1e2. The adaptation is trying to accommodate for that variation, but doing so really stresses the sampler during warmup.
3 - the inverse metric elements also vary a bit between chains, although nothing that’s inconsistent with statistical variation given how autocorrelated your chains are.

The problem with trying to diagnose a model with large is that there’s no easy to way to diagnose the source of the pathology, at least not without knowing more about the model and the data. You might have some success trying some of the strategies I discuss https://betanalpha.github.io/assets/case_studies/identifiability.html.

There is visible bimodality in qlogis(theta[1]). One thing to consider is, whether values qlogis(theta[1]))<-10 are sensible. I suspect that this the region causing part of the challenge. Even if it would not be sensible prior, you could check if qlogis(theta[1]))<-10 region is problematic by adding a prior making those values very unlikely and then you could see if the sampling is more efficient in the region of larger qlogis(theta[1])) values.