I am looking at the determinants of Korean presidents’ diplomatic visits between 1948 and 2021.

Is the following three-level multilevel model overkill, and should I include democracy as an explanatory variable?

I have country-year panel data.

The model is:

visit ~ x + ( 1 | era / president / year) + (1 | geopolitical / country)

where era is post-Cold-War vs. pre-Cold-War,
president is the president in office,
year is the year of the visit (or no visit).

geopolitical is regional groupings of countries.
country is each individual country.

Or would a simpler model be better? (I believe McElreath recommends multilevel models for this kind of nested categorical variable in his Statistical Rethinking book.)

visit ~ x + era + geopolitical + (1 | president / year) + (1 | country)

Or should I compare the two models to decide?

Final question: can someone point me towards best practices, vignettes, blogs, or articles on how to present the differences in intercepts for this kind of nested model?

Or what about any number of other models using various formulations of the above variables?

I don’t think these questions can really be answered without subject-matter expertise and a clear idea of which questions the analysis is meant to answer. But perhaps you have both, in which case you are best placed to answer them! I think McElreath has most recently (2023 lectures) motivated the multilevel modeling approach mainly as a way to adjust for unmeasured confounds across groups (see lectures 12, 13, and 14 of the 2023 lectures on Richard McElreath’s YouTube channel). If you are working from that perspective, you might draw a causal diagram for the variables you give above to motivate your model choice. From the predictive perspective, multilevel models provide regularization, which generally results in better predictive performance. You could compare different models via LOO to find the best predictive model; see the Cross-validation FAQ for LOO on multilevel models, and part 9 just below it for time series.
The difference between your two models is that the first has nested (hierarchical) varying intercepts for all variables (except x), whereas the second includes no partial pooling for era and geopolitical: it assumes no pooling, so no information is shared between levels of era or between levels of geopolitical.
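
To make the comparison concrete, here is a small sketch in plain Python (not part of any modeling library) that spells out what the nesting shorthand `a / b` stands for under the usual lme4/brms convention, `(1 | a / b)` = `(1 | a) + (1 | a:b)`. The helper name `expand_nested` is made up for illustration, and the second formula's president / year term is read as a varying intercept, which is an assumption.

```python
def expand_nested(groups):
    """Expand a nested grouping such as era/president/year into the separate
    varying-intercept terms it is shorthand for."""
    terms = []
    for depth in range(1, len(groups) + 1):
        # Each level is identified by its own label combined with all its parents.
        terms.append("(1 | " + ":".join(groups[:depth]) + ")")
    return terms


# Model 1: everything except x gets a (nested) varying intercept.
model_1 = ["visit ~ x"]
model_1 += expand_nested(["era", "president", "year"])
model_1 += expand_nested(["geopolitical", "country"])

# Model 2: era and geopolitical are ordinary (unpooled) predictors.
# Assumption: president/year is meant as a varying intercept here.
model_2 = ["visit ~ x", "era", "geopolitical"]
model_2 += expand_nested(["president", "year"])
model_2 += expand_nested(["country"])

print(" + ".join(model_1))
print(" + ".join(model_2))
```

Written out this way, the first model estimates a standard deviation for era and for geopolitical, while the second estimates a separate coefficient for each of their levels instead.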

I’m not sure of a resource, but one thing you could do is to make predictions for visit for the different combinations of grouping variables that you are interested in and present them in a plot with uncertainty intervals.
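
A rough sketch of that workflow, in plain Python and independent of any particular modeling package: build one row per combination of grouping levels, get posterior predictive draws for each row from your fitted model, and reduce the draws to a median and an interval for plotting. The level names and the helpers `prediction_grid` and `summarize` are hypothetical; the step that produces the draws depends on your software and is not shown.

```python
from itertools import product
import statistics

# Hypothetical levels, for illustration only; use the levels in the real data.
presidents = ["president_A", "president_B"]
regions = ["region_1", "region_2", "region_3"]


def prediction_grid(**levels):
    """Return one dict per combination of the supplied grouping levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def summarize(draws):
    """Reduce posterior draws for one grid row to (5%, median, 95%)."""
    cuts = statistics.quantiles(draws, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], statistics.median(draws), cuts[-1]


grid = prediction_grid(president=presidents, geopolitical=regions)
for row in grid:
    print(row)
```

Each summarized row then becomes one point with an error bar, for example grouped by region and coloured by president.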

Here are my 5 cents: if a predictor like “era” has only a few possible values (here: 2), there is likely too little information for partial pooling (random effects) to be useful. Therefore, I would always include a predictor like era as a fixed effect. You could ask the same about the predictor “president”. How many have there been (not many, I imagine), and how many would you need to reliably estimate the standard deviation of a random effect? I don’t have a definitive answer, but if the number of presidents is < 5, I would not use that predictor as a group-level variable but simply as a fixed effect.
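
A quick simulation can give a feel for this. The sketch below (plain Python, standard library only) draws group effects from a normal distribution with a known SD and looks at how widely the sample SD varies for different numbers of groups. It is a simplification and an assumption-laden one: it treats the group effects as if they were observed directly, which is more optimistic than a real model where they must be inferred, and it is not a Bayesian fit. The function name `sd_estimate_spread` is made up for illustration.

```python
import random
import statistics


def sd_estimate_spread(n_groups, true_sd=1.0, n_sims=2000, seed=1):
    """Simulate n_groups group effects many times and return the 5% and 95%
    quantiles of the resulting sample-SD estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        effects = [rng.gauss(0.0, true_sd) for _ in range(n_groups)]
        estimates.append(statistics.stdev(effects))
    cuts = statistics.quantiles(estimates, n=20)  # cut points at 5%, ..., 95%
    return cuts[0], cuts[-1]


for k in (2, 5, 20):
    lo, hi = sd_estimate_spread(k)
    print(f"{k:>2} groups: 90% of SD estimates fall in [{lo:.2f}, {hi:.2f}] (true SD = 1)")
```

The interval narrows as the number of groups grows, which is the sense in which a grouping factor with only two levels carries little information about its own between-group SD.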

I don’t think that heuristic always applies. See the McElreath lectures I linked, or Hierarchical Modeling.

I forgot that era was simply a dummy variable, so that would likely be a population-level effect. As for the others, I really think it depends on the goal of analysis and the assumptions you want to make.