Warning messages:
1: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess

We should increase the number of iterations.
The checks indicate the run looks good. I don't know why rstan shows
a different message than cmdstanr.

get_bfmi(zrn_fit)
[1] 0.6352896 0.6315200

> check_hmc_diagnostics(zrn_fit)
Divergences:
0 of 1000 iterations ended with a divergence.
Tree depth:
0 of 1000 iterations saturated the maximum tree depth of 16.
Energy:
E-BFMI indicated no pathological behavior.

Got it!! I am using 10,000 iterations and 4 chains to verify the stability of the results. If the results continue to be good, I will attempt to add random slopes to this model.

Here are my updates.
1. When I use 4 chains and run 20,000 iterations, the only warning I receive is 'There were 20 divergent transitions after warmup,' while other metrics like Rhat, BFMI, and ESS look fine. Given the large number of iterations in my sample, can I ignore this issue?
2. I tried to add random slopes for two groups, and there seem to be no errors. I am currently waiting for the final results from a larger number of iterations, which will take some time.

3. One additional question: since my experiment tests both internal and external validity, there may be two different data sets that follow the same model. If the data differ, should I adjust the degrees of freedom or other parameters of the prior distributions?

First, 20,000 iterations is quite a lot. Did you try running only 2,000 iterations? Stan typically requires far fewer iterations than Gibbs samplers.
You may tighten the priors or switch to a normal distribution, e.g.

sigma_alphaadt ~ student_t(5, 0, 2);

change to

sigma_alphaadt ~ normal(0, 1.5);

You will find tons of posts in the forum about divergent transitions.
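Besides tightening priors, a common first step for a small number of divergences is to raise the sampler's target acceptance rate so it takes smaller steps. A minimal sketch in R (the file name and data list are placeholders for your own model; adapt_delta and max_treedepth are standard rstan control options):

```r
library(rstan)

# Smaller step sizes via a higher adapt_delta often remove a handful of
# divergent transitions, at the cost of somewhat slower sampling.
fit <- stan(
  file    = "model.stan",          # your model file (placeholder name)
  data    = stan_data,             # your data list (placeholder name)
  chains  = 4,
  iter    = 2000,
  control = list(adapt_delta = 0.99, max_treedepth = 15)
)
```

If divergences persist even at adapt_delta close to 1, that usually points to a geometry problem in the model (e.g. a funnel) rather than a tuning problem.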

Random slopes and intercepts can be correlated, so modeling them jointly with a correlation matrix
makes sense. See the classic book Bayesian Data Analysis by Andrew Gelman.
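A minimal Stan sketch of one common way to do this (the names J, z, tau, L are illustrative, not from your model), using a Cholesky-factored LKJ prior on the intercept–slope correlation:

```stan
data {
  int<lower=1> J;                 // number of groups (assumed)
}
parameters {
  vector[2] z[J];                 // standardized intercept/slope per group
  cholesky_factor_corr[2] L;      // Cholesky factor of the 2x2 correlation
  vector<lower=0>[2] tau;         // scales of intercept and slope
}
transformed parameters {
  vector[2] u[J];                 // correlated group-level effects
  for (j in 1:J)
    u[j] = diag_pre_multiply(tau, L) * z[j];
}
model {
  L ~ lkj_corr_cholesky(2);       // weakly favors small correlations
  tau ~ normal(0, 1.5);
  for (j in 1:J)
    z[j] ~ normal(0, 1);          // implies u[j] ~ multi_normal with corr L
}
```

u[j][1] then plays the role of the random intercept and u[j][2] the random slope for group j; this is also a non-centered formulation, which tends to sample better when groups are small.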

Priors encode your beliefs about the parameters a priori. When lots of data are available, tight
priors are less necessary. Imposing overly restrictive priors can lead to bias, while non-restrictive priors increase the difficulty for the sampler. From my experience, I'd go for non-restrictive priors first and have a look at the diagnostic plots.

generated quantities {
  real avgIndEffect1_1;
  real avgIndEffect1_2;
  real avgIndEffect2_1;
  real avgIndEffect2_2;
  real avgIndEffect3_1;
  real avgIndEffect3_2;
  real avgIndEffect4_1;
  real avgIndEffect4_2;
  avgIndEffect1_1 = beta1aq * beta3adt + DC1[4, 10]; // for id
  avgIndEffect1_2 = beta1aq * beta3adt + DC2[4, 10]; // for topic
  avgIndEffect2_1 = beta1cr * beta4adt + DC1[5, 11];
  avgIndEffect2_2 = beta1cr * beta4adt + DC2[5, 11];
  avgIndEffect3_1 = beta2aq * beta3adt + DC1[7, 10];
  avgIndEffect3_2 = beta2aq * beta3adt + DC2[7, 10];
  avgIndEffect4_1 = beta2cr * beta4adt + DC1[8, 11];
  avgIndEffect4_2 = beta2cr * beta4adt + DC2[8, 11];
}

Hello Professor! When I introduced random slopes, all effects became non-significant (the credible intervals cross 0). I have posted the detailed code in a new thread and hope to receive a response.

Random slopes and intercepts can be correlated, so modeling them jointly with a correlation matrix
makes sense. See the classic book Bayesian Data Analysis by Andrew Gelman.

It seems that directly correlating random slopes with random intercepts is not effective in this version; none of the coefficients are significant. However, this issue does not arise in the initial centered-parameterization version.

When gamma1 and gamma2 are non-significant, then likely both they and mu
are non-significant. Also, the web-page code doesn't use mu. Thus you may omit mu,
run the sampler again, and check whether some gamma parameters become relevant.

Why did the effects unexpectedly become smaller after employing non-centered parameterization?
Do you mean the centered-parameterization version or the non-centered-parameterization version? So I need to remove mu_aq, mu_cr, mu_adt from the transformed parameters section.
And I am confused about why the significance of the effects is so different in the two versions of the model.

I tried removing mu and obtained mostly significant effects (similar to the random-intercepts-only version). Does this imply that mu has an impact on gamma and that I should delete it?
Result with mu:

Now I see mu is covered by alphaaq, alphacr, alphaadt, thus we remove it.
I think a clearer mathematical description and better formatting would be beneficial
to avoid some of the confusion.
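To make the redundancy concrete, here is a minimal sketch (variable names are illustrative, not taken from your model) of a single non-centered varying intercept. The point is that mu and sigma are already absorbed into alpha, so a separate mu elsewhere in the linear predictor would double-count the mean:

```stan
data {
  int<lower=1> J;              // number of groups (assumed)
}
parameters {
  real mu;                     // population mean
  real<lower=0> sigma;         // group-level scale
  vector[J] alpha_raw;         // standardized group offsets
}
transformed parameters {
  // centered form would instead declare alpha as a parameter
  // and write: alpha ~ normal(mu, sigma);
  vector[J] alpha = mu + sigma * alpha_raw;
}
model {
  alpha_raw ~ normal(0, 1);    // implies alpha ~ normal(mu, sigma)
}
```

Because alpha already contains mu, the linear predictor should use only alpha (e.g. alpha[group[n]]), not mu + alpha[group[n]].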