Large Rhat when predictors not standardized

ydurr · March 26, 2025, 2:20pm

Hi, first time brms user. I am trying to run a multilevel model with formula

log(yield) ~ Item+Year+ pesticides_tonnes +avg_temp+precipitation_mm + (1+Item| Area)

see subset of unstandardized data:

 Area,Item,Year,yield,pesticides_tonnes,avg_temp,precipitation_mm
Albania,Maize,2000,38811,565.82,16.67,994.6187
Albania,Potatoes,2000,141228,565.82,16.67,994.6187
Albania,Wheat,2000,30455,565.82,16.67,994.6187
Albania,Maize,2001,38135,628.79,16.59,1179.969
Albania,Potatoes,2001,148818,628.79,16.59,1179.969
Albania,Wheat,2001,28220,628.79,16.59,1179.969
Albania,Maize,2002,39460,691.75,16.47,1286.1083
Albania,Potatoes,2002,153868,691.75,16.47,1286.1083
Albania,Wheat,2002,31617,691.75,16.47,1286.1083
Albania,Maize,2004,46197,817.68,16.16,1518.797
Albania,Potatoes,2004,149346,817.68,16.16,1518.797
Albania,Wheat,2004,31130,817.68,16.16,1518.797
Albania,Maize,2005,45442,880.64,15.67,1505.5068
Albania,Potatoes,2005,167061,880.64,15.67,1505.5068
Albania,Wheat,2005,31582,880.64,15.67,1505.5068
France,Maize,2000,90766,97878,11.74,1206.9763
France,Potatoes,2000,395987,97878,11.74,1206.9763
France,Wheat,2000,71175,97878,11.74,1206.9763
France,Maize,2001,85711,99694,11.37,1115.8035
France,Potatoes,2001,374385,99694,11.37,1115.8035
France,Wheat,2001,66169,99694,11.37,1115.8035
France,Maize,2002,89751,82448,11.87,1139.3085
France,Potatoes,2002,423804,82448,11.87,1139.3085
France,Wheat,2002,74455,82448,11.87,1139.3085
France,Maize,2004,89907,76099,11.34,982.6951
France,Potatoes,2004,454162,76099,11.34,982.6951
France,Wheat,2004,75788,76099,11.34,982.6951
France,Maize,2005,82539,78265,11.55,935.4893
France,Potatoes,2005,422227,78265,11.55,935.4893
France,Wheat,2005,69887,78265,11.55,935.4893

I initially tried fitting the model fit_brms1 where the continuous predictors pesticides_tonnes, avg_temp and precipitation_mm are unstandardized which lead to large Rhat values.

Example of warning received:

Warning: There were 929 divergent transitions after warmup.
Warning: There were 3070 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10.
Warning: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low.
Warning: Examine the pairs() plot to diagnose sampling problems
Warning: The largest R-hat is 4.18, indicating chains have not mixed.
Running the chains for more iterations may help.
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.

I tried:

increasing the number of iterations to 4000
setting adapt_delta to larger values like 0.9 and 0.95
setting prior to N(0,10)

but still obtained large R-hat values.

Finally, I decided to standardize the continuous predictors on fit_brms2 and the model converges with R-hat values equal to 1.

How can I address the issue with the unstandardized predictors?

Thanks

code and summary of fit below

Code:

priors <- c(
  prior(normal(0, 1), class = "b")
)

fit_brm1 <- brm(
log(yield) ~ Year+ pesticides_tonnes +avg_temp+precipitation_mm + (1+Item| Area), data=df_yield_eu_train ,cores=4,iter = 2000 ,prior = priors)

fit_brm2 <- brm(
log(yield) ~ Year+ pesticides_tonnes +avg_temp+precipitation_mm + (1+Item| Area) , data=df_yield_eu_nrml_train,cores=4,iter = 2000 ,prior = priors)

End Code

Summary fit_brms1:

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log(yield) ~ Area + Item + Year + pesticides_tonnes + avg_temp + precipitation_mm + (1 + Item | Area) - Area 
   Data: df_yield_eu_train (Number of observations: 561) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~Area (Number of levels: 11) 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                   4.12      4.07     0.15     9.60 3.39        4       16
sd(ItemPotatoes)                0.12      0.10     0.01     0.30 3.28        4       11
sd(ItemWheat)                   0.46      0.46     0.02     1.28 3.80        4       12
cor(Intercept,ItemPotatoes)     0.39      0.32    -0.01     0.88 3.21        4       11
cor(Intercept,ItemWheat)        0.04      0.88    -1.00     0.92 2.83        5       28
cor(ItemPotatoes,ItemWheat)    -0.10      0.68    -0.91     0.99 2.97        5       11

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           -17.05     10.36   -33.01     3.47 1.93        6       20
ItemPotatoes          1.34      0.10     1.23     1.53 2.60        5       11
ItemWheat             0.18      0.72    -0.50     1.39 3.05        5       14
Year                  0.01      0.00     0.00     0.02 1.54        7       32
pesticides_tonnes     0.00      0.00    -0.00     0.00 1.80        6       12
avg_temp             -0.03      0.05    -0.11     0.03 3.05        5       18
precipitation_mm      0.00      0.00     0.00     0.00 2.32        5       16

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.31      0.11     0.22     0.51 2.63        5       28

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Summary fit_brms2:

Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log(yield) ~ Area + Item + Year + pesticides_tonnes + avg_temp + precipitation_mm + (1 + Item | Area) - Area 
   Data: df_yield_eu_nrml_train (Number of observations: 561) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~Area (Number of levels: 11) 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                   0.53      0.14     0.33     0.89 1.00     1515     2002
sd(ItemPotatoes)                0.28      0.07     0.18     0.45 1.00     1780     2615
sd(ItemWheat)                   0.48      0.11     0.31     0.75 1.00     2006     2476
cor(Intercept,ItemPotatoes)    -0.20      0.28    -0.70     0.38 1.00     1806     1769
cor(Intercept,ItemWheat)       -0.47      0.23    -0.82     0.08 1.00     2332     2411
cor(ItemPotatoes,ItemWheat)     0.68      0.18     0.21     0.92 1.00     2529     2616

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           -14.28      2.89   -19.82    -8.60 1.00     4104     2764
ItemPotatoes          1.27      0.09     1.09     1.43 1.00     1629     2222
ItemWheat            -0.45      0.15    -0.74    -0.15 1.00     1660     2179
Year                  0.01      0.00     0.01     0.02 1.00     4107     2820
pesticides_tonnes     0.03      0.03    -0.03     0.10 1.00     5421     3097
avg_temp             -0.17      0.04    -0.25    -0.09 1.00     5710     2767
precipitation_mm      0.07      0.02     0.03     0.10 1.00     6503     2954

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.17      0.01     0.16     0.18 1.00     7227     2679

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

EDIT (Aki): added some ticks for easier reading of code and tables

amynang · March 26, 2025, 3:15pm

How about mean-centering them without scaling?

Edit: What is the reason you want to avoid standardization?

If you do this inside the formula,

log(yield) ~ Year+ pesticides_tonnes +scale(avg_temp)+scale(precipitation_mm) + ...

brms should give you everything backtransformed.

ydurr · March 26, 2025, 3:35pm

It seems maybe pesticides_tonnes is causing some issues. I ran the model with mean-centering with and without pesticides_tonnes.

Without pesticides_tonnes:
I obtained Rhat of 1 but warnings with max tree-depth and low ESS which can probably be fixed with more iterations.

With pesticides_tonnes:
I obtained large Rhat values of 4.

I’m not against standardization, I was just curious why the chains don’t mix with the unstandardized predictors. Is it maybe something to do with the prior I set?

I am going to try running the unstandardized model without pesticides_tonnes to see if it changes anything.

edit:
Indeed, without pesticides_tonnes, the unstandardized predictor model obtains rhat value of 1, while giving warning about low ESS and reaching treedepth.

amynang · March 26, 2025, 3:47pm

Hm I totally ignored your prior. Why did you choose this prior? What happens if you run the model without it?

ydurr · March 26, 2025, 4:09pm

I initially didn’t specify a prior and only added it after looking at the diagnostic warnings page, which suggested setting a stronger prior.

The standardized model works with/without setting any priors.

The centered-mean and unstandardized work with/without setting prior when pesticides_tonnes is not included.

mitzimorris · March 26, 2025, 4:22pm

If pesticides_tonnes is on a very different scale than the rest of your predictors, standardizing the predictor values is necessary. by default, BRMS mean-centers the predictors, but doesn’t standardize them. both make the sampler’s job a whole lot easier - higher effective sample sizes, faster fits.

see the section in the Stan User’s Guide on standardizing predictors.

ydurr · March 26, 2025, 4:30pm

Thank you, that is probably the issue:

	pesticides_tonnes	avg_temp	precipitation_mm
min	121	6.66	475.23853
max	114695	18.82	2026.3944
mean	15034.91305	11.72096257	957.1739762
median	7871.1	10.64	867.78735
sd	25194.49011	3.395233971	308.7355052

Pesticide is much larger than the remaining predictors.

avehtari · March 26, 2025, 7:05pm

With unnormalized predictors Bulk_ESS is around 4-7, which is clear indication that chains are stuck in different modes. Looking at the posterior for fit_brms1, we see that the varying intercept has expected sd of 4

ydurr:

Multilevel Hyperparameters:
~Area (Number of levels: 11) 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                   4.12      4.07     0.15     9.60 3.39        4       16

As the target is log(yield) it means that the intercepts are likely to explain most of the variation in the data, and chains are stuck in these different local modes where each chain has different varying intercepts.

With normalized data, sd of varying intercepts is more reasonable

ydurr:

Multilevel Hyperparameters:
~Area (Number of levels: 11) 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                   0.53      0.14     0.33     0.89 1.00     1515     2002

Topic		Replies	Views
Simple non-linear multilevel model gone awry brms	6	981	February 11, 2019
Want to confirm my data is in the correct structure for brms regression given output Modeling brms	1	285	January 10, 2024
Problemas de convergência em brms Modeling brms	1	45	December 13, 2024
Want to confirm my data is in the correct structure Modeling brms	2	270	January 22, 2024
Building Hierarchical Model with Long Format Data (brms / rstan) brms techniques , specification	1	885	September 5, 2018

Large Rhat when predictors not standardized

Related topics