Large Rhat when predictors not standardized

Hi, first time brms user. I am trying to run a multilevel model with formula

log(yield) ~ Item+Year+ pesticides_tonnes +avg_temp+precipitation_mm + (1+Item| Area)

see subset of unstandardized data:

 Area,Item,Year,yield,pesticides_tonnes,avg_temp,precipitation_mm
Albania,Maize,2000,38811,565.82,16.67,994.6187
Albania,Potatoes,2000,141228,565.82,16.67,994.6187
Albania,Wheat,2000,30455,565.82,16.67,994.6187
Albania,Maize,2001,38135,628.79,16.59,1179.969
Albania,Potatoes,2001,148818,628.79,16.59,1179.969
Albania,Wheat,2001,28220,628.79,16.59,1179.969
Albania,Maize,2002,39460,691.75,16.47,1286.1083
Albania,Potatoes,2002,153868,691.75,16.47,1286.1083
Albania,Wheat,2002,31617,691.75,16.47,1286.1083
Albania,Maize,2004,46197,817.68,16.16,1518.797
Albania,Potatoes,2004,149346,817.68,16.16,1518.797
Albania,Wheat,2004,31130,817.68,16.16,1518.797
Albania,Maize,2005,45442,880.64,15.67,1505.5068
Albania,Potatoes,2005,167061,880.64,15.67,1505.5068
Albania,Wheat,2005,31582,880.64,15.67,1505.5068
France,Maize,2000,90766,97878,11.74,1206.9763
France,Potatoes,2000,395987,97878,11.74,1206.9763
France,Wheat,2000,71175,97878,11.74,1206.9763
France,Maize,2001,85711,99694,11.37,1115.8035
France,Potatoes,2001,374385,99694,11.37,1115.8035
France,Wheat,2001,66169,99694,11.37,1115.8035
France,Maize,2002,89751,82448,11.87,1139.3085
France,Potatoes,2002,423804,82448,11.87,1139.3085
France,Wheat,2002,74455,82448,11.87,1139.3085
France,Maize,2004,89907,76099,11.34,982.6951
France,Potatoes,2004,454162,76099,11.34,982.6951
France,Wheat,2004,75788,76099,11.34,982.6951
France,Maize,2005,82539,78265,11.55,935.4893
France,Potatoes,2005,422227,78265,11.55,935.4893
France,Wheat,2005,69887,78265,11.55,935.4893

I initially tried fitting the model fit_brms1 where the continuous predictors pesticides_tonnes, avg_temp and precipitation_mm are unstandardized which lead to large Rhat values.

Example of warning received:

  • Warning: There were 929 divergent transitions after warmup.
  • Warning: There were 3070 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10.
  • Warning: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low.
  • Warning: Examine the pairs() plot to diagnose sampling problems
  • Warning: The largest R-hat is 4.18, indicating chains have not mixed.
    Running the chains for more iterations may help.
  • Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.

I tried:

  • increasing the number of iterations to 4000
  • setting adapt_delta to larger values like 0.9 and 0.95
  • setting prior to N(0,10)

but still obtained large R-hat values.

Finally, I decided to standardize the continuous predictors on fit_brms2 and the model converges with R-hat values equal to 1.

How can I address the issue with the unstandardized predictors?

Thanks

code and summary of fit below

Code:

priors <- c(
  prior(normal(0, 1), class = "b")
)
fit_brm1 <- brm(
log(yield) ~ Year+ pesticides_tonnes +avg_temp+precipitation_mm + (1+Item| Area), data=df_yield_eu_train ,cores=4,iter = 2000 ,prior = priors)

fit_brm2 <- brm(
log(yield) ~ Year+ pesticides_tonnes +avg_temp+precipitation_mm + (1+Item| Area) , data=df_yield_eu_nrml_train,cores=4,iter = 2000 ,prior = priors)

End Code

Summary fit_brms1:

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log(yield) ~ Area + Item + Year + pesticides_tonnes + avg_temp + precipitation_mm + (1 + Item | Area) - Area 
   Data: df_yield_eu_train (Number of observations: 561) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~Area (Number of levels: 11) 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                   4.12      4.07     0.15     9.60 3.39        4       16
sd(ItemPotatoes)                0.12      0.10     0.01     0.30 3.28        4       11
sd(ItemWheat)                   0.46      0.46     0.02     1.28 3.80        4       12
cor(Intercept,ItemPotatoes)     0.39      0.32    -0.01     0.88 3.21        4       11
cor(Intercept,ItemWheat)        0.04      0.88    -1.00     0.92 2.83        5       28
cor(ItemPotatoes,ItemWheat)    -0.10      0.68    -0.91     0.99 2.97        5       11

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           -17.05     10.36   -33.01     3.47 1.93        6       20
ItemPotatoes          1.34      0.10     1.23     1.53 2.60        5       11
ItemWheat             0.18      0.72    -0.50     1.39 3.05        5       14
Year                  0.01      0.00     0.00     0.02 1.54        7       32
pesticides_tonnes     0.00      0.00    -0.00     0.00 1.80        6       12
avg_temp             -0.03      0.05    -0.11     0.03 3.05        5       18
precipitation_mm      0.00      0.00     0.00     0.00 2.32        5       16

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.31      0.11     0.22     0.51 2.63        5       28

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Summary fit_brms2:

Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log(yield) ~ Area + Item + Year + pesticides_tonnes + avg_temp + precipitation_mm + (1 + Item | Area) - Area 
   Data: df_yield_eu_nrml_train (Number of observations: 561) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~Area (Number of levels: 11) 
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                   0.53      0.14     0.33     0.89 1.00     1515     2002
sd(ItemPotatoes)                0.28      0.07     0.18     0.45 1.00     1780     2615
sd(ItemWheat)                   0.48      0.11     0.31     0.75 1.00     2006     2476
cor(Intercept,ItemPotatoes)    -0.20      0.28    -0.70     0.38 1.00     1806     1769
cor(Intercept,ItemWheat)       -0.47      0.23    -0.82     0.08 1.00     2332     2411
cor(ItemPotatoes,ItemWheat)     0.68      0.18     0.21     0.92 1.00     2529     2616

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           -14.28      2.89   -19.82    -8.60 1.00     4104     2764
ItemPotatoes          1.27      0.09     1.09     1.43 1.00     1629     2222
ItemWheat            -0.45      0.15    -0.74    -0.15 1.00     1660     2179
Year                  0.01      0.00     0.01     0.02 1.00     4107     2820
pesticides_tonnes     0.03      0.03    -0.03     0.10 1.00     5421     3097
avg_temp             -0.17      0.04    -0.25    -0.09 1.00     5710     2767
precipitation_mm      0.07      0.02     0.03     0.10 1.00     6503     2954

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.17      0.01     0.16     0.18 1.00     7227     2679

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

EDIT (Aki): added some ticks for easier reading of code and tables

How about mean-centering them without scaling?

Edit: What is the reason you want to avoid standardization?

If you do this inside the formula,

log(yield) ~ Year+ pesticides_tonnes +scale(avg_temp)+scale(precipitation_mm) + ...

brms should give you everything backtransformed.

It seems maybe pesticides_tonnes is causing some issues. I ran the model with mean-centering with and without pesticides_tonnes.

Without pesticides_tonnes:
I obtained Rhat of 1 but warnings with max tree-depth and low ESS which can probably be fixed with more iterations.

With pesticides_tonnes:
I obtained large Rhat values of 4.

I’m not against standardization, I was just curious why the chains don’t mix with the unstandardized predictors. Is it maybe something to do with the prior I set?

I am going to try running the unstandardized model without pesticides_tonnes to see if it changes anything.

edit:
Indeed, without pesticides_tonnes, the unstandardized predictor model obtains rhat value of 1, while giving warning about low ESS and reaching treedepth.

Hm I totally ignored your prior. Why did you choose this prior? What happens if you run the model without it?

I initially didn’t specify a prior and only added it after looking at the diagnostic warnings page, which suggested setting a stronger prior.

The standardized model works with/without setting any priors.

The centered-mean and unstandardized work with/without setting prior when pesticides_tonnes is not included.

If pesticides_tonnes is on a very different scale than the rest of your predictors, standardizing the predictor values is necessary. by default, BRMS mean-centers the predictors, but doesn’t standardize them. both make the sampler’s job a whole lot easier - higher effective sample sizes, faster fits.

see the section in the Stan User’s Guide on standardizing predictors.

Thank you, that is probably the issue:

pesticides_tonnes avg_temp precipitation_mm
min 121 6.66 475.23853
max 114695 18.82 2026.3944
mean 15034.91305 11.72096257 957.1739762
median 7871.1 10.64 867.78735
sd 25194.49011 3.395233971 308.7355052

Pesticide is much larger than the remaining predictors.

With unnormalized predictors Bulk_ESS is around 4-7, which is clear indication that chains are stuck in different modes. Looking at the posterior for fit_brms1, we see that the varying intercept has expected sd of 4

As the target is log(yield) it means that the intercepts are likely to explain most of the variation in the data, and chains are stuck in these different local modes where each chain has different varying intercepts.

With normalized data, sd of varying intercepts is more reasonable

1 Like