Gamma GLM estimates showing opposite direction to conditional estimates: why and which to report?

Hello everyone,

I am currently fitting a Gamma GLM on some artificial data. The response variable is the biomass values of fish, and the explanatory variables are Enforcement (an ordered factor), coral cover (continuous, ranging from 0 to 1), and habitat (a factor with two levels).

I am encountering an issue where the estimates from the model indicate the opposite direction of the conditional estimates. The predicted values match my expectations (as shown in the figure), where higher enforcement levels correspond to higher biomass values, but the estimates are negative.

> summary(reef_glm)
 Family: gamma 
  Links: mu = log; shape = identity 
Formula: biomass_fish ~ coral_cover:habitat + enforcement + habitat 
   Data: reef (Number of observations: 387) 
  Draws: 3 chains, each with iter = 10000; warmup = 4000; thin = 1;
         total post-warmup draws = 18000

Population-Level Effects: 
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                      1.33      0.26     0.81     1.84 1.00    12547    12273
enforcement.L                  0.83      0.07     0.70     0.96 1.00    15739    12768
enforcement.Q                  0.20      0.07     0.06     0.34 1.00    14039    13048
enforcement.C                 -0.44      0.07    -0.57    -0.31 1.00    14452    12850
enforcementE4                 -0.22      0.08    -0.37    -0.07 1.00    15555    11882
habitatShallow                -1.41      0.37    -2.13    -0.70 1.00     9978    10821
habitatDeep:coral_cover       -0.96      0.38    -1.69    -0.22 1.00    12544    12160
habitatShallow:coral_cover    -0.28      0.42    -1.10     0.56 1.00    12482    12522

My question is twofold:

  1. Why are the conditional estimates accurate, but the estimates are showing the opposite direction?
  2. Should I report the estimates in my research paper, or is it sufficient to report the conditional estimates?

Thank you for your help!

Here is the data:
artificial_reef.csv (14.3 KB)

And below is the code:

# GLM for reef data

# packages ----------------------------------------------------------------

library(tidyverse)       # data wrangling; ggplot
library(brms)            # brm() and the rest of Bayesian functions


# data --------------------------------------------------------------------

reef <- read_csv("artificial_reef.csv") |>
  mutate(enforcement = factor(enforcement,
                              levels = c("No", "Low", "Medium_South",
                                         "Medium_North", "High"),
                              ordered = TRUE)) |>
  glimpse()


# exploration -------------------------------------------------------------

# boxplot
reef |> 
  ggplot(aes(enforcement, biomass_fish)) +
  geom_boxplot(outlier.colour = NA) +
  ylim(NA, 6) +
  theme_minimal()

# data summary
reef |> 
  group_by(enforcement) |> 
  summarise(means = mean(biomass_fish),
            medians = median(biomass_fish)) |> 
  # do other Enforcement levels have lower biomass
  mutate(means_dif = means - first(means),
         medians_dif = medians - first(medians)) |> 
  select(enforcement, means, means_dif, medians, medians_dif)


# model -------------------------------------------------------------------

# GLM (linear model)
reef_glm <- brm(biomass_fish ~ coral_cover:habitat + enforcement + habitat,
                family = Gamma(link = "log"),
                data = reef,
                cores = 3,
                chains = 3, iter = 10000, warmup = 4000,
                file = "brms_reef_glm.RDS")

summary(reef_glm)

conditional_effects(reef_glm, "enforcement")
  • R version 4.2.2 (2022-10-31 ucrt)
  • Platform: x86_64-w64-mingw32/x64 (64-bit)
  • Running under: Windows 10 x64 (build 22621)
  • brms Version: brms_2.18.0
1 Like

This is a good question, and I didn’t know the answer right away. Turns out that lm() in R uses what to me is a fairly exotic coding of the design matrix when it sees ordered factors as covariates; namely it uses so-called polynomial contrasts. A reasonable-seeming summary of the details is here Understanding Ordered Factors in a Linear Model | University of Virginia Library Research Data Services + Sciences

If you want coefficients that are interpretable in a more familiar way, just change the factor encoding such that it’s unordered. If it’s important to you to treat the factor as ordered, then you probably want to look into monotonic effects in brms (see brms::mo).

2 Likes

Hello @Yunier, actually I don’t think there’s anything wrong with the estimates. I think the use of an ordered factor is generating the confusion here. Because you’ve not implemented this as an ordinal predictor, it’s just a nominal categorical predictor but with the parameters renamed, and it looks like they are not returned in the original order (which seems a bit strange, I don’t think I’ve noticed this before but I tend not to use ordered factors much).

I reran it and observed the same behavior.

The simple categorical model with the variable not ordered:

 Family: gamma 
  Links: mu = log; shape = identity 
Formula: biomass_fish ~ coral_cover:habitat + enforcement + habitat 
   Data: artificial_reef (Number of observations: 387) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                      1.80      0.27     1.27     2.31 1.00     2295     2640
enforcementLow                -0.95      0.10    -1.14    -0.76 1.00     3118     2717
enforcementMedium_North        0.13      0.06     0.02     0.24 1.00     2867     2681
enforcementMedium_South       -0.73      0.09    -0.91    -0.55 1.00     3339     2831
enforcementNo                 -0.77      0.09    -0.95    -0.58 1.00     3354     2933
habitatShallow                -1.41      0.36    -2.14    -0.71 1.00     2175     2157
habitatDeep:coral_cover       -0.97      0.37    -1.68    -0.23 1.00     2433     2384
habitatShallow:coral_cover    -0.28      0.43    -1.10     0.55 1.00     2578     2615

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     4.99      0.35     4.34     5.71 1.00     3405     2558

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The same thing but using the ordered factor:

 Family: gamma 
  Links: mu = log; shape = identity 
Formula: biomass_fish ~ coral_cover:habitat + enforcement_ord + habitat 
   Data: artificial_reef (Number of observations: 387) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                      1.33      0.26     0.82     1.84 1.00     2980     2762
enforcement_ord.L              0.83      0.07     0.70     0.96 1.00     3618     2960
enforcement_ord.Q              0.20      0.07     0.06     0.33 1.00     3268     3050
enforcement_ord.C             -0.44      0.06    -0.57    -0.31 1.00     2837     2834
enforcement_ordE4             -0.22      0.07    -0.37    -0.08 1.00     3672     2810
habitatShallow                -1.40      0.36    -2.14    -0.71 1.00     2317     2321
habitatDeep:coral_cover       -0.96      0.37    -1.68    -0.22 1.00     3042     3032
habitatShallow:coral_cover    -0.29      0.42    -1.07     0.56 1.00     2795     2665

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     4.99      0.35     4.34     5.70 1.00     3366     3133

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The ordinal predictor model:

 Family: gamma 
  Links: mu = log; shape = identity 
Formula: biomass_fish ~ coral_cover:habitat + mo(enforcement_ord) + habitat 
   Data: artificial_reef (Number of observations: 387) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                      1.14      0.26     0.65     1.64 1.00     2762     2701
habitatShallow                -1.58      0.36    -2.29    -0.87 1.00     2402     2467
habitatDeep:coral_cover       -1.28      0.36    -1.99    -0.59 1.00     2873     2765
habitatShallow:coral_cover    -0.39      0.43    -1.20     0.47 1.00     2951     2694
moenforcement_ord              0.24      0.02     0.20     0.27 1.00     2242     2373

Simplex Parameters: 
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
moenforcement_ord1[1]     0.05      0.04     0.00     0.14 1.00     2675     1661
moenforcement_ord1[2]     0.13      0.08     0.01     0.31 1.00     2134     1105
moenforcement_ord1[3]     0.80      0.09     0.63     0.95 1.00     2662     1903
moenforcement_ord1[4]     0.02      0.02     0.00     0.07 1.00     2528     1793

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     4.88      0.35     4.25     5.59 1.00     3611     2681

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

A side note is that (presuming that these are simulated data) there’s a problem with the implementation which has resulted in the same value of biomass_fish for every sample.

3 Likes

Ha, looks like we were responding at the same time!

I didn’t realise that there was an actual method for ordinal predictors by lm(), but it seems of dubious value.

2 Likes

Thanks!

This data follows a similar pattern to the original. I believe this could be due to the fact that biomass values repeat 10 times per each combination of Site:Habitat. This is because the sampling units for fishes were 50 m long and were aggregated, while the sampling units for corals were 10 units 10 m long. We assumed that fish could move easily around the entire sampling area, so this difference in sampling unit size might have resulted in repeated values when the data were aggregated.

Thanks! I will check the blog post

I agree with you. The value, such as it is, is that you can quickly assess (and test hypotheses about, and put priors on) certain flavors of regular trend across the ordinal levels, while preserving the ability to estimate completely independent values for all of the levels. Ultimately, this little piece of flexibility can always be avoided by passing un-ordered factors.

1 Like