Hello,
I have an explanatory variable (covariate) in my model that shows moderate concurvity (about 0.5) with the random intercept. Is there a way to visualize or estimate how much this dependence affects my parameter estimates and the model as a whole? In other words, how much of a problem would it be to leave both terms in the model? I think this effect is called “bias”, so how would I estimate how much bias is introduced into my random effect, s(fSite, bs = "re"), when I include this covariate? I’m using the mgcv package in R.
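Real data do not reveal the true site effects, so one common way to put a number on bias is a small simulation with a known truth. The sketch below assumes a hypothetical data-generating process: a negative binomial count with a site random intercept, plus a covariate `x` whose site-level part is correlated with that intercept. The names `simulate_once`, `site_effects`, `x` and `site` are illustrative and not from the post. Each replicate fits the model with and without `s(x)` and compares the estimated site intercepts to the true ones.

```r
library(mgcv)

# Hypothetical helper: estimated coefficients of one "re" smooth, found by its label
site_effects <- function(fit, label) {
  labs <- vapply(fit$smooth, function(sm) sm$label, character(1))
  i <- match(label, labs)
  if (is.na(i)) stop("no smooth labelled ", label)
  sm <- fit$smooth[[i]]
  coef(fit)[sm$first.para:sm$last.para]
}

# One replicate of an ASSUMED data-generating process: a site random
# intercept plus a covariate x whose site-level part is correlated with it
simulate_once <- function(n_site = 40, n_per = 20, sd_site = 0.6) {
  site   <- factor(rep(seq_len(n_site), each = n_per))
  idx    <- as.integer(site)
  u_true <- rnorm(n_site, sd = sd_site)               # true site intercepts
  x_site <- 0.7 * u_true + rnorm(n_site, sd = 0.5)    # site-level part of x
  x      <- x_site[idx] + rnorm(length(idx), sd = 0.3)
  eta    <- -1 + u_true[idx] + sin(x)                 # true linear predictor (log scale)
  d <- data.frame(y = rnbinom(length(eta), size = 0.4, mu = exp(eta)),
                  x = x, site = site)

  fits <- list(
    without_x = gam(y ~ s(site, bs = "re"), family = nb(),
                    data = d, method = "REML"),
    with_x    = gam(y ~ s(x) + s(site, bs = "re"), family = nb(),
                    data = d, method = "REML")
  )

  u_c <- u_true - mean(u_true)                        # centre: the intercept absorbs the mean
  t(sapply(fits, function(fit) {
    b <- site_effects(fit, "s(site)")
    b <- b - mean(b)
    c(rmse     = sqrt(mean((b - u_c)^2)),              # overall estimation error
      slope    = unname(coef(lm(b ~ u_c))[2]),         # < 1 shrunk toward 0, > 1 inflated
      sd_ratio = sd(b) / sd(u_true))                   # spread of estimates relative to truth
  }))
}

set.seed(1)
res <- replicate(10, simulate_once())                 # 2 x 3 x 10 array
summary_tab <- apply(res, c(1, 2), mean)             # average over replicates
print(round(summary_tab, 3))
```

The numbers only describe the assumed process; to say anything about the real data, the simulation would need to mimic its structure (number of sites, sampling effort, the shape of the covariate effect). More replicates give a more stable average.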
Model without the covariate in question, s(log_ratio_thal_halo):
```
> summary(mod_total)

Family: Negative Binomial(0.367)
Link function: log

Formula:
num ~ offset(log(area_sampled)) + te(CYR, Latitude, by = fSeason) +
    fSeason + s(sal, bs = "ts") + s(DO, bs = "ts") + s(water_depth) +
    s(total_sg) + s(fSite, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.4234     0.2014 -16.997  < 2e-16 ***
fSeasonWET    0.6605     0.1945   3.397 0.000682 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                                  edf Ref.df  Chi.sq  p-value
te(CYR,Latitude):fSeasonDRY 6.189e+00  7.875  16.363  0.03584 *
te(CYR,Latitude):fSeasonWET 1.283e+01 16.148  60.026 4.46e-06 ***
s(sal)                      5.711e-04  9.000   0.000  0.42844
s(DO)                       5.783e-01  9.000   1.378  0.13680
s(water_depth)              1.000e+00  1.001   0.000  0.99716
s(total_sg)                 3.168e+00  3.995  16.760  0.00215 **
s(fSite)                    3.192e+01 46.000 109.113  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.137   Deviance explained = 43.4%
-REML = 846.75  Scale est. = 1         n = 1453
```
Model with the covariate in question:
```
> summary(mod_ratio)

Family: Negative Binomial(0.388)
Link function: log

Formula:
num ~ offset(log(area_sampled)) + te(CYR, Latitude, by = fSeason) +
    fSeason + s(sal, bs = "ts") + s(DO, bs = "ts") + s(water_depth) +
    s(total_sg) + s(log_ratio_thal_halo) + s(fSite, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.4604     0.1844 -18.761  < 2e-16 ***
fSeasonWET    0.6796     0.1938   3.506 0.000455 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                                  edf Ref.df  Chi.sq  p-value
te(CYR,Latitude):fSeasonDRY 7.397e+00  9.668  17.891 0.049484 *
te(CYR,Latitude):fSeasonWET 1.018e+01 12.815  51.548 4.72e-06 ***
s(sal)                      2.685e-04  9.000   0.000 0.636458
s(DO)                       6.051e-01  9.000   1.523 0.122697
s(water_depth)              1.000e+00  1.000   0.075 0.785044
s(total_sg)                 3.320e+00  4.178  19.232 0.000904 ***
s(log_ratio_thal_halo)      5.095e+00  6.160  32.532 7.85e-06 ***
s(fSite)                    2.735e+01 46.000  71.174  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.125   Deviance explained = 44.6%
-REML = 839.33  Scale est. = 1         n = 1453
```
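A direct way to see what adding the covariate does to the site effects is to pull the estimated site intercepts out of both fits and plot one against the other. The sketch below assumes the random effect is a factor smooth `s(fSite, bs = "re")`, which has one coefficient per level; the helpers `re_effects` and `compare_re` are hypothetical names, and the demo uses simulated data with a made-up covariate `x` standing in for `log_ratio_thal_halo`.

```r
library(mgcv)

# Estimated intercepts of one "re" smooth of a factor, named by factor level.
# Assumes the term is s(<factor>, bs = "re"), i.e. one coefficient per level.
re_effects <- function(fit, label = "s(fSite)") {
  labs <- vapply(fit$smooth, function(sm) sm$label, character(1))
  i <- match(label, labs)
  if (is.na(i)) stop("no smooth labelled ", label)
  sm <- fit$smooth[[i]]
  b  <- coef(fit)[sm$first.para:sm$last.para]
  names(b) <- levels(fit$model[[sm$term]])
  b
}

# Side-by-side site effects from two fits to the same data
compare_re <- function(fit_a, fit_b, label = "s(fSite)") {
  a <- re_effects(fit_a, label)
  b <- re_effects(fit_b, label)[names(a)]
  data.frame(level = names(a), fit_a = unname(a), fit_b = unname(b),
             change = unname(b - a))
}

# Demo on simulated data (hypothetical covariate x; not the data from the post)
set.seed(2)
n_site <- 30
fSite <- factor(sprintf("site%02d", rep(seq_len(n_site), each = 25)))
u <- rnorm(n_site, sd = 0.5)
x <- 0.8 * u[as.integer(fSite)] + rnorm(length(fSite), sd = 0.5)
num <- rnbinom(length(x), size = 0.5, mu = exp(-0.5 + u[as.integer(fSite)] + 0.5 * x))
d <- data.frame(num, x, fSite)

m_without <- gam(num ~ s(fSite, bs = "re"), family = nb(), data = d, method = "REML")
m_with    <- gam(num ~ s(x) + s(fSite, bs = "re"), family = nb(), data = d, method = "REML")

cmp <- compare_re(m_without, m_with)
print(c(correlation = cor(cmp$fit_a, cmp$fit_b),
        sd_without  = sd(cmp$fit_a),
        sd_with     = sd(cmp$fit_b)))

# Points off the dashed 1:1 line are sites whose estimated intercept moved
plot(cmp$fit_a, cmp$fit_b, xlab = "site effect, without x",
     ylab = "site effect, with x")
abline(0, 1, lty = 2)
```

With the two models from the post, the call would be `compare_re(mod_total, mod_ratio)`. The drop in the s(fSite) edf (31.9 to 27.4) already suggests the site effects are shrunk more once s(log_ratio_thal_halo) is in; the plot shows which sites absorb that change.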
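The pairwise concurvity (the non-linear analogue of collinearity) between the two terms is about 0.5; the relevant entry is 0.493, in the s(fSite) row of the s(log_ratio_thal_halo) column: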
```
> cc <- concurvity(mod_ratio, full = FALSE)$estimate
> cc[, c("s(log_ratio_thal_halo)", "s(fSite)")]
                            s(log_ratio_thal_halo)    s(fSite)
para                                  1.050603e-23 0.021373062
te(CYR,Latitude):fSeasonDRY           1.329285e-01 0.040720710
te(CYR,Latitude):fSeasonWET           1.328086e-01 0.046022044
s(sal)                                4.620769e-02 0.011098179
s(DO)                                 1.332674e-02 0.006498958
s(water_depth)                        7.796803e-03 0.010273864
s(total_sg)                           1.501585e-02 0.011513171
s(log_ratio_thal_halo)                1.000000e+00 0.032523922
s(fSite)                              4.931990e-01 1.000000000
```
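The pairwise matrix is not symmetric, and `concurvity(..., full = FALSE)` returns three measures (`worst`, `observed`, `estimate`), so it can help to look at both directions of the pair under all three and to scan the whole matrix for other high entries. This is a sketch on a simulated demo model; the covariates `x` and `z` and the 0.3 threshold are arbitrary, hypothetical choices.

```r
library(mgcv)

# Hypothetical demo model: x is partly a site-level quantity, z is unrelated noise
set.seed(3)
n_site <- 30
fSite <- factor(rep(seq_len(n_site), each = 25))
u <- rnorm(n_site, sd = 0.5)
x <- 0.8 * u[as.integer(fSite)] + rnorm(length(fSite), sd = 0.5)
z <- runif(length(fSite))
num <- rnbinom(length(x), size = 0.5, mu = exp(-0.5 + u[as.integer(fSite)] + 0.5 * x))
m <- gam(num ~ s(z) + s(x) + s(fSite, bs = "re"), family = nb(),
         data = data.frame(num, x, z, fSite), method = "REML")

cc <- concurvity(m, full = FALSE)   # list of matrices: worst, observed, estimate

# Both directions of the pair of interest, under all three measures, selected by label
pair <- sapply(cc, function(mat) c(mat["s(fSite)", "s(x)"], mat["s(x)", "s(fSite)"]))
rownames(pair) <- c("row s(fSite), col s(x)", "row s(x), col s(fSite)")
print(round(pair, 3))

# Every off-diagonal entry of the 'estimate' matrix above an arbitrary threshold
est <- cc$estimate
diag(est) <- NA                     # ignore each term's concurvity with itself
hits <- which(est > 0.3, arr.ind = TRUE)
flagged <- data.frame(row_term = rownames(est)[hits[, "row"]],
                      col_term = colnames(est)[hits[, "col"]],
                      estimate = est[hits])
print(flagged)
```

Replacing `m` with `mod_ratio` and `"s(x)"` with `"s(log_ratio_thal_halo)"` gives the same view for the model in the post.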