Who tells the truth: GLMM or Bayesian?

Hi all,
Recently, I’m doing some analysis about the influence of biotic neighborhood on the survival of seedling. I have seedling survival data from 450 1×1m plot across 11 years.
The main question I want to resolve is:

  1. how different neighbor crowding( e.g. number of conspecific seedling, number of heterospecific seedling, the sum of basal area of conspecific adult, the sum of basal area of heterospecific adult) influence seedling survival?
  2. how functional trait can mediate species response different neighbor crowding effect?

When I ran the data based on the hierarchical Bayesian model, none of the above factors can have significant influence( with 95% intervals don’t overlap with 0). As people in my study area have found those factor do have influence, so I try the generalized linear mixed model (GLMM) based on lme4 package and stan_glmer() based on rstanrm package. Both methods give me a significant result.

So which one should I trust? I don’t know what happens. I’m wondering if anyone can give me a hand. Thank you so much!

Following is the hierarchical Bayesian model and two other methods I have tried.

  1. tree.all.yr.c.sp = glmer(survival ~ (1|year) + (1|code) + Shom + Shet + Acon + Ahet, family = binomial, data= tree.dat)

  2. rstanarm::stan_glmer(Status ~ (1|plot) + (1|census) + Shom + Shet + Acon + Ahet,
    data = seedling.comp, family = binomial, QR = TRUE,
    # this next line is only to keep the example small in size!
    chains = 3, cores = 3, seed = 12345, iter = 500)


data { 
    int<lower=0> nplot; // number of seedling plot 
    int<lower=0> nspe; // number of species 
    int<lower=0> ncensus;//number of census time
    int<lower=0> n; //number of survival record
    int<lower=0> ncov; //number of covariates
    int<lower=0> plot[n]; //plot name
    int<lower=0> spe[n];//species name
    int<lower=0> census[n];//census time, year
    vector[nspe] meanT; //functional trait
    matrix[n,ncov] COV; // 1. intercept; 2.conspecific seedling; 3.hetspecific seedling; 4.conspecific adult; 5. hetspecific adult
    int<lower=0,upper=1> status[n]; //status of each record
}

parameters {
    real bet[ncov]; 
    real b[nspe,ncov];
	real int1[ncov];
	real int2[nplot];	
	real int3[ncensus];
    real <lower=0.00001> sigmab[5]; 
    real <lower=0.00001> sigmaplot; 
    real <lower=0.00001> sigmacensus;
}

model {
sigmacensus~gamma(100,100); 
sigmaplot~gamma(100,100);

	for(i in 1:5)
	{
	bet[i]~normal(0,100); 
	int1[i]~normal(0,100); 
	sigmab[i]~gamma(100,100);  

	}

    for(i in 1:nplot)
    {
        int2[i]~normal(0,sigmaplot); 
    }
   
    for(i in 1:ncensus)
    {
        int3[i]~normal(0,sigmacensus); 
    }

    
    for(j in 1:nspe)
    {
        b[j,1]~normal(int1[1]+bet[1]*meanT[j],sigmab[1]); //bet[1] represent how functional trait related to seedling survival
		b[j,2]~normal(int1[2]+bet[2]*meanT[j],sigmab[2]); //bet[2] represent how trait mediated the effect of conspecific seedling 
        b[j,3]~normal(int1[3]+bet[3]*meanT[j],sigmab[3]); //bet[2] represent how trait mediated the effect of hetspecific seedling
        b[j,4]~normal(int1[4]+bet[4]*meanT[j],sigmab[4]); //bet[2] represent how trait mediated the effect of conspecific adult tree
        b[j,5]~normal(int1[5]+bet[5]*meanT[j],sigmab[5]); //bet[2] represent how trait mediated the effect of hetspecific adult tree
	

    }
    

    for(i in 1:n)
    status[i]~bernoulli_logit(b[spe[i],1]*COV[i,1]+b[spe[i],2]*COV[i,2]+b[spe[i],3]*COV[i,3]+b[spe[i],4]*COV[i,4]+b[spe[i],5]*COV[i,5]+int2[plot[i]]+int3[census[i]]);
    
} 
 


****

The stan_glmer() function won’t give you p-values for a test of a null hypothesis that some coefficient is zero. What do you mean?

Sorry, I didn’t make it clear. `stan_glmer() give the result that mean±SD didn’t overlap with 0. If my understand is right, this means the influence is significant.

If that margin of the posterior distribution were close to normal and zero was more than 2 standard deviations less than the posterior mean, then there would be negligible probability that the coefficient is negative under the model. But the normality cannot be taken for granted, it is not analogous to a point null hypothesis that the coefficient is zero, and you can compute the probability that the coefficient is negative under the model directly without using a 2 SD approximation.

Anyway, are you asking why your Stan code yields different results than that of stan_glmer()?

Hi Ben, Thank you for your reply.
What I want to know is when there are difference between different methods, which one shoudl I select?
The stan_glmer() function gave me the result:


mean sd 2.5% 25% 50% 75% 97.5%
(Intercept) -0.6 0.9 -2.2 -1.1 -0.6 -0.1 1.2
Shom 0.3 0.0 0.2 0.2 0.3 0.3 0.3
Shet 0.1 0.0 0.0 0.1 0.1 0.1 0.1
Acon -0.3 0.0 -0.3 -0.3 -0.3 -0.2 -0.2
Ahet -0.1 0.1 -0.2 -0.1 0.0 0.0 0.1


The GLMM model based on lme4 package gave the following result:


          Estimate Std. Error z value Pr(>|z|)    

(Intercept) -0.9142366 0.2954385 -3.095 0.00197 **
Shom 0.1171840 0.0380382 3.081 0.00207 **
Shet -0.0001754 0.0244176 -0.007 0.99427
Acon -0.3747855 0.0326565 -11.477 < 2e-16 ***
Ahet 0.0548720 0.0299040 1.835 0.06651 .


The hierarchical Bayesian model gives the result:


             mean         2.5%         25%        50%         75%     97.5%

int1[2] 1.0708560 -0.8553867 0.42721045 1.0884049 1.71440904 2.9825777
int1[3] 0.1837309 -0.4267303 -0.02415694 0.1745152 0.38386824 0.8209729
int1[4] -0.3906380 -1.2743490 -0.69915571 -0.3829878 -0.08225371 0.5214457
int1[5] 0.1314058 -0.5440587 -0.09298982 0.1375942 0.36474282 0.7928960

int1[2],int1[3],int1[4],int1[5] represent the effect of Shom, Shet,Acon and Ahet, respectively.


So three methods gave a quite different result, which one should I believe?
Actually, most previous studies and studies in my study area based on the same dataset, have found neighborhood crowing can indeed influence the seedling survival. Also, the result of GLMM seemed more reasonable.

I’m not sure whether I make it clear. Thank you for your time!

The stan model you present here is not comparable to your glmer and stan_glmer models. In the stan model, you considered inter-specific variation of neighbor effects, and do regression to find out the relationship between neighbor effects and species level traits. Your stan model is more complicated than your glmer model!

Try this:
glmer(survival ~ (Shom + Shet + Acon + Ahet)*trait + (Shom + Shet + Acon + Ahet|species) + (1|plot) + (1|census) , family = binomial, data= tree.dat)

I suppose this is the corresponding glmer form to your stan model if I didn’t misunderstand your model.

Best wishes!

ZY SYSU