Priors on predicted shape parameter(s) in mixed zinb model

  • Operating System: Windows 10, Intel Xeon core
  • brms Version: 2.10.0

Hello: this is my first time modeling with Stan and brms. This forum - the posts and answers within it, especially those from @paul.buerkner - has been so helpful and is the reason I’ve made it this far. So thank you in advance!

I am attempting a zero-inflated negative binomial mixed-effects model, and I am also interested in predicting the shape (dispersion) parameter (so as to assess whether the variance differs across years in my model).

I am predicting a count response: cars95, annual reproductive success.
Predictors include:

Main effects:

  • age (discrete)
  • year (discrete)
  • mort (0/1)
  • bwev (0/1)
  • prevsire (0/1)
  • age x year interaction

Group-level effects:

  • age | id (where id is a categorical identifier for each individual across years)

I am also interested in how the variance of my response (cars95) might vary across years, so thought to also model the dispersion parameter associated with a negative binomial distribution: shape ~ year

Thus, the model I would like to fit is:

bf(cars95 ~ age*year + mort + bwev + prevsire + (age|id), shape ~ year)

However, I cannot set a prior on shape to constrain it as I can in a model where I do not predict shape:

cars95 ~ age*year + mort + bwev + prevsire + (age|id)

When I try to constrain shape in the model where I predict shape, I am thrown the error:

Error: The following priors do not correspond to any model parameter: 
shape ~ gamma(1,1)

When I set no priors and accept the defaults, this model fails to converge.

The priors on a well-converged initial model where shape is not predicted include:

                  prior     class      coef group resp dpar nlpar bound
1                               b                                      
2                               b       age                            
3                               b  age:year                            
4                               b     bwev1                            
5                               b     mort1                            
6                               b prevsire1                            
7                               b      year                            
8  student_t(3, -2, 10) Intercept                                      
9  lkj_corr_cholesky(1)         L                                      
10                              L              id                      
11  student_t(3, 0, 10)        sd                                      
12                             sd              id                      
13                             sd       age    id                      
14                             sd Intercept    id                      
15           gamma(1,1)     shape                                      
16           beta(1, 1)        zi  

The default priors on the non-converged model where shape is predicted include:

                  prior     class      coef group resp  dpar nlpar bound
1                               b                                       
2                               b       age                             
3                               b  age:year                             
4                               b     bwev1                             
5                               b     mort1                             
6                               b prevsire1                             
7                               b      year                             
8  student_t(3, -2, 10) Intercept                                       
9   student_t(3, 0, 10) Intercept                      shape            
10 lkj_corr_cholesky(1)         L                                       
11                              L              id                       
12  student_t(3, 0, 10)        sd                                       
13  student_t(3, 0, 10)        sd                      shape            
14                             sd              id                       
15                             sd       age    id                       
16                             sd Intercept    id                       
17                             sd            year      shape            
18                             sd Intercept  year      shape            
19           beta(1, 1)        zi        

As I mentioned above, the initial model where I do not predict shape converges well:

Family: zero_inflated_negbinomial 
  Links: mu = log; shape = identity; zi = identity 
Formula: cars95 ~ age * year + mort + bwev + prevsire + (age | id) 
   Data: c95data (Number of observations: 220) 
Samples: 4 chains, each with iter = 15000; warmup = 5000; thin = 2;
         total post-warmup samples = 20000

Group-Level Effects: 
~id (Number of levels: 60) 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)          1.82      0.97     0.15     3.86 1.00     4798     9806
sd(age)                0.54      0.22     0.14     1.02 1.00     4365     6019
cor(Intercept,age)    -0.63      0.37    -0.96     0.52 1.00     6200     6898

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -4.04      1.77    -7.93    -0.98 1.00     7970    13616
age           1.44      0.61     0.33     2.72 1.00     7962    14286
year          0.10      0.23    -0.35     0.58 1.00    14321    16159
mort1         0.58      0.90    -1.21     2.36 1.00    16313    17632
bwev1        -0.97      0.89    -2.81     0.70 1.00    16957    16902
prevsire1     0.07      0.50    -0.91     1.05 1.00     8872    14221
age:year     -0.15      0.08    -0.30    -0.00 1.00     8792    14912

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     1.68      0.95     0.54     4.14 1.00     9684    14440
zi        0.12      0.09     0.01     0.34 1.00    14419    16585

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

I’ve conducted loo tests and a number of posterior predictive checks on this initial model. Many, but not all, check out. Problems that arise with the initial model are a higher p_loo count than the number of parameters in my model, and a somewhat poor-performing ppc_dens_overlay plot:

Loo output:

Computed from 20000 by 220 log-likelihood matrix

         Estimate   SE
elpd_loo   -172.6 18.5
p_loo        34.2  6.5
looic       345.1 37.1
------
Monte Carlo SE of elpd_loo is 0.4.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     197   89.5%   13        
 (0.5, 0.7]   (ok)        23   10.5%   121       
   (0.7, 1]   (bad)        0    0.0%   <NA>      
   (1, Inf)   (very bad)   0    0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

Posterior predictive checks (plots not reproduced here): bars, ECDF overlay, stat (mean), stat (mean and SD), and pp_check() density overlay.

I have multiple questions.

  1. Why can I not assign a prior on a shape parameter in the model where I try to predict shape?
  2. What priors should I assign in the model where I try to predict shape by year to improve convergence? – and how should I specify these? (i.e., using ‘class=’ or ‘dpar=’ etc.) Note, I’m assuming I need stronger priors on the family specific parameters, but I don’t know what or how (i.e., the parameters where dpar=shape)
  3. In my initial model where I don’t predict shape, why is the p_loo from the loo output so high when the model appears to fit decently according to the PPCs?
  4. You’ll notice that my empirical response data does not exceed 8; is it problematic that the zero-inflated negative binomial distribution allows (occasional) responses that are unrealistically high for empirical data? If so, are there suggested alternatives or work-arounds?

I’ve included a reproducible example together with a link to data:

library(dplyr)
library(brms)
library(ggeffects)
library(GGally)
library(rstantools)
library(rstanarm)
library(loo)
library(bayesplot)

c95data <- read.csv("ARS95KAc.csv")
c95data$id <- as.factor(c95data$id)
c95data$year <- as.integer(c95data$year) #year (2-8) as integer NOT categorical
c95data$mort <- as.factor(c95data$mort)
c95data$prevsire <- as.factor(c95data$prevsire)
c95data$bwev <- as.factor(c95data$bwev)

#MODELS
mod1 <- brm(cars95 ~ age*year + mort + bwev + prevsire + (age|id) , 
     data=c95data,
     family=zero_inflated_negbinomial(), 
     prior= set_prior("gamma(1,1)", class="shape"),
     iter=20000, warmup=5000, thin=2, chains=4,
     control = list(adapt_delta = 0.99), 
     cores=6)

#THROWS ERROR ABOUT PRIOR
mod2 <- brm(bf(cars95 ~ age*year + mort + bwev + prevsire + (age|id) , 
     shape ~ year), 
     data=c95data,
     family=zero_inflated_negbinomial(), 
     prior= set_prior("gamma(1,1)", class="shape"),
     iter=20000, warmup=5000, thin=2, chains=4,
     control = list(adapt_delta = 0.99), 
     cores=6)

##DOES NOT CONVERGE
mod2_nopri <- brm(bf(cars95 ~ age*year + mort + bwev + prevsire + (age|id) , 
     shape ~ year), 
     data=c95data,
     family=zero_inflated_negbinomial(), 
     iter=20000, warmup=5000, thin=2, chains=4,
     control = list(adapt_delta = 0.99), 
     cores=6)

#LOO
loo1 <- loo(mod1, reloo=T, cores=6)

##PPCs
#posterior predictive checks
yrep <- posterior_predict(mod1)
y <- c95data$cars95
bayesplot::ppc_stat(y, yrep[1:50, ], stat="mean")
brms::pp_check(mod1)
brms::pp_check(mod1, type="stat_2d", stat=c("mean", "sd"), nsamples=30)
brms::pp_check(mod1, type="ecdf_overlay", discrete=T, nsamples=30)
brms::pp_check(mod1, type="bars", nsamples=30)

DATA: https://drive.google.com/file/d/1T51Cp0m8qHtsf2fZcpqchMQJo6dIdQn1/view?usp=sharing

Thank you (again) in advance!
El

Hi, since I should be doing my own work I’m going to comment strictly on setting priors when estimating shape.

Since we are all busy, before you ask a specific question it’s good to check available vignettes and also search the forum for similar questions.

Check out the vignettes on setting priors, on distributional models, and on how families are parameterized. The get_prior help page is useful too.
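For example (vignette names here are from memory and may differ across brms versions, so list them first to check):

# list the vignettes shipped with your installed brms version
vignette(package = "brms")
# then open the relevant one, e.g. the distributional-models vignette,
# if your version ships it under this name:
vignette("brms_distreg")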

Now: if you use get_prior you can see how to specify the priors for any model you write. In your model, the rows with an entry in the dpar column are the priors for the shape part, and you specify them based on their class and dpar.

If you run get_prior and are confused about which parts are the dispersion you are looking to predict, you can put JUST the shape part in to start, e.g.

get_prior(bf(cars95 ~ 1, shape ~ year),
          data = c95data, family = zero_inflated_negbinomial())

Your output of priors on the model that predicts shape with default values contains the specification of the prior you want.

There is a prior for class = sd and one for the Intercept (although my R has this as b, not sd, but I’m going to ignore that). You’d specify them like

prior = c(
          prior(student_t(3,0,1), class = Intercept, dpar = shape),
          prior(gamma(2.3, 0.3), class = sd, dpar = shape))

When you aren’t predicting the shape, a gamma(1, 1) prior concentrates most of its density near 0 and puts little weight elsewhere, but its long tail does not rule out very large values. It is good practice to plot some density distributions for the priors you are using to see if they reflect plausible values. In this case, if you know that your shape value is small (I have some similar data, and the shape is close to 2 or 3; it has been as high as 14 in some glmer estimates but never even close to 30), I would want to keep the weight closer to reasonable values if you are not using the model to predict the shape.
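A minimal sketch of that kind of prior check (the gamma(2, 0.5) alternative below is only an illustration, not a recommendation):

# density of the gamma(1, 1) prior versus one that keeps more weight
# on moderate shape values
curve(dgamma(x, shape = 1, rate = 1), from = 0, to = 30, ylab = "density")
curve(dgamma(x, shape = 2, rate = 0.5), from = 0, to = 30, add = TRUE, lty = 2)
legend("topright", legend = c("gamma(1, 1)", "gamma(2, 0.5)"), lty = 1:2)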

The default priors on the intercept and sd for predicting shape also allow quite long tails. If you have any knowledge about what the shape might be, inspect possible prior distributions that allow the shape to still be wide but at least within the realm of the realistic (if you know some year has much higher variance, you might want to make sure the SD value is large enough to allow it to vary across years).

Also, you don’t specify a prior for any of your other predictors or the Intercept. Read up on selecting weakly informative priors; they will help your model converge, and do so more quickly.
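For concreteness, weakly informative priors on the count part of the model might be written something like this (the scales are placeholders to adjust to your data, not recommendations):

prior_wi <- c(
  prior(normal(0, 1), class = b),                 # population-level slopes
  prior(student_t(3, 0, 2.5), class = Intercept)  # intercept of the mu part
)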

An alternative is also to truncate your response, but if you can avoid that by choosing priors correctly try then you won’t have to justify truncating and what value you chose. I think your data maybe could be better served if you had information about how many attempts were made to achieve the counted number of successes. But I already thought more than necessary about your problem. :)

I always suggest people build a model slowly so they can see the impact on sampling as they add terms and change priors. For instance I almost always just start with a glm y~1 to see what’s up.
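For instance, a minimal intercept-only starting point with your data and family might look like this (sketch only):

# intercept-only zero-inflated negative binomial fit, just to check the machinery
fit0 <- brm(cars95 ~ 1, data = c95data,
            family = zero_inflated_negbinomial(),
            chains = 4, cores = 4)
summary(fit0)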

And after a year of doing models in brms I have decided to just always check get_prior() whenever I add or remove any terms from a model, because half the time I mess something up, get thrown an error, and have to run get_prior anyway.

Hope that helps. I should be working on my own model now! oh well
cheers,
meg


Hi @MegBallard : I certainly appreciate you taking time amidst your own work!

If you have additional vignettes on setting priors, I’d be interested! I’ve certainly perused what I think are the more popular/readily available links (e.g., https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations) and as many similar questions as I can find - on this forum and elsewhere! I’ve struggled to apply the fairly generalized reading to my specific system, but would really like some assistance in selecting weakly informative priors - both on the shape parameters and on the intercepts and beta terms.

When I try your suggestion:

prior = c(
          prior(student_t(3,0,1), class = Intercept, dpar = shape),
          prior(gamma(2.3, 0.3), class = sd, dpar = shape))

I’m thrown the error:

Error: The following priors do not correspond to any model parameter: 
sd_shape ~ gamma(2.3, 0.3)
Function 'get_prior' might be helpful to you.

I wonder if you set this successfully?

Re plausible shape values, I’m fuzzy on what values the shape parameter (and especially its sd and Intercept) can take on, as I don’t understand how they relate to my response distribution (which I know will not plausibly exceed 20).

Apologies if it seems I haven’t done my due diligence! I’m just feeling lost.
I really appreciate your thoughts and encouragement to keep playing around with priors.
El

Can you post the result of running get_prior() using the formula, data and family arguments that you are passing to brm()? For example, I ran the following and indeed there is no class of sd with a dpar = shape.

c95data <- read.csv("c:/users/fjcc/Documents/R/Play/ARS95KAc.csv")
c95data$id <- as.factor(c95data$id)
c95data$year <- as.integer(c95data$year) #year (2-8) as integer NOT categorical
c95data$mort <- as.factor(c95data$mort)
c95data$prevsire <- as.factor(c95data$prevsire)
c95data$bwev <- as.factor(c95data$bwev)
library(brms)
get_prior(bf(cars95 ~ age*year + mort + bwev + prevsire + (age|id) , 
             shape ~ year), 
          data=c95data,
          family=zero_inflated_negbinomial())
#>                   prior     class      coef group resp  dpar nlpar bound
#> 1                               b                                       
#> 2                               b       age                             
#> 3                               b  age:year                             
#> 4                               b     bwev1                             
#> 5                               b     mort1                             
#> 6                               b prevsire1                             
#> 7                               b      year                             
#> 8                lkj(1)       cor                                       
#> 9                             cor              id                       
#> 10 student_t(3, -2, 10) Intercept                                       
#> 11  student_t(3, 0, 10)        sd                                       
#> 12                             sd              id                       
#> 13                             sd       age    id                       
#> 14                             sd Intercept    id                       
#> 15           beta(1, 1)        zi                                       
#> 16                              b                      shape            
#> 17                              b      year            shape            
#> 18  student_t(3, 0, 10) Intercept                      shape

Created on 2020-01-06 by the reprex package (v0.3.0)

Sure @FJCC!
I did catch an error in my get_prior() list - it was for a model with shape predicted by a random year term (discussed below).
For

fit10p.nb <- brm(bf(cars95 ~ age*year + mort + bwev + prevsire + (age|id), shape ~ year),  data=c95data,
               family=zero_inflated_negbinomial(),  prior= prior,
               iter=20000, warmup=5000, thin=2, chains=4,
               control = list(adapt_delta = 0.99), cores=6) 

the get_prior() list looks like what I’m guessing you and @MegBallard observed:

                  prior     class      coef group resp  dpar nlpar bound
1                               b                                       
2                               b       age                             
3                               b  age:year                             
4                               b     bwev1                             
5                               b     mort1                             
6                               b prevsire1                             
7                               b      year                             
8                lkj(1)       cor                                       
9                             cor              id                       
10 student_t(3, -2, 10) Intercept                                       
11  student_t(3, 0, 10)        sd                                       
12                             sd              id                       
13                             sd       age    id                       
14                             sd Intercept    id                       
15           beta(1, 1)        zi                                       
16                              b                      shape            
17                              b      year            shape            
18  student_t(3, 0, 10) Intercept                      shape            

However, this model would not converge with the shape priors suggested by Meg, so I altered the shape priors to:

prior = c(
  prior(student_t(3,0,1), class = Intercept, dpar = shape),
  prior(gamma(1, 1), class = b, dpar = shape))

as gamma(1,1) is closer to what I expect the response distribution to look like for cars95 (fairly tight and pushing up against 0). Note that I don’t know if this is the correct reasoning as I’m still fuzzy on what exactly the shape term does in zinb models - other than it reflects variance in the response - and thus am unsure how to assign and interpret it.
Though this model seemingly converged with low Rhats:

 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = log; zi = identity 
Formula: cars95 ~ age * year + mort + bwev + prevsire + (age | id) 
         shape ~ year
   Data: c95data (Number of observations: 220) 
Samples: 4 chains, each with iter = 20000; warmup = 5000; thin = 2;
         total post-warmup samples = 30000
Group-Level Effects: 
~id (Number of levels: 60) 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)          1.99      1.00     0.19     4.14 1.01      528      641
sd(age)                0.57      0.22     0.18     1.06 1.01      433      632
cor(Intercept,age)    -0.64      0.35    -0.96     0.44 1.01      603      589

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          -4.16      1.84    -8.28    -0.94 1.00      819     2222
shape_Intercept     0.15      1.31    -1.86     3.01 1.00     3383     4282
age                 1.48      0.62     0.34     2.79 1.00      823     1863
year                0.10      0.24    -0.36     0.60 1.00     1546     2720
mort1               0.62      0.85    -1.06     2.32 1.00     2485     4446
bwev1              -0.97      0.92    -2.85     0.77 1.00     2303     4065
prevsire1           0.02      0.48    -0.92     0.97 1.00     1143     2523
age:year           -0.15      0.08    -0.31    -0.01 1.00      784     1846
shape_year          0.17      0.21     0.00     0.72 1.00     2078     4248

Family Specific Parameters: 
   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
zi     0.15      0.10     0.01     0.37 1.00     1722     1420

there were a very high number of divergent transitions, even with adapt_delta set to 0.99:

Warning messages:
1: There were 28565 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: There were 184 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
3: Examine the pairs() plot to diagnose sampling problems

I am running loo() diagnostics and posterior predictive checks now - but I am uneasy about the model.

As I mentioned at the start, the list of get_prior() in my initial post was actually from the model:

cars95 ~ age*year + mort + bwev + prevsire + (age|id), shape ~ (1|year)

                  prior     class      coef group resp  dpar nlpar bound
1                               b                                       
2                               b       age                             
3                               b  age:year                             
4                               b     bwev1                             
5                               b     mort1                             
6                               b prevsire1                             
7                               b      year                             
8                lkj(1)       cor                                       
9                             cor              id                       
10 student_t(3, -2, 10) Intercept                                       
11  student_t(3, 0, 10)        sd                                       
12                             sd              id                       
13                             sd       age    id                       
14                             sd Intercept    id                       
15           beta(1, 1)        zi                                       
16  student_t(3, 0, 10) Intercept                      shape            
17  student_t(3, 0, 10)        sd                      shape            
18                             sd            year      shape            
19                             sd Intercept  year      shape            

And really, I’m not sure whether the shape term ought to be modeled by a fixed or varying year term.
What I want is to allow the variance of my response (cars95) to vary across years - and for my system, I might hypothesize that the variance increases with year.

Thanks again for interest and assistance - I am eager to do what I can to improve!

I can help out with the meaning of the shape parameter, I think, but let me put a couple of caveats up front. I have dabbled in Stan off and on for a few years but I am still a novice and I have just started looking into BRMS. Treat everything I say as the musing of some random guy trying to be helpful.
To understand the role of shape, I would look at the code generated by

make_stancode(bf(cars95 ~ age*year + mort + bwev + prevsire + (age|id), shape ~ year),  data=c95data,
               family=zero_inflated_negbinomial())

This generates scary looking text but I find it very useful in helping me think about what BRMS is doing. If you do not want to see my thought process, skip the following and go down to the paragraph that starts with “So, shape is phi in the underlying Stan function.”

Looking at the model{} section of that code, we can see the likelihood calculation

for (n in 1:N) {
      target += zero_inflated_neg_binomial_log_lpmf(Y[n] | mu[n], shape[n], zi);
    }

About 20 lines above that we can see that shape is calculated as

vector[N] shape = Intercept_shape + Xc_shape * b_shape;

so it is just the result of a linear calculation from an intercept and the b_shape coefficient vector multiplied by what looks like a matrix of data. If you go farther up the model you can see that Xc_shape is a centered version of the year values but I will not get into the details of that.
So what is this zero_inflated_neg_binomial_log_lpmf() function? It is defined near the top of the model code.

real zero_inflated_neg_binomial_log_lpmf(int y, real eta, 
                                           real phi, real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         neg_binomial_2_log_lpmf(0 | eta, phi)); 
    } else { 
      return bernoulli_lpmf(0 | zi) +  
             neg_binomial_2_log_lpmf(y | eta, phi); 
    } 
  } 

We don’t have to get into all the details of that. We can see from the likelihood line I quoted above that shape gets passed as the third parameter, which in the function definition is called phi, and phi gets used as the third argument of neg_binomial_2_log_lpmf().

So, shape is phi in the underlying Stan function. Looking in the Stan function reference sections 13.3 and 13.2, we can see that phi controls how much extra variance the distribution has.

Var = \mu + \frac{\mu ^2}{\phi}

where \mu is the mean of the negative binomial distribution. The smaller shape is, the higher the variance of the negative binomial distribution. And shape is the result of a linear calculation from Intercept_shape and b_shape, so a low Intercept_shape will push the variance higher and a small b_shape will mean that the variance does not change much from year to year.
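A quick simulation check of that relationship, using R’s rnbinom(), in which the size argument plays the role of \phi (sketch only):

# empirical variance of negative binomial draws for a few phi values, mu = 4
mu <- 4
sapply(c(0.5, 2, 10), function(phi) var(rnbinom(1e5, mu = mu, size = phi)))
# theory: mu + mu^2 / phi = 36, 12, 5.6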
I don’t dare give definite advice about your priors. One thought is that you might try raising the sigma on the student_t above one. Maybe after the zero-inflation does its thing, there is not all that much extra variance and a higher intercept would match that. If you believe there may be a strong effect of year on the variance, you would allow higher values of b_shape.
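For concreteness, one way such a tweak might be written (the scales are arbitrary starting points, not a recommendation):

prior_shape <- c(
  prior(student_t(3, 0, 3), class = Intercept, dpar = shape),
  prior(student_t(3, 0, 2), class = b, dpar = shape)
)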

On another aspect of this, if you have divergences, you should indeed be very uneasy about the model. I think the model must be viewed as biased if there are any divergences.

Again, I am no expert, so take all of this as an attempt at helping by a fellow learner.


I forgot to mention in my previous post that shape gets transformed as

shape[n] = exp(shape[n]);

before being used in the likelihood, so that also influences the thinking about the prior.
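To get a feel for how much that exp() matters, here is a rough simulation of what a student_t(3, 0, 10) prior on the linear scale implies for \phi after exponentiation (sketch only):

# draws from student_t(3, 0, 10) on the (log-link) linear scale for shape
draws <- 10 * rt(1e5, df = 3)
quantile(exp(draws), c(0.5, 0.75, 0.9, 0.99))
# the upper quantiles are astronomically large phi values, i.e. this prior
# is extremely diffuse on the phi scale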


I played with this a bit and found that the following model runs well

prior = c(
  prior(student_t(3,0,10), class = Intercept, dpar = shape),
  prior(student_t(3,0,10), class = b, dpar = shape))
fit10p.nb <- brm(bf(cars95 ~ age*year + mort + bwev + prevsire , shape ~ year),  data=c95data, 
                 family=zero_inflated_negbinomial(),prior= prior,  
                 iter=1000, warmup=500, chains=4,
                 control = list(adapt_delta = 0.8))

The changes I made are: the priors are less informative, I removed the grouping of age on id, and I lowered adapt_delta to 0.8. I incidentally reduced the number of iterations to save time. The results were

 Family: zero_inflated_negbinomial 
  Links: mu = log; shape = log; zi = identity 
Formula: cars95 ~ age * year + mort + bwev + prevsire 
         shape ~ year
   Data: c95data (Number of observations: 220) 
Samples: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
         total post-warmup samples = 2000

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          -2.30      1.17    -4.76    -0.13 1.01      871     1237
shape_Intercept     2.89      2.13    -0.29     7.79 1.00     1181      643
age                 0.78      0.45    -0.02     1.76 1.01      838      880
year               -0.00      0.19    -0.37     0.40 1.00     1091     1267
mort1               0.22      0.78    -1.34     1.78 1.00     2048      995
bwev1              -1.09      0.60    -2.40     0.01 1.00     1987     1238
prevsire1           0.61      0.43    -0.26     1.44 1.00     1485     1354
age:year           -0.06      0.06    -0.19     0.07 1.01      875      823
shape_year         -0.56      0.29    -1.22    -0.09 1.00     1296      842

Family Specific Parameters: 
   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
zi     0.16      0.12     0.01     0.43 1.00      909      808

I suppose removing the (age | id) term is the key, though I have not done much more than is shown above. Looking at the data in c95data with the table function in R

table(c95data$id, c95data$cars95)

shows that almost two thirds of the id values appear only with zero counts of cars95, making defining a coefficient for age difficult and not very meaningful.
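For example, the proportion of ids observed only at zero can be computed directly (a sketch; it assumes a "0" column exists in the table):

tab <- table(c95data$id, c95data$cars95)
# proportion of ids whose cars95 values are all zero
mean(rowSums(tab[, colnames(tab) != "0", drop = FALSE]) == 0)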


Gosh, @FJCC, this is all so helpful.

Your background on shape parameter interpretation is something I’ll be referring back to and trying to really understand often!

I’m going to try to organize my additional questions/comments as well as I can.

  1. Many thanks for playing around with my model!
    I believe I need a random intercept on id to control for repeated individuals: cars95 represents annual male reproductive success, and I have measures on some of the same males (represented by id) across years. I may not strictly need a random age slope within this, except that it is very likely in my system that the effect of age differs among individuals (ids).
    Tonight, I will play around with just an id random intercept – i.e., (1|id) versus (age|id). Thanks for the suggestion in this avenue!

  2. Regarding the interpretation of the shape parameters - again, taking me through the Stans/BRMS code and boiling it down was SO helpful.
    I think I’ve got it, but I’m wondering if you can confirm? The shape_Intercept term speaks to the variability in my response (cars95) generally across my model - with a large coefficient implying large variance in my response.
    The coefficent on the shape_year term speaks to the changes in the amount of variance in my response across levels of year – with a large, positive term implying large variation in variance across levels of year.

I will continue to play around with model convergence and validation. Before predicting shape, I had a hard time understanding why I was getting poor loo() validation metrics (e.g., too-high p_loo values for my number of parameters) but seemed to have posterior predictive checks that checked out fine. We’ll see if I encounter those same issues here!

I can’t express my thanks enough. I’m humbled by your generosity with your time and expertise.

First, a bit about confusing terminology. Various distributions have a shape parameter and the one for the negative binomial distribution is represented by \phi in an equation below. Your model also has a variable named shape which is related to but not the same as the \phi used by Stan in its calculations. I explain that relationship farther down. I will try to write shape in italics when referring to the value in your model and use regular text when referring to the generic shape parameter of the negative binomial distribution.

Concerning the shape parameter, keep in mind that in the case of the negative binomial, a larger shape parameter means less variance. The equation is

Var = \mu + \frac{\mu ^2}{\phi}

where \phi is the shape parameter. You are correct that shape_Intercept is the baseline value of shape and shape_year determines how much the value changes as year increases, just that increasing shape means less variance.
You can see what the shape parameter does by running these three lines in R, where the size is the same as \phi in the equation above. When size is small, the distribution tail is very long.

hist(rnbinom(1000, mu = 4, size = .1))
hist(rnbinom(1000, mu = 4, size = 1))
hist(rnbinom(1000, mu = 4, size = 10))

Also, keep in mind that both mu, which gets calculated from the linear combination of

age * year + mort + bwev + prevsire

and shape have a log link in the model. You can see this in the brms output on the second line. That means that what gets passed to the Stan function is e^\mu and e^{shape}. When shape = 0, the \phi in Stan is 1; when shape = 2, \phi \approx 7.4. That is important if you do any manual calculations and in thinking about the priors. The \phi parameter in the negative binomial distribution must not be negative, but your parameter shape can be negative because e raised to a negative power is just a number between zero and one.
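As a purely illustrative manual calculation along these lines, plugging the posterior means of the shape coefficients from the fit without (age | id) into the log link (a sketch only; for real inference you would work with the posterior draws rather than point estimates):

shape_int <- 2.89                 # shape_Intercept from the summary above
shape_b   <- -0.56                # shape_year
year      <- 2:8
round(exp(shape_int + shape_b * year), 2)   # implied phi in each year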


Oh I see @FJCC:
A small shape (shape_Intercept) suggests large variance in my response given phi’s placement in the denominator, while a small beta coefficient on shape_year implies small change across levels of year. This is making sense to me now!

Unfortunately, I’m having a hard time getting the model where I add back in a random intercept on id to converge:

prior = c(
  prior(student_t(3,0,10), class = Intercept, dpar = shape),
  prior(student_t(3,0,10), class = b, dpar = shape))
fit10p.nb <- brm(bf(cars95 ~ age*year + mort + bwev + prevsire + (1|id) , shape ~ year),  data=c95data, 
                 family=zero_inflated_negbinomial(),prior= prior,  
                 iter=1000, warmup=500, chains=4,
                 control = list(adapt_delta = 0.8))

I’ll have to continue playing with it, and I guess again attempt tweaking the priors. I’ve currently adjusted max_treedepth to 15, as a warning message and the Stan manual suggested (Warning messages: 1: There were 30000 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded), but it is taking SO long to run.

Thanks for your continued spot-on assistance @FJCC!

Just wanted to say that if you need to set max_treedepth to 15, there is almost certainly something very wrong with your model (often some parameters are not identifiable). If you don’t manage to debug it, it might be worthwhile to start a new thread restating your current problem, as this is likely to get more people to look into it (many people ignore new messages in older topics, as understanding the whole discussion takes quite some time).

Best of luck with your model!

Thanks @martinmodrak - I haven’t debugged the issues in my model leading to a high number of divergent transitions. I suspect better prior assignment will help. I will repost tonight under modeling > specification.

Thanks for the pointer!
El