Unsure how to / get error when trying to predict tau (standard deviation of random intercepts) in a distributional model in brms

Hi, I’m trying to fit a binomial model with a random intercept for item type, and (among other things) model the width of the variance on the random intercept parameter based on the frequency (not dataset-internal) of that item type. What I have is here:

bf_mixed_souped <- bf(NumSuccesses|trials(OverallFreq) ~ 
                 Form +
                 Percept +
                 Culture +
                 Power + 
                 Icon +
                 Len +
                 Lapse +
                 (1|binomial_id), nl = TRUE)+
  lf(tau ~ Freq)

prior_souped <- prior(normal(0, 3), nlpar = "tau")

b_model_mixed_souped <- brm(formula = bf_mixed_souped,
                     family = binomial,
                     data = a,
                     prior = prior_souped,
                     warmup = 1000, 
                     iter = 2000,
                     cores = 4, 
                     chains = 4,
                     save_model = "./Models/mixed_souped_stan",
                     file = "./Models/mixed_souped"

but I get the warning

Error: The parameter 'tau' is not a valid distributional or non-linear parameter. Did you forget to set 'nl = TRUE'?

I’m trying to base my approach on this example in the brms manual, on the bottom of p. 48 (link here: https://cran.r-project.org/web/packages/brms/brms.pdf):

# model sigma as a function of the mean
bf(y ~ eta, nl = TRUE) +
lf(eta ~ 1 + x) +
nlf(sigma ~ tau * sqrt(eta)) +
lf(tau ~ 1)

You can think of what I’m trying to do as a version of the example of the distributional model here (Estimating Distributional Models with brms) where the standard deviation of the response group was predicted by some categorical variable - there, they’re calling it sigma, so I assumed based on the brms manual example, there would be an analogous tau available.

Could anyone help me out here? I really appreciate your help in advance!


I figure maybe @paul.buerkner or @andrjohns or @Solomon or @martinmodrak might know? (sorry to spam your @'s like this, not sure if this is maybe too niche for the forums :/ )

My experience with non-linear models like this is limited, so take my comments with healthy skepticism. Notice that in Paul’s example, he first defined tau as a parameter for the sigma submodel. You haven’t done anything like that in your code. You need to let brm() know what tau is.

1 Like

Hi Solomon, thanks for your reply. I think you’re right - I had hoped tau would be an accessible keyword like sigma, but it does just follow from the other parts of the bf() call. What I was hoping for was some way to access the variance of the random intercept distribution (usually called tau, by convention, if I understand the terminology in the literature correctly), but it doesn’t seem like this is the case.

1 Like

I do not believe that brms can do this. You would need to do it in Stan. Also, as I thought more about what you are trying to do here, I am wondering if it even makes sense…
What you are calling tau is the standard deviation of group-level effects in brms, which would be a single parameter. This is the standard deviation of the varying intercepts of a theoretical population of groups. If I understand correctly, you are wanting to be able to predict this parameter also based on some, say, continuous predictor variable. But what this implies is a vector of tau’s for every group, not a single tau, because for every (infinite) value of the predictor, then you would have a different value for tau. Unless multiple groups had the exact same value for tau… I tried to think about this by writing a simulation in R and model in Stan. Here, your tau’s are called sd_1 in the model, and they are a vector of parameters that are estimated as the outcome variable in an exponential model predicted by the predictor v. They are also used in the varying intercepts for the binomial model using a non-centered parameterization (the default in brms and usually works better unless you have a ton of data in each group).
The model runs fine and appears to work, although b_sd is underestimated (perhaps due to the partial pooling effect from the binomial model??). I think there is still partial pooling going on, as the sd_1 still are part of the binomial model. But now you get a different sd_1, or tau as you call it, for every id, rather than a singe one for the entire group.
Honestly, even after writing this thing up, I am not entirely sure what to think about it from a conceptual or theoretical view…so take this with a grain of salt. Maybe @jsocolar would have some thoughts.


#data simulation
n_id <- 100	#number of id's
n_len <- 10	#number of observations for each id
n <- n_id * n_len	#total number of observations
id <- rep(1:n_id, each=n_len)	#id label
trials <- rpois(n, lambda=50)	#random number of trials
x <- rnorm(n)	#predictor for binomial outcome model
v <- rnorm(n_id)	#predictor for vector of sd's for varying intercepts
a_sd <- 0	#intercept for exponential model of sd's
b_sd <- -1.5	#coefficient for v
mu_sd_z <- a_sd + b_sd*v	#linear model for sd's
sd_z <- rexp(n_id, rate = exp(-(mu_sd_z)))	#generative model for the vector of sd's of varying intercepts
z <- rnorm(n_id, 0, sd_z)	#generate varying offsets (varying intercepts) for the binomial outcome model
a <- (-2)	#intercept of the binomial outcome model
b <- 1	#coeffient for x
p <- plogis((a + z[id]) + b*x)	#probability p for linear model
y <- rbinom(n=n, size=trials, prob=p)	#generative model for outcome y's

#stan data
d1 <- list(N = n, #total number of observations
           Y = y, #binary response
	   x = x, #predictor for binomial outcome y
	   v = v, #predictor for vector of sd's of varying intercepts
           J = n_id, #number of id's
           id = id, #id
	   trials = trials #trials

#stan code
stan_code1 <- "
data {
  int<lower=1> N;  
  int Y[N];  
  vector[N] x;
  int<lower=1> J; 
  int<lower=1> id[N];  
  vector[J] v;
  int trials[N];  
parameters {
  real a;  // intercept for binomial outcome model
  vector<lower=0>[J] sd_1;  // vector of group-level standard deviations for varying intercepts
  vector[J] z_1;  // standardized group-level effect, needed for non-centered parameterization
  real b; // coefficient for x for binomial outcome model
  real a_sd; // intercept for exponential model for vector of sd's of varying intercepts
  real b_sd; // coeffient for v for exponential model for vector of sd's of varying intercepts
transformed parameters {
  vector[J] r_1_1;  // actual varying intercepts (offsets)
  r_1_1 = (sd_1 .* z_1);  // non-centered parameterization
model {
    for (j in 1:J) {
      target += exponential_lpdf(sd_1[j] | exp(-(a_sd + b_sd * v[j]) )); // exponential model for vector of sd's for varying intercepts
    for (n in 1:N) {
      target += binomial_logit_lpmf(Y[n] | trials[n], a + r_1_1[id[n]] + b * x[n]);  // binomial outcome model
  // priors
  target += normal_lpdf(a | 0, 2.5);
  target += normal_lpdf(sd_1 | 0, 2.5)
    - 1 * normal_lccdf(0 | 0, 2.5);
  target += std_normal_lpdf(z_1);
  target += normal_lpdf(b | 0, 2.5);
  target += normal_lpdf(a_sd | 0, 2.5);
  target += normal_lpdf(b_sd | 0, 2.5);
generated quantities {

#fit model
fit1 <- stan(model_code = stan_code1, data = d1,
             chains = 4, iter = 2000, warmup = 1000, 
             thin = 1, cores = 4)

#print selected parameters
print(fit1, pars=c("a", "b", "a_sd", "b_sd"))

@jd_c , just FWIW there’s no reason in principle why we cannot estimate a random effect standard deviation that varies with covariates that are properties of the random effect groups. One way to think about random intercept models is as a latent Gaussian linear regression for the group means, plus a complicated response distribution that computes the likelihood for each group. In this sense, much as brms allows us to fit distributional regression with covariates on the residual sigma, we are estimating a latent distributional regression here for the group means.

@Canaan_Breiss I don’t think this is possible to do in basic brms, though it is possible to do within brms using the functionality to insert custom stan code into the model itself. It might be possible to get close with pure brms though, if you have a strong a priori expectation for how the random effect variance should scale with group characteristics. In particular, you can specify random effect terms with predetermined variance-covariance structures. So if you want to just supply the variance-covariance structure for the random intercept directly, that is easier to do without writing any Stan code. See here: Estimating Phylogenetic Multilevel Models with brms

1 Like

Thanks. Yeah that’s why I wrote up the Stan model, which appears to work, because that’s what @Canaan_Breiss was asking for (something similar to distributional model in brms). It just seems kinda weird, so I wanted someone else’s input. I hadn’t seen anything like that before (but I also haven’t exactly searched for it either).

1 Like