Hierarchical Nonlinear Regression


#1

R has an inbuilt data set called Orange, which measures the circumference of 5 orange trees (in mm) at various ages (measured in days since 1968). I’m using this data set for this example model.

My model is as follows:

Circumference at time t is c(t) = \dfrac{\beta_1}{1 + \exp(-(t - \beta_2) / \beta_3)}, where all parameters are positive.

\beta_1 is the circumference at maturity (when t is large), \beta_2 is the time point at which half the final circumference is reached (i.e., if t = \beta_2, then c(t) = \dfrac{\beta_1}{2}), and \beta_3 measures the rate of growth of the tree.

In this thread, I managed to develop an unpooled version of this nonlinear regression model:

```{stan output.var="non_pooled"}
data {
  int<lower=0> n;
  vector<lower=0>[n] t;
  int<lower=0> groups;
  matrix<lower=0>[7, 5] y;
}
parameters {
  matrix<lower=0>[3, groups] beta;
  real<lower=0> sigma;
}
transformed parameters {
  matrix[7, groups] mu;
  for (j in 1:groups) {
    for (i in 1:7) {
      mu[i, j] = beta[1, j] * inv(1 + exp(-(t[i] - beta[2, j]) / beta[3, j]));
    }
  }
}
model {
  for (j in 1:groups) {
    for (i in 1:7) {
      y[i, j] ~ normal(mu[i, j], sigma);
    }
    
    beta[1, j] ~ normal(170, 5);
    beta[2, j] ~ normal(85, 20);
    beta[3, j] ~ normal(500, 500);
    
  }
  
  sigma ~ student_t(2, 0, 1);
}
```

```{r}
data.in <- list(t = Orange$age, n = length(Orange$circumference), y = matrix(Orange$circumference, 7, 5), group = as.integer(Orange$Tree), groups <- length(unique(Orange$Tree)))
model.fit2 <- sampling(non_pooled, data=data.in)
```

```{r}
print(model.fit2, digits=4, pars=c("mu", "beta", "sigma"))
```

Explanation

We have groups = 5 groups and n = 35 observations within those groups.

I created the 7 \times 5 matrix y, where the columns denote the groups and the rows denote the observations within those groups. Therefore, we have 7 \times 5 = 35 total observations, as required.

Since this is an unpooled model, we need to generate parameters \beta_1^j, \beta_2^j, \beta_3^j for each group j = 1, \dots, 5. Therefore, beta is a 3 \times 5 matrix.

mu is a 7 \times 5 matrix, where the columns denote the group and the rows denote the observations. This is because each group has 7 observations, and so there are 7 observations within each group. This matrix contains the conditional means at various time points across the groups.

Hierarchical Model

I now want to try and go one step further and build a hierarchical nonlinear regression model that puts hierarchical priors onto the three parameters. So if \beta_i^j , i = 1, 2, 3, denotes the three regression parameters for group/tree j, then we would assume that they come from the following populations:

\beta_1^j \sim \mathcal{N}(\mu_{\beta_1}, \sigma^2_{\beta_1}), \ j = 1, \dots, 5
\beta_2^j \sim \mathcal{N}(\mu_{\beta_2}, \sigma^2_{\beta_2}), \ j = 1, \dots, 5
\beta_3^j \sim \mathcal{N}(\mu_{\beta_3}, \sigma^2_{\beta_3}), \ j = 1, \dots, 5

And with appropriate priors on the hyper-parameters \mu_{\beta_i} and \sigma^2_{\beta_i}, \ i = 1, 2, 3

I’ve tried to build my hierarchical model using (1) Andrew Gelman’s textbook Bayesian Data Analysis, Third Edition and (2) Richard McElreath’s textbook Statistical Rethinking:


(from Bayesian Data Analysis, Third Edition, by Andrew Gelman)

And a couple of other Stan examples that I was able to find online.

Despite this, the atrocious n_eff and Rhat statistics of my model seem to suggest that I’ve done this incorrectly.

My code is as follows:

```{stan output.var="hierarchical"}
data {
  int<lower=0> n;
  vector<lower=0>[n] t;
  int<lower=0> groups;
  matrix<lower=0>[7, 5] y;
}
parameters {
  matrix<lower=0>[3, groups] beta;
  vector<lower=0>[3] mu_beta;
  vector<lower=0>[3] sigma_beta;
  real<lower=0> sigma;
}
transformed parameters {
  matrix[7, groups] mu;
  for (j in 1:groups) {
    for (i in 1:7) {
      mu[i, j] = beta[1, j] * inv(1 + exp(-(t[i] - beta[2, j]) / beta[3, j]));
    }
  }
}
model {
  for (j in 1:groups) {
    for (i in 1:7) {
      y[i, j] ~ normal(mu[i, j], sigma);
    }
    beta[1, j] ~ normal(mu_beta[1], sigma_beta[1]);
    beta[2, j] ~ normal(mu_beta[2], sigma_beta[2]);
    beta[3, j] ~ normal(mu_beta[3], sigma_beta[3]);
  }
  mu_beta[3] ~ normal(170, 5);
  mu_beta[2] ~ normal(85, 20);
  mu_beta[1] ~ normal(500, 500);
  sigma_beta ~ student_t(2, 0, 1);
  sigma ~ student_t(2, 0, 1);
}
```

```{r}
data.in <- list(t = Orange$age, n = length(Orange$circumference), y = matrix(Orange$circumference, 7, 5), group = as.integer(Orange$Tree), groups <- length(unique(Orange$Tree)))
model.fit3 <- sampling(hierarchical, data=data.in)
```

```{r}
print(model.fit3, digits=4, pars=c("mu", "beta", "sigma", "mu_beta"))
```

As can be seen, I attempted to adapt my unpooled model to the hierarchical model.

I would greatly appreciate it if people could please take the time to show the corrected code, with perhaps, if they’d be so kind, some accompanying explanation of what the errors were, so that I may have a clear understanding of how to do this myself in the future.


#2

You could try to let the brms R package generate the Stan code (and fit the model) for you. If that ends up working out for your data, it may help you finding problems in your own Stan model. For more details about brms’ non-linear syntax type vignette("brms_nonlinear") in R.


#3
t <- Orange$age
y <- Orange$circumference
dat1 <- data.frame(t, y)

prior1 <- prior(normal(500, 500), nlpar = "mu_beta1")
prior2 <- prior(normal(85, 20), nlpar = "mu_beta2")
prior3 <- prior(normal(170, 5), nlpar = "mu_beta3")
prior4 <- prior(student_t(2, 0, 1), nlpar = "sigma_beta1")
prior5 <- prior(student_t(2, 0, 1), nlpar = "sigma_beta2")
prior6 <- prior(student_t(2, 0, 1), nlpar = "sigma_beta3")
prior7 <- prior(student_t(2, 0, 1), nlpar = "sigma")
prior8 <- prior(normal(mu_beta1, sigma_beta1), nlpar = "beta1")
prior9 <- prior(normal(mu_beta2, sigma_beta2), nlpar = "beta2")
prior10 <- prior(normal(mu_beta3, sigma_beta3), nlpar = "beta3")
fit1 <- brm(bf(y ~ normal(beta1 * inv(1 + exp(-(t - beta2) / beta3)), sigma), beta1 + beta2 + beta3 + sigma + t ~ 1, nl = TRUE), data = dat1, prior = c(prior1, prior2, prior3, prior4, prior5, prior6, prior7, prior8, prior9, prior10))
```

Error: The following variables are missing in 'data': 'sigma'

Any idea why this isn’t working?

Anyway, it’s not the values themselves that I’m interested, so this doesn’t really address my problem; rather, I’m interested in seeing what the code for such a hierarchical model is.


#4

I’m at a total loss as to why this isn’t working. Based on my research, converting a non-pooled model to a hierarchical model should just be a matter of defining priors for the hyper-parameters in the non-pooled model, which, I think, is what I did. These lecture slides by Daniel Lee from Columbia University give an example that does exactly as I said:

EDIT: Thinking about this further, It could be that my model is generally correct and, instead, it is my choice of priors for the \mu_{\beta} and/or \sigma^2_{\beta} that is causing problems?


#5

Can you post the whole model, data, and how you ran it to give you the last error?

That doesn’t look like it should come from the original model posted.


#6

Oh wow, the man himself.

That error was from my attempt at @paul.buerkner’s suggestion – not from the original (hierarchical) model in question. If you are specifically referring to Paul’s suggestion, then all of the code that I ran to get that error is precisely what I posted above:

t <- Orange$age
y <- Orange$circumference
dat1 <- data.frame(t, y)

prior1 <- prior(normal(500, 500), nlpar = "mu_beta1")
prior2 <- prior(normal(85, 20), nlpar = "mu_beta2")
prior3 <- prior(normal(170, 5), nlpar = "mu_beta3")
prior4 <- prior(student_t(2, 0, 1), nlpar = "sigma_beta1")
prior5 <- prior(student_t(2, 0, 1), nlpar = "sigma_beta2")
prior6 <- prior(student_t(2, 0, 1), nlpar = "sigma_beta3")
prior7 <- prior(student_t(2, 0, 1), nlpar = "sigma")
prior8 <- prior(normal(mu_beta1, sigma_beta1), nlpar = "beta1")
prior9 <- prior(normal(mu_beta2, sigma_beta2), nlpar = "beta2")
prior10 <- prior(normal(mu_beta3, sigma_beta3), nlpar = "beta3")
fit1 <- brm(bf(y ~ normal(beta1 * inv(1 + exp(-(t - beta2) / beta3)), sigma), beta1 + beta2 + beta3 + sigma + t ~ 1, nl = TRUE), data = dat1, prior = c(prior1, prior2, prior3, prior4, prior5, prior6, prior7, prior8, prior9, prior10))

As for the data, I’m using R’s inbuilt Orange data set.

However, if you were referring to my mention of your lecture notes, that was within the context of the hierarchical model that I’m trying to code up in Stan. This is what all of my posts, except for the one replying to Paul, are referring to.

My apologies for any confusion.


#7

I don’t know brm well enough to help you, but it does look like the sigma term is in the equation, so maybe it’s expecting that as data?


#8

The model specification in brms is much simpler. You can find all the explanation in vignette("brms_nonlinear"). I think the following could work for you as a multilevel model for you data.

prior1 <- prior(normal(500, 500), nlpar = "beta1")
prior2 <- prior(normal(85, 20), nlpar = "beta2")
prior3 <- prior(normal(170, 5), nlpar = "beta3")
bform <- bf(
   y ~ beta1 * inv(1 + exp(-(t - beta2) / beta3)), 
   beta1 + beta2 + beta3 ~ 1 + (1|group), 
   nl = TRUE
)
fit1 <- brm(bform, data = dat1, prior = c(prior1, prior2, prior3))

This requires group to be part of your data of course.


#9

It’s not brm that I’m concerned with – that was just a suggestion by Paul to assist in answering the main problem. The main problem that I’m concerned with is that which I outlined in the main post – namely, the Stan code for the hierarchical model.

brm will get me the values, but the values alone do not have educational value. I’m more-so interested in the correct Stan hierarchical model.


#10

Thanks for the clarification. But is that a hierarchical analysis? I can’t see any priors on beta1, beta2, beta3.


#11

This prior is automatically part of the model due to the model specification.


#12

But this is just a non-hierarchical model, right? For instance, as per the main post, for the hierarchical model we require the priors

\beta_1^j \sim \mathcal{N}(\mu_{\beta_1}, \sigma^2_{\beta_1}), \ j = 1, \dots, 5
\beta_2^j \sim \mathcal{N}(\mu_{\beta_2}, \sigma^2_{\beta_2}), \ j = 1, \dots, 5
\beta_3^j \sim \mathcal{N}(\mu_{\beta_3}, \sigma^2_{\beta_3}), \ j = 1, \dots, 5

And with appropriate priors on the hyper-parameters \mu_{\beta_i} and \sigma^2_{\beta_i}, \ i = 1, 2, 3.

From my perspective, it seems that the model you specified is simply an unpooled analysis – not a hierarchical analysis? But I already did the unpooled model, and my focus is on the hierarchical model.


#13

It is an hierachical model. ;-)

The parameterization might just not look exactly as you expected (i.e. I use the non-centered parameterization in brms). I suggest you take a closer look at

prior1 <- prior(normal(500, 500), nlpar = "beta1")
prior2 <- prior(normal(85, 20), nlpar = "beta2")
prior3 <- prior(normal(170, 5), nlpar = "beta3")
bform <- bf(
   y ~ beta1 * inv(1 + exp(-(t - beta2) / beta3)), 
   beta1 + beta2 + beta3 ~ 1 + (1|group), 
   nl = TRUE
)
make_stancode(bform, data = dat1, prior = c(prior1, prior2, prior3))

I don’t want to force you to use brms for this I just wanted to point out that it might help you understanding what’s potentially wrong (or improvable) in your original model.


#14

Using make_stancode(bform, data = dat1, prior = c(prior1, prior2, prior3)) , the Stan code is as follows:

data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K_beta1;  // number of population-level effects 
  matrix[N, K_beta1] X_beta1;  // population-level design matrix 
  int<lower=1> K_beta2;  // number of population-level effects 
  matrix[N, K_beta2] X_beta2;  // population-level design matrix 
  int<lower=1> K_beta3;  // number of population-level effects 
  matrix[N, K_beta3] X_beta3;  // population-level design matrix 
  // covariate vectors 
  vector[N] C_1;
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_beta1_1;
  // data for group-level effects of ID 2
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_beta2_1;
  // data for group-level effects of ID 3
  int<lower=1> J_3[N];
  int<lower=1> N_3;
  int<lower=1> M_3;
  vector[N] Z_3_beta3_1;
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
} 
parameters { 
  vector[K_beta1] b_beta1;  // population-level effects 
  vector[K_beta2] b_beta2;  // population-level effects 
  vector[K_beta3] b_beta3;  // population-level effects 
  real<lower=0> sigma;  // residual SD 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // unscaled group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  vector[N_2] z_2[M_2];  // unscaled group-level effects
  vector<lower=0>[M_3] sd_3;  // group-level standard deviations
  vector[N_3] z_3[M_3];  // unscaled group-level effects
} 
transformed parameters { 
  // group-level effects 
  vector[N_1] r_1_beta1_1 = sd_1[1] * (z_1[1]);
  // group-level effects 
  vector[N_2] r_2_beta2_1 = sd_2[1] * (z_2[1]);
  // group-level effects 
  vector[N_3] r_3_beta3_1 = sd_3[1] * (z_3[1]);
} 
model { 
  vector[N] mu_beta1 = X_beta1 * b_beta1; 
  vector[N] mu_beta2 = X_beta2 * b_beta2; 
  vector[N] mu_beta3 = X_beta3 * b_beta3; 
  vector[N] mu;
  for (n in 1:N) { 
    mu_beta1[n] += r_1_beta1_1[J_1[n]] * Z_1_beta1_1[n];
    mu_beta2[n] += r_2_beta2_1[J_2[n]] * Z_2_beta2_1[n];
    mu_beta3[n] += r_3_beta3_1[J_3[n]] * Z_3_beta3_1[n];
    // compute non-linear predictor 
    mu[n] = mu_beta1[n] * inv(1 + exp( - (C_1[n] - mu_beta2[n]) / mu_beta3[n]));
  } 
  // priors including all constants 
  target += normal_lpdf(b_beta1 | 500, 500); 
  target += normal_lpdf(b_beta2 | 85, 20); 
  target += normal_lpdf(b_beta3 | 170, 5); 
  target += student_t_lpdf(sigma | 3, 0, 57)
    - 1 * student_t_lccdf(0 | 3, 0, 57); 
  target += student_t_lpdf(sd_1 | 3, 0, 57)
    - 1 * student_t_lccdf(0 | 3, 0, 57); 
  target += normal_lpdf(z_1[1] | 0, 1);
  target += student_t_lpdf(sd_2 | 3, 0, 57)
    - 1 * student_t_lccdf(0 | 3, 0, 57); 
  target += normal_lpdf(z_2[1] | 0, 1);
  target += student_t_lpdf(sd_3 | 3, 0, 57)
    - 1 * student_t_lccdf(0 | 3, 0, 57); 
  target += normal_lpdf(z_3[1] | 0, 1);
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma); 
  } 
}

So this is the hierarchical model? How can I discern values like the \mu_{\beta_i} and \sigma^2_{\beta_i}?


#15

The following code works perfectly for me

prior1 <- prior(normal(500, 500), nlpar = "beta1", lb = 0)
prior2 <- prior(normal(85, 20), nlpar = "beta2", lb = 0)
prior3 <- prior(normal(170, 5), nlpar = "beta3", lb = 0)
bform <- bf(
  circumference ~ beta1 * inv(1 + exp(-(age - beta2) / beta3)), 
  beta1 + beta2 + beta3 ~ 1 + (1|Tree), 
  nl = TRUE
)

fit1 <- brm(bform, data = Orange, prior = c(prior1, prior2, prior3))

summary(fit1)
coef(fit1)

conds <- make_conditions(fit1, "Tree")
marginal_effects(fit1, effects = "age:Tree", conditions = conds, re_formula = NULL)

It should provide you with everything you need.


#16

Thanks for the clarification.

I presume the sd(beta_Intercept) are the \sigma^2_{\beta_i}? But what about the \mu_{\beta_i}?


#17

That’s correct. The \mu_{\beta_i} are the beta<i>_Intercept parameters in the summary output.


#18

Shouldn’t there only be 3 values for \mu_{\beta_i}, just as with the \sigma^2_{\beta_i}? Instead, this has a different value for each group.


#19

there are only 3 of these values in the output of summary(), which are the estimates averaged across groups.

coef() on the other hand gives you estimates by group.


#20

Anyone able to correct my code for the hierarchical model and explain what’s wrong with it? It’s been a while since I posted this, but, unfortunately, I still haven’t been able to figure out what’s wrong with it.

I might keep this thread open and have another attempt at a hierarchical model and see how that works out.