Unpooled Non-Linear Regression in Stan

R has an inbuilt data set called Orange, which measures the circumference of 5 orange trees (in mm) at various ages (measured in days since 1968). I’m trying to use Stan to fit a nonlinear model separately to each tree (in other words, a program that performs an unpooled analysis).

My model is as follows:

Circumference at time t is c(t) = \dfrac{\beta_1}{1 + \exp(-(t - \beta_2) / \beta_3)}, where all parameters are positive.

\beta_1 is the circumference at maturity (when t is large), \beta_2 is the time point at which half the final circumference is reached (i.e., if t = \beta_2, then c(t) = \dfrac{\beta_1}{2}), and \beta_3 measures the rate of growth of the tree.

I’ve never fit such a model in Stan before, so I’m lost as to how to go about this. I would appreciate it if people could please take the time to demonstrate this.

I doubt you really want to do an unpooled analysis, and these models are generally difficult to fit, so you probably want to use stan_nlmer in rstanarm. But basically it is

data {
  int<lower=1> N;
  vector<lower=0> t;
  vector<lower=0> y;
}
parameters {
  vector<lower=0>[3] beta;
  real<lower=0> sigma;
}
model {
  vector[N] mu = beta[1] * inv(1 + exp(-(t - beta[2]) / beta[3]));
  target += normal_lpdf(y | mu, sigma);
  // priors
}

I’ve been tweaking this for a couple of hours, but I still can’t get it to work. I’m not sure if I’m doing this correctly:

```{stan output.var="non_pooled"}
data {
  int<lower=0> n;
  vector<lower=0>[n] t;
  vector<lower=0>[n] y;
}
parameters {
  vector<lower=0>[3] beta;
  real<lower=0> sigma;
}
model {
  vector[n] mu = beta[1] * inv(1 + exp(-(t - beta[2]) / beta[3]));
  target += normal_lpdf(y | mu, sigma);
  
  sigma ~ Cauchy(0, 5);
}
```

What is wrong with it?

All distribution names are all lowercase so you need cauchy.

1 Like

Thanks, that fixed it. My problem now is that every time I try

```{r}
print(model.fit2, digits=5, pars=c("mu", "beta", "sigma"))
```

RStudio crashes.

Sampling code is as follows:

```{r, results='hide'}
data.in <- list(t = Orange$age, n = length(Orange$circumference), y = Orange$circumference)
model.fit2 <- sampling(non_pooled, data=data.in)
```

Which runs, but

What’s going wrong here? My goal was to practice hierarchical analysis, but I can’t even get the non-pooling to work.

@bgoodri

I also tried to use a differently formulated model, yet it still doesn’t work!

```{stan output.var="non_pooled2"}
data {
  int<lower=0> N; 
  int<lower=1,upper=5> tree[N];
  vector[N] t;
  vector[N] y;
} 
parameters {
  vector<lower=0>[3] beta;
  real<lower=0> sigma;
} 
transformed parameters {
  vector[N] mu;

  for (i in 1:N)
    mu[i] = beta[1] * inv(1 + exp(-(t[i] - beta[2]) / beta[3]));
}
model {
  y ~ normal(mu, sigma);
  sigma ~ cauchy(0, 5);
}
``` 

```{r, results='hide'}
data.in <- list(t = Orange$age, N = length(Orange$circumference), y = Orange$circumference, tree = as.integer(Orange$Tree))
model.fit2 <- sampling(non_pooled2, data=data.in)
```

This part works, but we get

And here is where it (RStudio) crashes (again!):

```{r}
print(model.fit2, digits=5, pars=c("mu", "beta", "sigma"))
```

I said before such models are difficult to fit. But printing shouldn’t cause RStudio to crash.

Hmm, interesting. So It’s not that either of the unpooled models I’ve posted are “incorrect” somehow? Rather, they’re just hard to fit?

Yes, but your bigger problem is crashing RStudio just with printing.

Ok, I managed to solve the problem. The print statement was causing RStudio to crash precisely because the Stan code/model was incorrect, which was obviously leading to something undesirable happening in the background. As to why the incorrect model was causing RStudio crash specifically when we attempted to print its output, I have no idea. This question is in the domain of the developers.

My code is as follows:

```{stan output.var="non_pooled"}
data {
  int<lower=0> n;
  vector<lower=0>[n] t;
  int<lower=0> groups;
  matrix<lower=0>[7, 5] y;
}
parameters {
  matrix<lower=0>[3, groups] beta;
  real<lower=0> sigma;
}
transformed parameters {
  matrix[7, groups] mu;
  for (j in 1:groups) {
    for (i in 1:7) {
      mu[i, j] = beta[1, j] * inv(1 + exp(-(t[i] - beta[2, j]) / beta[3, j]));
    }
  }
}
model {
  for (j in 1:groups) {
    for (i in 1:7) {
      y[i, j] ~ normal(mu[i, j], sigma);
    }
    
    beta[1, j] ~ normal(170, 5);
    beta[2, j] ~ normal(85, 20);
    beta[3, j] ~ normal(500, 500);
    
  }
  
  sigma ~ student_t(2, 0, 1);
}
```

```{r}
data.in <- list(t = Orange$age, n = length(Orange$circumference), y = matrix(Orange$circumference, 7, 5), group = as.integer(Orange$Tree), groups <- length(unique(Orange$Tree)))
model.fit2 <- sampling(non_pooled, data=data.in)
```

```{r}
print(model.fit2, digits=4, pars=c("mu", "beta", "sigma"))
```

Explanation

We have groups = 5 groups and n = 35 observations within those groups.

I created the 7 \times 5 matrix y, where the columns denote the groups and the rows denote the observations within those groups. Therefore, we have 7 \times 5 = 35 total observations, as required.

Since this is an unpooled model, we need to generate parameters \beta_1^j, \beta_2^j, \beta_3^j for each group j = 1, \dots, 5. Therefore, beta is a 3 \times 5 matrix.

This is a part that I’m still unsure of:

mu is a 7 \times 5 matrix, where the columns denote the group and the rows denote the observations. This is because each group has 7 observations, and so there are 7 observations within each group. However, I’m still concerned about this part, since I can’t make sense of having 7 different means mu for each group. In an unpooled model, isn’t each group supposed to have their own mean, not 7, as we have here?

EDIT: I just noticed that I had accidentally also copy and pasted an old version of the model along with the new version. I have now deleted the old version. My apologies for any confusion.

I am pretty sure that is impossible. The print code is unrelated to the Stan program.

You still have multiple time points, so the conditional mean varies over time but does so differently across the three groups.

1 Like

Hmm, my suspicion was that, since the model was incorrect, in the sense that it wasn’t producing the output that we intended it to, the stanfit object was produced in such a way that would cause RStudio to crash when we attempted to print its values. Perhaps there were too many values produced in the stanfit object for print to handle and consequently it crashed? I’m not sure what else would cause it to crash at the print statement when everything that preceded it (compilation and sampling) worked as normal.

Ahh, that makes sense. Thanks again for the clarification, Ben.