Unpooled Non-Linear Regression in Stan

y64c · May 30, 2018, 2:28pm

R has an inbuilt data set called Orange, which measures the circumference of 5 orange trees (in mm) at various ages (measured in days since 1968). I’m trying to use Stan to fit a nonlinear model separately to each tree (in other words, a program that performs an unpooled analysis).

My model is as follows:

Circumference at time t is c(t) = \dfrac{\beta_1}{1 + \exp(-(t - \beta_2) / \beta_3)}, where all parameters are positive.

\beta_1 is the circumference at maturity (when t is large), \beta_2 is the time point at which half the final circumference is reached (i.e., if t = \beta_2, then c(t) = \dfrac{\beta_1}{2}), and \beta_3 measures the rate of growth of the tree.

I’ve never fit such a model in Stan before, so I’m lost as to how to go about this. I would appreciate it if people could please take the time to demonstrate this.

bgoodri · May 30, 2018, 3:02pm

I doubt you really want to do an unpooled analysis, and these models are generally difficult to fit, so you probably want to use stan_nlmer in rstanarm. But basically it is

data {
  int<lower=1> N;
  vector<lower=0> t;
  vector<lower=0> y;
}
parameters {
  vector<lower=0>[3] beta;
  real<lower=0> sigma;
}
model {
  vector[N] mu = beta[1] * inv(1 + exp(-(t - beta[2]) / beta[3]));
  target += normal_lpdf(y | mu, sigma);
  // priors
}

y64c · May 30, 2018, 6:28pm

I’ve been tweaking this for a couple of hours, but I still can’t get it to work. I’m not sure if I’m doing this correctly:

```{stan output.var="non_pooled"}
data {
  int<lower=0> n;
  vector<lower=0>[n] t;
  vector<lower=0>[n] y;
}
parameters {
  vector<lower=0>[3] beta;
  real<lower=0> sigma;
}
model {
  vector[n] mu = beta[1] * inv(1 + exp(-(t - beta[2]) / beta[3]));
  target += normal_lpdf(y | mu, sigma);
  
  sigma ~ Cauchy(0, 5);
}
```

bgoodri · May 30, 2018, 6:54pm

What is wrong with it?

y64c · May 30, 2018, 7:23pm

jonah · May 30, 2018, 7:32pm

All distribution names are all lowercase so you need cauchy.

y64c · May 30, 2018, 9:04pm

Thanks, that fixed it. My problem now is that every time I try

```{r}
print(model.fit2, digits=5, pars=c("mu", "beta", "sigma"))
```

RStudio crashes.

Sampling code is as follows:

```{r, results='hide'}
data.in <- list(t = Orange$age, n = length(Orange$circumference), y = Orange$circumference)
model.fit2 <- sampling(non_pooled, data=data.in)
```

Which runs, but

What’s going wrong here? My goal was to practice hierarchical analysis, but I can’t even get the non-pooling to work.

y64c · May 30, 2018, 10:00pm

@bgoodri

I also tried to use a differently formulated model, yet it still doesn’t work!

```{stan output.var="non_pooled2"}
data {
  int<lower=0> N; 
  int<lower=1,upper=5> tree[N];
  vector[N] t;
  vector[N] y;
} 
parameters {
  vector<lower=0>[3] beta;
  real<lower=0> sigma;
} 
transformed parameters {
  vector[N] mu;

  for (i in 1:N)
    mu[i] = beta[1] * inv(1 + exp(-(t[i] - beta[2]) / beta[3]));
}
model {
  y ~ normal(mu, sigma);
  sigma ~ cauchy(0, 5);
}
``` 

```{r, results='hide'}
data.in <- list(t = Orange$age, N = length(Orange$circumference), y = Orange$circumference, tree = as.integer(Orange$Tree))
model.fit2 <- sampling(non_pooled2, data=data.in)
```

This part works, but we get

And here is where it (RStudio) crashes (again!):

```{r}
print(model.fit2, digits=5, pars=c("mu", "beta", "sigma"))
```

bgoodri · May 30, 2018, 10:37pm

I said before such models are difficult to fit. But printing shouldn’t cause RStudio to crash.

y64c · May 30, 2018, 10:44pm

Hmm, interesting. So It’s not that either of the unpooled models I’ve posted are “incorrect” somehow? Rather, they’re just hard to fit?

bgoodri · May 31, 2018, 12:24am

Yes, but your bigger problem is crashing RStudio just with printing.

y64c · May 31, 2018, 2:24pm

Ok, I managed to solve the problem. The print statement was causing RStudio to crash precisely because the Stan code/model was incorrect, which was obviously leading to something undesirable happening in the background. As to why the incorrect model was causing RStudio crash specifically when we attempted to print its output, I have no idea. This question is in the domain of the developers.

My code is as follows:

```{stan output.var="non_pooled"}
data {
  int<lower=0> n;
  vector<lower=0>[n] t;
  int<lower=0> groups;
  matrix<lower=0>[7, 5] y;
}
parameters {
  matrix<lower=0>[3, groups] beta;
  real<lower=0> sigma;
}
transformed parameters {
  matrix[7, groups] mu;
  for (j in 1:groups) {
    for (i in 1:7) {
      mu[i, j] = beta[1, j] * inv(1 + exp(-(t[i] - beta[2, j]) / beta[3, j]));
    }
  }
}
model {
  for (j in 1:groups) {
    for (i in 1:7) {
      y[i, j] ~ normal(mu[i, j], sigma);
    }
    
    beta[1, j] ~ normal(170, 5);
    beta[2, j] ~ normal(85, 20);
    beta[3, j] ~ normal(500, 500);
    
  }
  
  sigma ~ student_t(2, 0, 1);
}
```

```{r}
data.in <- list(t = Orange$age, n = length(Orange$circumference), y = matrix(Orange$circumference, 7, 5), group = as.integer(Orange$Tree), groups <- length(unique(Orange$Tree)))
model.fit2 <- sampling(non_pooled, data=data.in)
```

```{r}
print(model.fit2, digits=4, pars=c("mu", "beta", "sigma"))
```

Explanation

We have groups = 5 groups and n = 35 observations within those groups.

I created the 7 \times 5 matrix y, where the columns denote the groups and the rows denote the observations within those groups. Therefore, we have 7 \times 5 = 35 total observations, as required.

Since this is an unpooled model, we need to generate parameters \beta_1^j, \beta_2^j, \beta_3^j for each group j = 1, \dots, 5. Therefore, beta is a 3 \times 5 matrix.

This is a part that I’m still unsure of:

mu is a 7 \times 5 matrix, where the columns denote the group and the rows denote the observations. This is because each group has 7 observations, and so there are 7 observations within each group. However, I’m still concerned about this part, since I can’t make sense of having 7 different means mu for each group. In an unpooled model, isn’t each group supposed to have their own mean, not 7, as we have here?

EDIT: I just noticed that I had accidentally also copy and pasted an old version of the model along with the new version. I have now deleted the old version. My apologies for any confusion.

bgoodri · May 31, 2018, 2:45pm

I am pretty sure that is impossible. The print code is unrelated to the Stan program.

You still have multiple time points, so the conditional mean varies over time but does so differently across the three groups.

y64c · May 31, 2018, 2:56pm

Hmm, my suspicion was that, since the model was incorrect, in the sense that it wasn’t producing the output that we intended it to, the stanfit object was produced in such a way that would cause RStudio to crash when we attempted to print its values. Perhaps there were too many values produced in the stanfit object for print to handle and consequently it crashed? I’m not sure what else would cause it to crash at the print statement when everything that preceded it (compilation and sampling) worked as normal.

Ahh, that makes sense. Thanks again for the clarification, Ben.

Topic		Replies	Views
Hierarchical Nonlinear Regression Modeling	22	5158	July 19, 2022
Fitting a non-linear growth model with stan Modeling fitting-issues	5	3274	April 6, 2019
Unpooled multiple linear regression model Modeling fitting-issues , specification , hierarchical-model	1	712	December 22, 2021
Help with non-linear model [Log probability evaluates to log(0)] Modeling rstan	2	365	October 18, 2022
Physically-based model: How to parameterize program such that one variable must be larger than another Modeling	19	3224	March 26, 2018

Unpooled Non-Linear Regression in Stan

Related topics