Non-linear model converges some of the time, but not most of the time. How to specify for more reliable convergence?

Dear all:

I’m struggling to fit a non-linear model. According to theory:

y=\frac{\omega}{(1+r-\omega)}\cdot x_1 + \frac{(1+r)}{(1+r-\omega)(1+r-\gamma)}\cdot x_2 + \epsilon

For observed values of y, x_1, and x_2, with parameters \omega, \gamma, and r, and where \epsilon is normally distributed noise with variance \sigma. Sometimes (like in my code below), I simplify that as:

y = \beta_1\cdot x_1 + \beta_2\cdot x_2 + \epsilon

Where \beta_1=\frac{\omega}{(1+r-\omega)} and \beta_2=\frac{(1+r)}{(1+r-\omega)(1+r-\gamma)} are transformed parameters.

My chains don’t converge. Well, sometimes they do. But not often. Rarely, in fact. But I’ve seen it happen a couple of times. So I know it’s possible.

I’m debugging my code on data I simulated myself so I know the `true’ parameter values before I run the chains. For various reasons, I seeded the parameter values with their priors:

\omega \sim N(0.75,0.1) \\ \gamma \sim N(0.25,0.1) \\ r \sim N(0.05,0.01) \\ \sigma \sim \text{exp}(0.1)

In fact, here’s the code I wrote to do that:

nobs <- 1000

# Observed Variables
x1 <- runif(nobs,0,100)
x2 <- runif(nobs,0,100)

# Unobserved Parameters
omega <- rnorm(nobs,0.75,0.1)
gamma <- rnorm(nobs,0.25,0.1)
r     <- rnorm(nobs,0.05,0.01)

# Observed Outcome (w/ Additive Noise)
y <- x1 * omega / (1+r-omega) + x2 * (1+r) / ((1+r-omega)*(1+r-gamma)) + rnorm(nobs,0,10)

data <- data.frame(
  x1 = x1,
  x2 = x2,
  y = y
)

I’m using the rethinking package based on the book by Richard McElreath. It’s a really good book. So my code looks like:

library(rethinking)
model <- ulam(
  alist(
    y ~ dnorm(mu,sigma),
    transpars> b1 <<- omega / (1+r-omega),
    transpars> b2 <<- (1+r)/((1+r-omega)*(1+r-gamma)),
    mu <- b1*x1 + b2*x2,
    omega ~ dnorm(0.75,0.1),
    gamma ~ dnorm(0.25,0.1),
    r ~ dnorm(0.05,0.01),
    sigma ~ dexp(1)
  ), data=data, chains=4, cores=4, iter=1000
)

But it’s easy to output the raw Stan code if that’s easier:

data{
     vector[1000] y;
     vector[1000] x2;
     vector[1000] x1;
}
parameters{
     real omega;
     real gamma;
     real r;
     real<lower=0> sigma;
}
transformed parameters{
     real b1;
     real b2;
    b2 = (1 + r)/((1 + r - omega) * (1 + r - gamma));
    b1 = omega/(1 + r - omega);
}
model{
     vector[1000] mu;
    sigma ~ exponential( 1 );
    r ~ normal( 0.05 , 0.01 );
    gamma ~ normal( 0.25 , 0.1 );
    omega ~ normal( 0.75 , 0.1 );
    for ( i in 1:1000 ) {
        mu[i] = b1 * x1[i] + b2 * x2[i];
    }
    y ~ normal( mu , sigma );
}

Every once in awhile the chains converge on each other, and when they do they converge on the seeded parameter values. The estimate for \sigma seems off, but maybe that’s because I seeded the parameters as random coefficients rather than constants. Here’s an example:

        mean    sd   5.5%  94.5% n_eff Rhat4
omega   0.78  0.02   0.74   0.82  1046     1
gamma   0.21  0.09   0.06   0.35  1136     1
r       0.05  0.01   0.03   0.07  1480     1
sigma 768.12 11.71 749.93 786.59  1323     1
b2      4.67  0.33   4.16   5.22  1925     1
b1      2.92  0.31   2.43   3.44  1162     1

Rplot

Rplot01

I like the pairs plot because those nice round scatter plots rule out the type of identification issues indicated by (near perfectly) correlated parameters. I saved those results. They give me hope. I look at them when I’m feeling down.

But most of the time… and I mean most of the time… I get results that look something like this:

        mean    sd   5.5%  94.5% n_eff Rhat4
omega   0.52  0.27   0.16   0.81     2  5.01
gamma   0.94  0.73   0.10   1.72     2 10.77
r       0.04  0.01   0.02   0.06     7  1.19
sigma 804.79 38.52 754.16 856.94     2  3.42
b2      1.26  3.44  -2.39   5.07     2 13.92
b1      1.63  1.30   0.19   3.30     2  5.77

Those parameter estimates aren’t just wrong, the diagnostic statistics are horrible. Clearly, the chains are not converging.

Rplot02
Rplot03

If I were to interpret the above, it looks like the sampler is getting stuck in some “difficult-to-explore” region of the posterior distribution, possibly on some other peak or valley. The issue doesn’t resolve when I set the adapt_delta control parameter to a higher value.

The only other advice I’ve found is to “reparameterize” the model. But I don’t see it. I’m at a loss. Is that even possible?

Clearly, the model can work. But how can I make it work more often? I should add, I would eventually like to evolve the model into a hierarchical one that allows the parameters to vary within groups. I’m just getting discouraged it can’t reliably predict the more simpler case. Am I missing something obvious?

The posterior seems to have quite clear bimodality. If you think that the other mode is in part of the parameter space that is not feasible given external information, then code that information in prior. See a related example in section 11 (p. 56-) in Bayesian Workflow

1 Like