Dear all:
I’m struggling to fit a non-linear model. According to theory:
y=\frac{\omega}{(1+r-\omega)}\cdot x_1 + \frac{(1+r)}{(1+r-\omega)(1+r-\gamma)}\cdot x_2 + \epsilon
For observed values of y, x_1, and x_2, with parameters \omega, \gamma, and r, where \epsilon is normally distributed noise with standard deviation \sigma. Sometimes (like in my code below), I simplify that as:
y = \beta_1\cdot x_1 + \beta_2\cdot x_2 + \epsilon
Where \beta_1=\frac{\omega}{(1+r-\omega)} and \beta_2=\frac{(1+r)}{(1+r-\omega)(1+r-\gamma)} are transformed parameters.
My chains don’t converge. Well, sometimes they do, but rarely. Still, I’ve seen it happen a couple of times, so I know it’s possible.
I’m debugging my code on data I simulated myself, so I know the "true" parameter values before I run the chains. For various reasons, I seeded the parameter values by drawing them from their priors:
\omega \sim \text{Normal}(0.75, 0.1) \\
\gamma \sim \text{Normal}(0.25, 0.1) \\
r \sim \text{Normal}(0.05, 0.01) \\
\sigma \sim \text{Exponential}(0.1)
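For reference, plugging the prior means into the transformations gives the implied coefficients (handy for sanity-checking the fits further down; the names below are just mine for this quick check):

# implied coefficients at the prior means
omega0 <- 0.75; gamma0 <- 0.25; r0 <- 0.05
b1_0 <- omega0 / (1 + r0 - omega0)                          # 0.75/0.30 = 2.5
b2_0 <- (1 + r0) / ((1 + r0 - omega0) * (1 + r0 - gamma0))  # 1.05/0.24 = 4.375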
In fact, here’s the code I wrote to do that:
nobs <- 1000

# Observed variables
x1 <- runif(nobs, 0, 100)
x2 <- runif(nobs, 0, 100)

# Unobserved parameters, drawn from their priors
# (note: rnorm(nobs, ...) makes one draw per observation, so the "true"
# parameters vary across rows; more on that below)
omega <- rnorm(nobs, 0.75, 0.1)
gamma <- rnorm(nobs, 0.25, 0.1)
r     <- rnorm(nobs, 0.05, 0.01)

# Observed outcome (with additive noise, sd = 10)
y <- x1 * omega / (1 + r - omega) +
     x2 * (1 + r) / ((1 + r - omega) * (1 + r - gamma)) +
     rnorm(nobs, 0, 10)

data <- data.frame(x1 = x1, x2 = x2, y = y)
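(For comparison, if I wanted a single set of true values held constant across observations, I’d draw scalars instead; I mention this because of the \sigma issue I get into below:)

# alternative seeding: one draw per parameter, constant across observations
omega <- rnorm(1, 0.75, 0.1)
gamma <- rnorm(1, 0.25, 0.1)
r     <- rnorm(1, 0.05, 0.01)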
I’m using the rethinking package that accompanies Richard McElreath’s Statistical Rethinking (it’s a really good book). So my code looks like:
library(rethinking)

model <- ulam(
  alist(
    y ~ dnorm(mu, sigma),
    transpars> b1 <<- omega / (1 + r - omega),
    transpars> b2 <<- (1 + r) / ((1 + r - omega) * (1 + r - gamma)),
    mu <- b1 * x1 + b2 * x2,
    omega ~ dnorm(0.75, 0.1),
    gamma ~ dnorm(0.25, 0.1),
    r ~ dnorm(0.05, 0.01),
    sigma ~ dexp(1)
  ),
  data = data, chains = 4, cores = 4, iter = 1000
)
But it’s easy to output the raw Stan code with stancode(model), if that’s easier to read:
data{
    vector[1000] y;
    vector[1000] x2;
    vector[1000] x1;
}
parameters{
    real omega;
    real gamma;
    real r;
    real<lower=0> sigma;
}
transformed parameters{
    real b1;
    real b2;
    b2 = (1 + r) / ((1 + r - omega) * (1 + r - gamma));
    b1 = omega / (1 + r - omega);
}
model{
    vector[1000] mu;
    sigma ~ exponential( 1 );
    r ~ normal( 0.05 , 0.01 );
    gamma ~ normal( 0.25 , 0.1 );
    omega ~ normal( 0.75 , 0.1 );
    for ( i in 1:1000 ) {
        mu[i] = b1 * x1[i] + b2 * x2[i];
    }
    y ~ normal( mu , sigma );
}
Every once in a while the chains converge on each other, and when they do, they converge on the seeded parameter values. The estimate for \sigma seems off, but maybe that’s because I seeded the parameters as random coefficients (one draw per row) rather than constants. Here’s an example precis output:
          mean    sd   5.5%  94.5% n_eff Rhat4
omega     0.78  0.02   0.74   0.82  1046     1
gamma     0.21  0.09   0.06   0.35  1136     1
r         0.05  0.01   0.03   0.07  1480     1
sigma   768.12 11.71 749.93 786.59  1323     1
b2        4.67  0.33   4.16   5.22  1925     1
b1        2.92  0.31   2.43   3.44  1162     1
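To check that hunch about \sigma, I can measure how much extra spread the per-row coefficient draws inject into the outcome (a rough check on the simulated data; "extra" is just my name for the leftover term):

# rough check: spread added by the per-row parameter draws
b1_i  <- omega / (1 + r - omega)                        # per-row b1 (length nobs)
b2_i  <- (1 + r) / ((1 + r - omega) * (1 + r - gamma))  # per-row b2
extra <- (b1_i - mean(b1_i)) * x1 + (b2_i - mean(b2_i)) * x2
sd(extra)  # if this is large relative to 10, it would explain the big sigma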
I like the pairs plot because those nice, round scatter plots rule out the type of identification issue that shows up as (near-perfectly) correlated parameters. I saved those results. They give me hope. I look at them when I’m feeling down.
But most of the time… and I mean most of the time… I get results that look something like this:
          mean    sd   5.5%  94.5% n_eff Rhat4
omega     0.52  0.27   0.16   0.81     2  5.01
gamma     0.94  0.73   0.10   1.72     2 10.77
r         0.04  0.01   0.02   0.06     7  1.19
sigma   804.79 38.52 754.16 856.94     2  3.42
b2        1.26  3.44  -2.39   5.07     2 13.92
b1        1.63  1.30   0.19   3.30     2  5.77
Those parameter estimates aren’t just wrong; the diagnostic statistics are horrible (n_eff of 2, Rhat4 far above 1). Clearly, the chains are not converging.
If I had to interpret the above, it looks like the sampler is getting stuck in some difficult-to-explore region of the posterior, possibly a separate mode. The issue doesn’t resolve when I set the adapt_delta control parameter to a higher value.
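For reference, here’s how I’ve been raising it, in case I’ve misunderstood the interface (0.99 is just the example value I tried):

model <- ulam(
  alist(
    y ~ dnorm(mu, sigma),
    transpars> b1 <<- omega / (1 + r - omega),
    transpars> b2 <<- (1 + r) / ((1 + r - omega) * (1 + r - gamma)),
    mu <- b1 * x1 + b2 * x2,
    omega ~ dnorm(0.75, 0.1),
    gamma ~ dnorm(0.25, 0.1),
    r ~ dnorm(0.05, 0.01),
    sigma ~ dexp(1)
  ),
  data = data, chains = 4, cores = 4, iter = 1000,
  control = list(adapt_delta = 0.99)  # passed through to Stan's sampler
)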
The only other advice I’ve found is to "reparameterize" the model, but I don’t see how to do that here. I’m at a loss. Is that even possible?
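Is something like the following what’s meant? This is just a sketch I put together (untested; d_omega and d_gamma are names I made up). The idea would be to sample the denominators directly, constrained positive, so the sampler can’t wander across the pole at 1 + r - \omega = 0 where b_1 blows up and flips sign. The data block would be unchanged:

parameters{
    real r;
    real<lower=0> d_omega;   // d_omega = 1 + r - omega, forced positive
    real<lower=0> d_gamma;   // d_gamma = 1 + r - gamma, forced positive
    real<lower=0> sigma;
}
transformed parameters{
    real omega = 1 + r - d_omega;
    real gamma = 1 + r - d_gamma;
    real b1 = omega / d_omega;
    real b2 = (1 + r) / (d_omega * d_gamma);
}
model{
    vector[1000] mu;
    sigma ~ exponential( 1 );
    r ~ normal( 0.05 , 0.01 );
    // priors still stated on omega and gamma; Stan will warn about priors on
    // transformed parameters, but the map from d_omega to omega is a shift
    // (Jacobian 1), so I don't think an adjustment term is needed
    omega ~ normal( 0.75 , 0.1 );
    gamma ~ normal( 0.25 , 0.1 );
    for ( i in 1:1000 ) {
        mu[i] = b1 * x1[i] + b2 * x2[i];
    }
    y ~ normal( mu , sigma );
}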
Clearly, the model can work. But how can I make it work more often? I should add that I’d eventually like to extend this into a hierarchical model that lets the parameters vary across groups, so it’s discouraging that it can’t reliably fit the simpler case. Am I missing something obvious?