Hi all,
I’m trying to fit a non-linear model used in soil science, relating soil pressure to water content - the van Genuchten model. I’m getting a large number of divergent transitions (70 - 90%), however. I’ve been reading the Stan manual and poking around online to try and figure out why this might be, but haven’t been successful, yet. I’m starting to think one of my (many?) problems is the way I’ve parameterized the model. One specific thing I’m looking into is that one of my parameters must be greater than another (their subtraction cannot be negative). Specifically, s > r in the model below.
Making-up data in R to be fit in Stan:
library(tidyverse)
library(rstan)
#Creating the non-linear soil water retention curve function: van Genuchten (1980)
#psi: pressure (kPa)
#s: porosity (unitless)
#r: specific retention (unitless)
#a: capillarity (kPa^-1)
#n: pore size distribution index (unitless)
#theta: volumetric soil moisture content (unitless)
vanGen <- function(psi, s, r, a, n) {
top <- s - r
bot1 <- 1 + ((a * abs(psi)) ^ n)
bot2 <- bot1 ^ (1 - (1 / n))
theta <- r + (top / bot2)
theta
}
#Generate pressure data that spans orders of magnitude.
psi <- c(1:9 %o% 10^(-2:7)) #kPa.
#Sample the result to make it a but less regular and thus more realistic.
psi <- sample(psi, round(length(psi) * 0.5))
s <- 0.8
r <- 0.2
a <- 1/5 #5 kPa is approximately 50 cm.
n <- 1.56
#Run the deterministic model.
theta <- vanGen(psi, s, r, a, n)
#Add noise to the response.
theta <- rnorm(length(theta), theta, 0.015)
#Visually inspect that the relationship looks right.
qplot(psi, theta) + scale_x_log10()
#Location of stan file on my computer (see stan code below).
fileLoc <- "ExampleModels/NonLinearRegression/vanGenuchten.stan"
#Data to be used in Stan.
data <- list(N = length(theta), theta = theta, psi = psi)
#Stan call.
vgBayes <- stan(
file = fileLoc,
data = data,
chains = 4,
iter = 1000,
warmup = 500,
algorithm = "NUTS",
control = list(adapt_delta = 0.99)
)
Below is the Stan function I made:
functions {
//Below is the same van Genuchten model made in R.
real vg(real psi, real s, real r, real a, real n) {
real top;
real bot1;
real bot2;
real out;
top = s - r; // I think this might be the problem spot, as s must always be greater than r to be physically
// meaningful.
bot1 = 1 + ((a * fabs(psi)) ^ n);
bot2 = bot1 ^ (1 - (1 / n));
out = r + (top / bot2);
return out;
}
//A root mean squared error function that I made up.
real RMSEFunc(int N, real[] yPred, real[] yObs) {
real L;
real SUM[N];
real out;
L = N * 1.0; //Wasn't sure the best practice for converting int to real, but this worked.
for(i in 1:N) {
SUM[i] = (yPred[i] - yObs[i]) ^ 2;
}
out = sqrt(sum(SUM) / L);
return out;
}
}
data {
int N;
real <lower = 0, upper = 1> theta[N];
real psi[N];
}
parameters {
real <lower = 0, upper = 1> s; //Should be physically bounded between r and 1.
real <lower = 0, upper = 1> r; //Should be physically bounded between 0 and s.
real <lower = 0> a; //Must be above 0.
real <lower = 1> n; //Must be above 1.
real <lower = 0> sigma;
}
transformed parameters{
real theta_pred[N];
for(i in 1:N){
theta_pred[i] = vg(psi[i], s, r, a, n);
}
}
model {
s ~ beta(5, 2) T[0, 1]; // Seems like a reasonable distribution for both s and r.
r ~ beta(2, 5) T[0, 1];
a ~ double_exponential(0, 0.1) T[0, ];
n ~ double_exponential(1, 0.1) T[1, ];
sigma ~ cauchy(0, 1) T[0, ]; // As I understand it, this is effectively a half-Cauchy, which is perfect for me.
for(i in 1:N){
theta[i] ~ normal(theta_pred[i], sigma) T[0, 1];
//print(theta[i]);
}
}
generated quantities {
real RMSE;
vector[N] log_lik;
RMSE = RMSEFunc(N, theta_pred, theta);
for(i in 1:N) {
log_lik[i] = normal_lpdf(theta[i] | theta_pred[i], sigma);
}
}
What I’d like to be able to do is have a condition somewhere such that s is always greater than r. I tried adding that condition to the parameters block, like:
parameters {
real <lower = r, upper = 1> s;
real <lower = 0, upper = s> r;
...
}
But this generated an error, as r had not yet been defined. I also don’t think I can have a dynamic boundary condition, like that.
If anyone has any ideas as to how to parameterize this problem, or a way to write the function that incorporates this condition, or sees other issues that I could improve upon, I’d be very grateful.
On a side note, even though I have a massive amount of divergence, the outcome of the above model is actually really good. I’m just worried because I’ve read that divergence could imply bias in the posteriors, and I want to make sure that 1) my results are as accurate as possible, and 2) that I haven’t done something extremely naive.
Cheers!