Specifying likelihood for bounded measurement errors model

I am developing a measurement errors model to calibrate a device that measures the depth of cracks and other defects in a thin surface. Two sets of measurements are performed on the same surface, each by a different device. One device (x) is cheap and fast, but has some inherent bias in addition to random noise. Measurements from the second device (y) are assumed to have random error but no bias. At the moment I have a relatively simple linear regression measurement errors model implemented in Stan as shown below.

With a Student likelihood, the model works fairly well for defects whose depths are not near 0 or the thickness of the surface, but unsurprisingly gives unrealistic predictive distributions at the extremes, as shown in the plot below. This example has 94 data points and was fit with

fit = stan(file=modelpath, data=stan_dict, iter=20000, chains=4, control=list(adapt_delta=0.99))

However, even with the high warmup and acceptance rate, in some cases I still get a dozen or so divergence errors.
linreg plot

My questions:

  1. What would be a suitable way of accurately modeling defects with depths near 0 or the surface thickness? I am considering a JohnsonSB distribution with support over the thickness, but I don’t know how to code this support in Stan, i.e., the interval [xi, xi+lambda] where xi = b0 + b1*x ≥ 0 and xi+lambda ≤ thickness. The JohnsonSB code I’ve implemented so far will run, but doesn’t produce anything useful.

  2. There are many x-measurements with the same value due to rounding, in this case to the nearest 0.035; these are not repeated measurements of the same defect. I know that technically I should probably use a rounding model as described in the Stan manual, but before going through it effort I’m wondering if this is likely to improve inference over my current implementation?

  3. Have I coded the generated quantities section correctly to predict the true defect depth given an observed x?

functions {
	// negative log PDF of JohnsonSB distribution
	real johnsonsb_lpdf(real x, real xi, real lam, real gamma, real delta) {
		// xi: location parameter
		// lambda: scale parameter >0
		// gamma: shape parameter
		// delta: shape parameter >0		
		real z;
		z = ((x - xi) / lam);
		return log(delta) - log(lam * sqrt(2*pi())*(z-z^2)) - 0.5*(gamma + delta * log(z/(1-z)))^2;
	real johnsonsb_rng(real xi, real lam, real gamma, real delta) {
		real u;
		real y;
		u = uniform_rng(0, 1);
		y = exp((inv_Phi(u)-gamma)/delta);
		return lam * y / (1+y) + xi;

data {
	int<lower=0> N;	//	number of observations
	vector[N] x_obs; // x-measurement, observed with error
	vector[N] y;	// y-measurement
	real<lower=0> b0_sd; // SD of intercept
	real<lower=0> ub;	// upper bound of measurements
	int method;	// 0 for Normal likelihodd, 1 for Student, 2 for JohnsonSU
	// Data for prediction
	int<lower=0> N_hat;
	vector[N_hat] x_hat; // new observations

parameters {
	// Latent model
	real<lower=0> mu_x;
	real<lower=0> sigma_x;
	vector<lower=0>[N] x;	// unknown true value
	real<lower=0> tau;	// measurement error

	// Regression model
	real b0;
	real<lower=0> b1;
	real<lower=0> sigma_y;
	// Student likelihood
	real<lower=0> nu;
	// JohnsonSB likelihood (using sigma_y for lambda)
	real js_gamma;	// shape parameter
	real<lower=0> js_delta;	// shape parameter

model {
	// Latent model
	tau ~ gamma(1, 30);		// measurement error SD (expected value equal to approx vendor claim of 0.033)
	x ~ normal(mu_x, sigma_x);	// prior for true x dist
	x_obs ~ normal(x, tau);		// measurement model
	// Regression model
	b0 ~ normal(0, b0_sd);
	b1 ~ normal(1, 3);
	// Student dof prior
	nu ~ gamma(2, 0.1);
	// y-scale prior
	sigma_y ~ gamma(2, 40);
	// JohnsonSU priors
	js_gamma ~ normal(1, 2);
	js_delta ~ gamma(2, 1);
	for (n in 1:N) {
		if (method==0)
			y[n] ~ normal(b0 + b1*x[n], sigma_y);	
		else if (method==1)
			y[n] ~ student_t(nu, b0 + b1*x[n], sigma_y);
			y[n] ~ johnsonsb(b0 + b1*x[n], sigma_y, js_gamma, js_delta);
generated quantities {
	vector[N] y_rep;	// PPC
	vector[N_hat] y_hat;	// new predictions
	for (n in 1:N) {
		if (method==0)
			y_rep[n] = normal_rng(b0 + b1*x[n], sigma_y);
		else if (method==1)
			y_rep[n] = student_t_rng(nu, b0 + b1*x[n], sigma_y);
			y_rep[n] = johnsonsb_rng(b0 + b1*x[n], sigma_y, js_gamma, js_delta);
	for (n in 1:N_hat) {
		if (method==0)
			y_hat[n] = normal_rng(b0 + b1*x_hat[n], sigma_y);
		else if (method==1)
			y_hat[n] = student_t_rng(nu, b0 + b1*x_hat[n], sigma_y);
			y_hat[n] = johnsonsb_rng(b0 + b1*x_hat[n], sigma_y, js_gamma, js_delta);

Many thanks!

This one’s complicated. Long questions like this are much harder to answer than sequences of smaller ones.

You just need to reduce this to lower and upper-bound constraints on the parameters in a coherent sequence. And if you have a bunch of variables that need to satisfy the constraint, take the extreme ones.

Never heard of this one.

Probably not unless the amount of rounding is large compared to the value.

That looks like it matches your model. I didn’t spend a huge amount of time verifying, but you’re using the _rng forms, which is the critical part that people often miss.

There are a lot of pieces of your code that can be sped up. For instance, log(z / (1 - z)) == logit(z).

Unless you’re using them in a mixture, yo8u can probably drop constants like sqrt(2 * pi()).

Rather than generating u = uniform_rng(0, 1) and then applying inv_Phi(u), which is very inefficient, I’d suggest just using normal_rng(0, 1), since it will have the smae distribution as inv_Phi(uniform_rng(0, 1)).

Computing b0 + b1 * x all in one go and then indexing it will be much more efficient. As would collapsing the y based on method so you can vectorize.