# Performance issue with an outcome measurement error model with data transformation

Hi All,

A stan newbie here. I am trying to fit a linear model with measurement error in the response variable which includes some data transformation (radiocarbon calibration). This is just an initial test for something more complex that I am planning to work on, but I am already struggling with performance issues at this stage…

I have a model written in nimble (see attached `nimbleVersion.R`) and wanted to start developing its stan version (see below, also the attached `stanVersion.R`). The nimble model works fine and takes five minutes for getting a decent mixing, and a correct inference of the true parameters. The stan version is instead stuck forever at

``````Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
``````

Stan model:

``````// Interpolation Function
functions{
real linear_interpolation_v(real x, vector x_pred, vector y_pred){
int K = rows(x_pred);
vector[K] deltas = x - x_pred;
real ans;
real t;
real w;
int i;
real x1;
real x2;
real y1;
real y2;
if(x>x_pred[1] || x<x_pred[K]) reject("x is outside of the x_pred grid!");
if(rows(y_pred) != K) reject("x_pred and y_pred aren not of the same size");
//this is which.min()
i = sort_indices_asc(fabs(deltas))[1];
if(deltas[i]<=0) i -= 1;
ans = y_pred[i];
x1 = x_pred[i];
x2 = x_pred[i + 1];
y1 = y_pred[i];
y2 = y_pred[i + 1];
ans = y1 + (y2-y1) * (x-x1)/(x2-x1);
t = (x-x1)/(x2-x1);
w = 1-t;
ans = w*y1 + (1-w)*y2;
return ans;
}

} // end functions block

data {
int<lower=1> N; //Number of 14C dates
vector[N] cra; //Observed CRA
vector[N] cra_error; // Observed Calibration Error
vector[N] d; //independent variable (distance from source)
int<lower=0> tau; // Number of raw in the calibration curve
vector[tau] calBP; //Calibration curve
vector[tau] c14BP; //Calibration curve
vector[tau] c14_err; //Calibration curve
real prior_intercept_mean;
real prior_intercept_sd;
}

parameters {
vector<lower=0,upper=55000>[N] theta;
real alpha; // Intercept
real beta; // Slope
real <lower=0> sigma_model; //Regression Error
}

model {
// Declare temporary variables:
vector[N] sigma_curve;
vector[N] mu_date;
vector[N] sigma_date;

// Priors:
alpha ~ normal(prior_intercept_mean,prior_intercept_sd);
beta ~ normal(0,0.5);
sigma_model ~ exponential(0.001);

// Core Model:
for (i in 1:N)
{
theta[i] ~ normal(alpha+beta*d[i],sigma_model);
sigma_curve[i] = linear_interpolation_v(theta[i],calBP,c14_err);
mu_date[i] = linear_interpolation_v(theta[i],calBP,c14BP);
sigma_date[i] = sqrt (cra_error[i]^2+sigma_curve[i]^2);
cra[i] ~ normal(mu_date[i],sigma_date[i]);
}
}
``````

Some simulated data:

``````library(rcarbon)
library(nimbleCarbon)
set.seed(123)
alpha = 3000
beta = -2
sigma = 200
N = 100
dist = runif(N,0,1000)
calendar_date = round(rnorm(N,alpha+beta*dist,sd=sigma))
cra = round(uncalibrate(calendar_date)\$ccCRA)
cra_error = rep(20,N)
data(intcal20)
x.rcarbon = calibrate(cra,cra_error)
m=medCal(x.rcarbon)

d = list(N=N,
cra = cra,
cra_error=cra_error,
d = dist,
tau=nrow(intcal20),
calBP=intcal20\$CalBP,
c14BP=intcal20\$C14Age,
c14_err=intcal20\$C14Age.sigma,
a=min(intcal20\$CalBP),
b=max(intcal20\$CalBP),
prior_intercept_mean=3000,
prior_intercept_sd=200)
``````

And the model call + init:

``````init=function(...){list(alpha=3000,beta=0,sigma_model=200,theta=m)}
fit <- stan(file = "model.stan",data=d,init=init,chain=2)
``````

Clearly there is something wrong - I know for loops tend to slow things down, but the difference in performance between nimble and stan seems to indicate that there is something else that I am missing… Any help or tips are greatly appreciated!

Enrico

stanVersion.R (2.9 KB)
nimbleVersion.R (1.8 KB)

I actually don’t think you need the loop at all, as this:

``````for (i in 1:N){
theta[i] ~ normal(alpha+beta*d[i],sigma_model);
}
``````

can certainly be changed to:

``````theta ~ normal(alpha+beta*d,sigma_model);
``````

and even faster:

``````theta ~ normal_id_glm(d,alpha,beta,sigma_model);
``````

Then, if you define your `linear_interpolation_v` function with a vector signature you can omit the temporary-variable creation (which I recently learned does slow things down):

``````...
transformed data{
vector[N] cra_error_sq = pow(cra_error,2) ;
}
...
model {

// Priors:
alpha ~ normal(prior_intercept_mean,prior_intercept_sd);
beta ~ normal(0,0.5);
sigma_model ~ exponential(0.001);

// Core Model:
theta ~ normal_id_glm(d,alpha,beta,sigma_model);
cra ~ normal(
linear_interpolation_v(theta,calBP,c14BP)
, sqrt( cra_error_sq + linear_interpolation_v(theta,calBP,c14_err)^2 )
)

}
``````
1 Like

Thanks @mike-lawrence - that was very helpful. Things were running still slow, but I also realised that `linear_interpolation_v` could have been written much more efficiently (I did copy from somewhere without looking too much into it);. The following did manage to run - still not terribly fast, but at least it runs!

``````// Interpolation Function
functions{
int which_first(real z, vector x){
int index = 1;
for (i in 1:rows(x))
{
if (z > x[i]) {break;}
else {index += 1;}
}
return(index);
}

vector linear_interpolation_v(vector z, vector x, vector y)
{
int J = rows(z);
vector[J] ans;
for (j in 1:J)
{
int index=which_first(z[j],x);
ans[j] = y[index] + (((y[index - 1] - y[index]) * (z[j] - x[index]))/(x[index - 1] - x[index]));
}
return ans;
}
} // end functions block

data {
int<lower=1> N; //Number of 14C dates
vector[N] cra; //Observed CRA
vector[N] cra_error; // Observed Calibration Error
vector[N] d; //independent variable (distance from source)
int<lower=0> tau; // Number of raw in the calibration curve
vector[tau] calBP; //Calibration curve
vector[tau] c14BP; //Calibration curve
vector[tau] c14_err; //Calibration curve
real prior_intercept_mean;
real prior_intercept_sd;
}

transformed data{
vector[N] cra_error_sq = square(cra_error);
}

parameters {
vector<lower=0,upper=55000>[N] theta;
real alpha; // Intercept
real beta; // Slope
real <lower=0> sigma_model; //Regression Error
}

model {
// Priors:
alpha ~ normal(prior_intercept_mean,prior_intercept_sd);
beta ~ normal(0,0.5);
sigma_model ~ exponential(0.001);

// Core Model:
theta ~ normal(alpha+beta*d,sigma_model);
//  theta ~ normal_id_glm(d,alpha,beta,sigma_model);
cra ~ normal(linear_interpolation_v(theta,calBP,c14BP), sqrt( cra_error_sq + square(linear_interpolation_v(theta,calBP,c14_err))));
}

``````
1 Like