Rstan Error in compileCode(f, code, language = language, verbose = verbose)

Hi all,

New to Stan forums, so apologies if the issue is chunky. I have been experiencing a problem when I try to compile my Stan model, getting this following error:

Error in compileCode(f, code, language = language, verbose = verbose) : 
  C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: file5968d961a3b.o:file5968d961a3b.cpp:(.text$_ZN3tbb8internal26task_scheduler_observer_v3D0Ev[_ZN3tbb8internal26task_scheduler_observer_v3D0Ev]+0x1d): undefined reference to `tbb::internal::task_scheduler_observer_v3::observe(bool)'C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: file5968d961a3b.o:file5968d961a3b.cpp:(.text$_ZN3tbb10interface623task_scheduler_observerD1Ev[_ZN3tbb10interface623task_scheduler_observerD1Ev]+0x1d): undefined reference to `tbb::internal::task_scheduler_observer_v3::observe(bool)'C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: file5968d961a3b.o:file5968d961a3b.cpp:(.text$_ZN3tbb10interface623task_scheduler_observerD1Ev[_ZN3tbb10interface623task_scheduler_observerD1Ev]+0x3a): undefined reference to `tbb::internal::task_scheduler_observer_v3::observe(bool)'C:\rtools42\x86_64-w64-mingw32.static.posix\bin/ld.exe: file5968d961a3b.o:file5968d961a3b.cpp:(.text$_ZN3tbb10interface623task_
Error in sink(type = "output") : invalid connection

I have tried the things indicated in the threat Rstan fit Error in compileCode(f, code, language = language, verbose = verbose), but I am still getting errors. I used to be able to compile with no problems, but I don’t know why I can’t any longer.

R version 4.2.1
StanHeaders 2.21.0-7
rstan 2.21.5

Any help would be appreciated, thank you!

We’re currently having compatibility issues with the current rstan and R4.2, until we can get a new version up on CRAN you can get your code working by installing the preview of the next version:

remove.packages(c("StanHeaders", "rstan"))
install.packages("StanHeaders", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
install.packages("rstan", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

Thanks, that worked! However, I am now getting the following message when I start the sampling:

Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
      Exception: variable does not exist; processing stage=data initialization; variable name=RtVals; base type=double (in 'string', line 14, column 1 to column 33)
    failed to create the sampler; sampling not done

That error indicates that the data you’re passing to Stan is missing a variable called RtVals

But the data I am passing does have RtVals in it. Again, I used this script in the past with the same data and I had no problems.

Can you post the R code that you’re using to create the list of data and the call to stan?

Sure. It should be something like this for the data:

data_stan <- list(
  RtVals = data_model$Rt,
  Exposure = data_model[,exposure],
  NumExp = length(exposure),
  NumDatapoints = NumDatapoints,
  NumTimepoints = NumTimepoints,
  Boolean1 = 1,
  Boolean2 = 0
 )

And the call to stan is:

ModelChar <- "model_variant_3"
StanModel <- rstan::stan_model(here(paste0("Stan_models/", ModelChar, ".stan")))

And the call to sampling? That’s where the data gets passed, so the issue might be there

And when you say “It should be something like this”, is this the actual code that you’re running that’s resulting in the error? Otherwise it makes it hard to know what the issue could be and be able to replicate it if it’s a bug

Right, the actual code is:

data_stan <- list(
  RtVals = data_model$Rt,
  VaxProp = data_model[,covar_vax],
  NumLTLAs = NumLTLAs,
  NumDoses = length(covar_vax),
  NumDatapoints = NumDatapoints,
  LTLAs = data_model$LTLAs,
  NumTimepoints = NumTimepoints,
  IncludeIntercept = IncludeIntercept,
  IncludeScaling = IncludeScaling,
  DoKnots = DoKnots,
  Quadratic = Quadratic,
  NumKnots = NumKnots,
  Knots = Knots_weeks,
  NumPointsLine = NumPointsLine,
  Timepoints = Timepoints
)

And the call to sampling is:

 fit <- sampling(StanModel, data = data_stan, 
                 iter = iterations, 
                 warmup	= warmup, 
                 thin = thin, 
                 chains	= chains, 
                 pars	= c("VaxEffect", "LogPredictions", "RegionalTrends",
                          "NationalTrend", "gamma", "intercept", "log_lik", "lambda"), 
                 control = list(adapt_delta = delta,
                                max_treedepth = depth))

And the final thing to check, can you post the output from these R calls:

str(data_stan, vec.len=0)
cat(StanModel@model_code)

If I can see the structure of the input data and the target model data{} block then I can try and recreate locally

Here they are:

> str(data_stan_1a, vec.len=0)
List of 15
 $ RtVals          : num [1:12726] NULL ...
 $ VaxProp         :'data.frame':	12726 obs. of  3 variables:
  ..$ First_Prop : num [1:12726] NULL ...
  ..$ Second_Prop: num [1:12726] NULL ...
  ..$ Third_Prop : num [1:12726] NULL ...
 $ NumLTLAs        : int NULL ...
 $ NumDoses        : int NULL ...
 $ NumDatapoints   : int NULL ...
 $ LTLAs           : int [1:12726] NULL ...
 $ NumTimepoints   : int NULL ...
 $ IncludeIntercept: num NULL ...
 $ IncludeScaling  : num NULL ...
 $ DoKnots         : num NULL ...
 $ Quadratic       : num NULL ...
 $ NumKnots        : int NULL ...
 $ Knots           : num [1:6] NULL ...
 $ NumPointsLine   : int NULL ...
 $ Timepoints      : num [1:12726] NULL ...
data {
	int <lower = 0, upper = 1>	IncludeIntercept; 	// Boolean for intercept
	int <lower = 0, upper = 1>	IncludeScaling; 	// Boolean for scaling
	int <lower = 0, upper = 1>	DoKnots;        	// Boolean for spline
	int <lower = 0, upper = 1>	Quadratic;        	// Boolean for quadratic (If 1, quadratic spline, if 0, linear)

	int<lower = 1>	NumDatapoints; 	// Number of Rt values (all timepoints x regions)
	int<lower = 1>	NumDoses; 		// Number of parameters: Vax doses
	int<lower = 1>	NumLTLAs;		// Number of regions / LTLAs
	int<lower = 1>	NumTimepoints;	// Number of timepoints (weeks)
	int<lower = 1>	NumKnots;       // Number of knots
	int<lower = 1>	NumPointsLine;  // Number of line-lambdas (knots x regions)

	vector[NumDatapoints] 			RtVals; 	// y: Rt values across all time points and regions (expressed as giant vector)
	matrix[NumDatapoints,NumDoses] 	VaxProp;	// x predictor: Binary design matrix. Each row is a region and date combination.
	    // Each column has the proportion of vax at that time/LTLA with 1, 2, 3 doses
	int 							LTLAs[NumDatapoints];  // vector giving LTLA number for each Rt-region combination.

	vector[NumKnots]		Knots;	  // Sequence of knots
	vector[NumDatapoints]   Timepoints; // Sequence of timepoints
}
transformed data{
	int<lower = 1> 	IntDim = 1;			// internal dimension of matrix factors - number of latent factors.
}
parameters {
	matrix[NumLTLAs,IntDim] 		gamma_nc; 	  		// Prev alpha, indexed by: i) LTLA; ii) factor
	vector[NumLTLAs] 				intercept_nc;     	// indexed by: i) LTLA

	//// looks wrong to have two things here - your parameters are only the knots - not two versions of the knots.
	matrix[NumKnots,IntDim] 		lambda_raw_nc; 		// indexed by: i) Knots, ii) factor
	matrix[NumTimepoints,IntDim] 	lambda_raw_nc_par; 	// indexed by: i) Timepoint, ii) factor

	vector<lower = 0>[NumDoses] 	VaxEffect_nc;
	real<lower = 0> 				sigma_nc;
	real<lower = 0> 				phi_nc;
	real<lower = 0> 				phi2_nc;
	real<lower = 0> 				phi3_nc;
	real<lower = 0> 				phi4_nc;
}
transformed parameters{
	// allocate
	vector[NumLTLAs] 			intercept 	= rep_vector(0, NumLTLAs);
	matrix[NumLTLAs,IntDim] 	gamma 		= rep_matrix(0, NumLTLAs, IntDim);
	vector[NumDoses] 			VaxEffect 	= rep_vector(0, NumDoses);
	real sigma 	= 0;
	real phi 	= 0;
	real phi2 	= 0;
	real phi3 	= 0;
	real phi4 	= 0;
	vector[NumDatapoints] RegionalTrends 		= rep_vector(0, NumDatapoints);
	vector[NumDatapoints] VacEffects_Regional 	= rep_vector(0, NumDatapoints);
	vector[NumDatapoints] LogPredictions 		= rep_vector(0, NumDatapoints);

	matrix[NumKnots,IntDim] lambda_raw 		= rep_matrix(0, NumKnots, IntDim); 		// has NumKnots rows (not NumDatapoints)
	matrix[NumPointsLine,IntDim] lambda 	= rep_matrix(0, NumPointsLine, IntDim); // has NumKnots*NumLTLAs rows (not NumTimepoints)
	matrix[NumKnots-1, IntDim] origin; 	// intercept
	matrix[NumKnots-1, IntDim] slope;  	// slope
	matrix[NumKnots-2, IntDim] a; 		// Coeff a for quadratic eq
	matrix[NumKnots-2, IntDim] b; 		// Coeff b for quadratic eq
	matrix[NumKnots-2, IntDim] c; 		// Coeff c for quadratic eq

	matrix[NumTimepoints,IntDim]	lambda_raw_par	= rep_matrix(0, NumTimepoints, IntDim); // has NumTimepoint rows (not NumDatapoints)
	matrix[NumDatapoints, IntDim]	NationalTrend	= rep_matrix(0, NumDatapoints, IntDim); // To calculate lambda from line

	// initialize - get centered parameter values from their non-centered equivalents.
	phi  		= phi_nc		* 2.0;
	phi2 		= phi2_nc		* 0.5;
	phi3 		= phi3_nc		* 0.5;
	phi4 		= phi4_nc		* 0.5;

	sigma 		= sigma_nc		* 0.5;
	VaxEffect 	= VaxEffect_nc 	* phi;
	gamma 		= gamma_nc 		* phi2;
	intercept 	= intercept_nc	* phi3;

	if (DoKnots) {

		// Lambda for the SPLINE
		lambda_raw 	= lambda_raw_nc 	* phi4;
		{
			// initialize lambda matrix to have same values for every LTLA
			int ind = 0; // initialize index

			for (i in 1:NumLTLAs){
				for(j in 1:IntDim)
					lambda[(ind + 1):(ind + NumKnots), j] = lambda_raw[1:NumKnots, j];

				ind = ind + NumKnots; // update index
			}
		}

		// spline to fit the lambda
		if (Quadratic) {

			for (i in 1:(NumKnots-2))
				for (j in 1:IntDim) {
					a[i,j] = (lambda[i+2, j] - lambda[i+1, j] - ((Knots[i+2] - Knots[i+1]) * (lambda[i, j] - lambda[i+1, j]) / (Knots[i] - Knots[i+1]))) / ((Knots[i+2]*Knots[i+2]) - (Knots[i+1]*Knots[i+1]) - (Knots[i] + Knots[i+1])*(Knots[i+2] - Knots[i+1]));
					b[i,j] = ((lambda[i, j] - lambda[i+1, j]) / (Knots[i] - Knots[i+1])) - (Knots[i] + Knots[i+1])*a[i,j];
					c[i,j] = lambda[i, j] - (b[i,j] * Knots[i]) - (a[i,j] * (Knots[i]*Knots[i]));
				}

		} else {

			for (i in 1:(NumKnots-1))
				for (j in 1:IntDim) {
					slope[i,j] = (lambda[i+1, j] - lambda[i, j])/(Knots[i+1] - Knots[i]);
					origin[i,j] = lambda[i, j] - slope[i, j]*Knots[i];
				}
		}

		// Calculate lambda from the spline ONLY IF we are doing knots
		if (Quadratic) {

		// very inefficient, in that you're calculating x and y coordinates for spline for every LTLA, when before vaccination they are all the same.
		// Make the calculation once for the spline, then populate for every LTLA. See (and check) below
		for (i in 1:NumDatapoints)
			for (j in 1:IntDim)
		  		if (LTLAs[i] == 1) {

			        if (Timepoints[i] >= Knots[NumKnots-1] && Timepoints[i] < Knots[NumKnots]) {
			              NationalTrend[i,j] = a[NumKnots-2,1]*(Timepoints[i] * Timepoints[i]) + b[NumKnots-2,1]*Timepoints[i] + c[NumKnots-2,1];

			        } else for (k in 1:(NumKnots-2))
			            if (Timepoints[i] >= Knots[k] && Timepoints[i] < Knots[k+1])
			              NationalTrend[i,j] = a[k,j]*(Timepoints[i] * Timepoints[i]) + b[k,j]*Timepoints[i] + c[k,j];

		  		} else NationalTrend[i,j] = NationalTrend[(i - NumTimepoints),j]; // i.e. make equal to previous LTLA's NationalTrend

		} else { // i.e. doing knots, linear spline

			// very inefficient, in that you're calculating x and y coordinates for spline for every LTLA, when before vaccination they are all the same.
			// Make the calculation once for the spline, then populate for every LTLA. See (and check) below
			for (i in 1:NumDatapoints)
				for (j in 1:IntDim)
					if (LTLAs[i] == 1) {
						for (k in 1:(NumKnots-1))
							if (Timepoints[i] >= Knots[k] && Timepoints[i] < Knots[k+1])
								NationalTrend[i,j] = origin[k,j] + slope[k,j] * Timepoints[i];

					} else NationalTrend[i,j] = NationalTrend[(i - NumTimepoints),j]; // i.e. make equal to previous LTLA's NationalTrend
		}
	} else { // i.e. step function (each week a free parameter)

		// Initialise lambda if we are not doing knots: free parameters allowed
		lambda_raw_par 	= lambda_raw_nc_par 	* phi4;
		{
			// initialize lambda matrix to have same values for every LTLA
			int ind_par = 0; // initialize index
			for (i in 1:NumLTLAs){

				for(j in 1:IntDim)
					NationalTrend[(ind_par + 1):(ind_par + NumTimepoints), j] = lambda_raw_par[1:NumTimepoints, j];
				ind_par = ind_par + NumTimepoints; // update index
			}
		}
 	}
	// CONTINUE WITH LOG PRED MODEL WHETHER LAMBDA WAS FIT IN THE LINE OR NOT
	// VacEffects_Regional[1:NumDatapoints] = VaxProp * VaxEffect; // x * -beta in manuscript
	for (i in 1:NumDatapoints)
	{
		VacEffects_Regional[i] = 0; // initialize to zero for each timepoint
		for (j in 1:NumDoses)
			VacEffects_Regional[i] += VaxProp[i,j] * VaxEffect[j];
	}

	// Calculate regional trend from national trend
	for (i in 1:NumDatapoints){
		for(j in 1:IntDim){
			if (IncludeIntercept) {

				if (IncludeScaling) {
					RegionalTrends[i] += NationalTrend[i,j] * gamma[LTLAs[i],j] + intercept[LTLAs[i]]; // lambda * gamma^T in manuscript. Note use of intercept makes this line akin to IntDim (B) = 2 with column vector of 1s for one column of lambda
				} else {
					RegionalTrends[i] += NationalTrend[i,j] + intercept[LTLAs[i]]; }

			} else {

				if (IncludeScaling) {
					RegionalTrends[i] += NationalTrend[i,j] * gamma[LTLAs[i],j]; // lambda * gamma^T in manuscript. Note use of intercept makes this line akin to IntDim (B) = 2 with column vector of 1s for one column of lambda
				} else {
					RegionalTrends[i] += NationalTrend[i,j]; }
			}
		}
	}
	// final (logged) regional Rt predictions are regional trends minus regional vaccine effects
	LogPredictions[1:NumDatapoints] = RegionalTrends[1:NumDatapoints] - VacEffects_Regional[1:NumDatapoints];
}
model {

	VaxEffect_nc ~ std_normal();

	for (i in 1:NumLTLAs)
		for(j in 1:IntDim)
			gamma_nc[i,j] ~ std_normal();

	for (i in 1:NumKnots)
		for(j in 1:IntDim)
			lambda_raw_nc[i,j] ~ std_normal();
	for (i in 1:NumTimepoints)
		for(j in 1:IntDim)
			lambda_raw_nc_par[i,j] ~ std_normal();

	intercept_nc 	~ std_normal();
	phi_nc 			~ std_normal();
	phi2_nc 		~ std_normal();
	phi3_nc 		~ std_normal();
	phi4_nc 		~ std_normal();
	sigma_nc 		~ std_normal();
	RtVals 			~ normal(LogPredictions, sigma);
}
generated quantities {
	vector[NumDatapoints] log_lik;

	for (i in 1:NumDatapoints) {
		log_lik[i] = normal_lpdf(RtVals[i] | LogPredictions[i], sigma); // NOTES: Log of normal function: real normal_lpdf(reals y | reals mu, reals sigma)
	}
}
str(data_stan_1a, vec.len=0)

Oh that’s a different list from your sampling call before (which was just data_stan), do you get the same error when you pass this list as data? Again, it’s really important to show exactly the code that’s causing the error, otherwise it’s hard to know where the issue could be coming from

Yes, the error is the same. Sorry, just a little typo above, the pass list command should have started with data_stan_1a, but the code afterwards is the same. I just have multiple data sets with different combinations of the booleans, but they all give the same errors.

Hmm, very strange! Can you try restarting your R session and running the code in a clean session? Just to make sure that an older compiled model isn’t being picked up.

If it’s still problematic, would you be able to post some data (even fake/dummy data) and the model that reproduces the error? Then I can chase down where in the rstan source things are going wrong

Hi @andrjohns, thanks very much for following this. I did as proposed and restarted R and run everything from a clean session. It finally got it working locally! However, I tried a couple of times (restarting R, etc.) to run the same job in a HPC cluster, and I keep getting the same error. Any ideas why?

The HPC cluster might be a different issue. Can you post the output from devtools::session_info("rstan")?

Yes, this is:

> devtools::session_info("rstan")
─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 22000)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United Kingdom.utf8
 ctype    English_United Kingdom.utf8
 tz       Europe/London
 date     2022-07-04
 rstudio  2022.02.3+492 Prairie Trillium (desktop)
 pandoc   NA

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────
 ! package      * version   date (UTC) lib source
   backports      1.4.1     2021-12-13 [1] CRAN (R 4.2.0)
   BH             1.78.0-0  2021-12-15 [1] CRAN (R 4.2.0)
   callr          3.7.0     2021-04-20 [1] CRAN (R 4.2.0)
   checkmate      2.1.0     2022-04-21 [1] CRAN (R 4.2.0)
   cli            3.3.0     2022-04-25 [1] CRAN (R 4.2.0)
   colorspace     2.0-3     2022-02-21 [1] CRAN (R 4.2.0)
   crayon         1.5.1     2022-03-26 [1] CRAN (R 4.2.0)
   curl           4.3.2     2021-06-23 [1] CRAN (R 4.2.0)
   desc           1.4.1     2022-03-06 [1] CRAN (R 4.2.0)
   digest         0.6.29    2021-12-01 [1] CRAN (R 4.2.0)
   ellipsis       0.3.2     2021-04-29 [1] CRAN (R 4.2.0)
   fansi          1.0.3     2022-03-24 [1] CRAN (R 4.2.0)
   farver         2.1.0     2021-02-28 [1] CRAN (R 4.2.0)
   ggplot2      * 3.3.6     2022-05-03 [1] CRAN (R 4.2.0)
   glue           1.6.2     2022-02-24 [1] CRAN (R 4.2.0)
   gridExtra      2.3       2017-09-09 [1] CRAN (R 4.2.0)
   gtable         0.3.0     2019-03-25 [1] CRAN (R 4.2.0)
   inline         0.3.19    2021-05-31 [1] CRAN (R 4.2.0)
   isoband        0.2.5     2021-07-13 [1] CRAN (R 4.2.0)
   jsonlite       1.8.0     2022-02-22 [1] CRAN (R 4.2.0)
   labeling       0.4.2     2020-10-20 [1] CRAN (R 4.2.0)
   lattice        0.20-45   2021-09-22 [2] CRAN (R 4.2.1)
   lifecycle      1.0.1     2021-09-24 [1] CRAN (R 4.2.0)
   loo            2.5.1     2022-03-24 [1] CRAN (R 4.2.0)
   magrittr       2.0.3     2022-03-30 [1] CRAN (R 4.2.0)
   MASS           7.3-57    2022-04-22 [2] CRAN (R 4.2.1)
   Matrix         1.4-1     2022-03-23 [2] CRAN (R 4.2.1)
   matrixStats    0.62.0    2022-04-19 [1] CRAN (R 4.2.0)
   mgcv           1.8-40    2022-03-29 [2] CRAN (R 4.2.1)
   munsell        0.5.0     2018-06-12 [1] CRAN (R 4.2.0)
   nlme           3.1-157   2022-03-25 [2] CRAN (R 4.2.1)
   pillar         1.7.0     2022-02-01 [1] CRAN (R 4.2.0)
   pkgbuild       1.3.1     2021-12-20 [1] CRAN (R 4.2.0)
   pkgconfig      2.0.3     2019-09-22 [1] CRAN (R 4.2.0)
   prettyunits    1.1.1     2020-01-24 [1] CRAN (R 4.2.0)
   processx       3.5.3     2022-03-25 [1] CRAN (R 4.2.0)
   ps             1.7.1     2022-06-18 [1] CRAN (R 4.2.0)
   R6             2.5.1     2021-08-19 [1] CRAN (R 4.2.0)
   RColorBrewer   1.1-3     2022-04-03 [1] CRAN (R 4.2.0)
   Rcpp           1.0.8.3   2022-03-17 [1] CRAN (R 4.2.0)
   RcppEigen      0.3.3.9.2 2022-04-08 [1] CRAN (R 4.2.0)
 D RcppParallel   5.1.5     2022-01-05 [1] CRAN (R 4.2.0)
   rlang          1.0.2     2022-03-04 [1] CRAN (R 4.2.0)
   rprojroot      2.0.3     2022-04-02 [1] CRAN (R 4.2.0)
   rstan        * 2.26.13   2022-06-25 [1] local
   scales         1.2.0     2022-04-13 [1] CRAN (R 4.2.0)
   StanHeaders  * 2.26.13   2022-06-25 [1] local
   tibble         3.1.7     2022-05-03 [1] CRAN (R 4.2.0)
   utf8           1.2.2     2021-07-24 [1] CRAN (R 4.2.0)
   V8             4.2.0     2022-05-14 [1] CRAN (R 4.2.0)
   vctrs          0.4.1     2022-04-13 [1] CRAN (R 4.2.0)
   viridisLite    0.4.0     2021-04-13 [1] CRAN (R 4.2.0)
   withr          2.5.0     2022-03-03 [1] CRAN (R 4.2.0)

 [1] C:/Users/nd1316/AppData/Local/R/win-library/4.2
 [2] C:/Program Files/R/R-4.2.1/library

 D ── DLL MD5 mismatch, broken installation.