Monotonic with Threading Error

Adding threading to a model with monotonic effects throws a compilation error. The error message is below; I added ellipses to reduce its length.

Compiling Stan program...
In file included from /usr/include/c++/7/bits/move.h:54:0,
                 from /usr/include/c++/7/bits/nested_exception.h:40,
                 from /usr/include/c++/7/exception:143,
                 from /usr/include/c++/7/new:40,
                 from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Core:82,
                 from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Dense:1,
                 from stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22,
                 from stan/lib/stan_math/stan/math/rev.hpp:4,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpfzP4fs/model-196577b2c3e5.hpp:3:
/usr/include/c++/7/type_traits:120:12:   required from ‘struct std::__or_<std::is_object<std::_Tuple_impl<63, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>, const std::vector<int, std::allocator<int> >&, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>, const std::vector<int, std::allocator<int> >&, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>, const std::vector<int, 
...
std::allocator<int> >&, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>, const Eigen::Matrix<double, -1, -1, 0, -1, -1>&, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>, stan::math::var, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const
 ...
std::allocator<int> >&, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>, const std::vector<int, std::allocator<int> >&, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>, const std::v
/usr/include/c++/7/type_traits:637:12:   [ skipping 889 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
stan/lib/stan_math/stan/math/rev/functor/reduce_sum.hpp:109:41:   required from ‘void stan::math::internal::reduce_sum_impl<ReduceFunction, typename std::enable_if<stan::is_var<typename std::decay<_Arg>::type, void>::value, void>::type, ReturnType, Vec, Args ...>::recursive_reducer::operator()(const tbb::blocked_range<long unsigned int>&) [with ReduceFunction = file1965474d4f82_model_namespace::partial_log_lik_rsfunctor__; ReturnType = stan::math::var; Vec = const std::vector<int>&; Args = {const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, -1, 0, -1, -1>&, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>&, 
...
Eigen::Matrix<double, -1, -1, 0, -1, -1>&, Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1>&, stan::math::var&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const Eigen::Matrix<double, -1, 1, 0, -1, 1>&, const
/tmp/RtmpfzP4fs/model-196577b2c3e5.hpp:19252:1:   required from here
/usr/include/c++/7/type_traits:580:12: fatal error: template instantiation depth exceeds maximum of 900 (use -ftemplate-depth= to increase the maximum)
     struct is_reference
            ^~~~~~~~~~~~
compilation terminated.
make: *** [/tmp/RtmpfzP4fs/model-196577b2c3e5] Error 1
Error: An error occured during compilation! See the message above for more information.

As a guess, it looks like you haven’t specified backend = "cmdstanr". But we’ll need to see the full brms call (with some example or fake data if possible) to be able to replicate the error.

oops sorry, here ya go:

brms_20_gamma_mdl4_6a <- brm(bf(y ~ mo(g_size) * mo(g_noise) * g_shape * g_interps + (1|g_rep),
                                   shape ~ mo(g_size) * mo(g_noise) * g_interps), #silent = FALSE,
                                prior =c(
                                  # y ~ Intercept + b
                                  prior(normal(3, 1),class="Intercept"),
                                  prior(normal(0, 3),class="b"),
                                  # v ~ Intercept + b
                                  prior(normal(8, 1),class="Intercept",dpar="shape"),
                                  prior(normal(0, 1),class="b",dpar="shape"),
                                  # y ~ sd
                                  prior(normal(0, 0.02),class='sd')),
                                family = brmsfamily("Gamma",link = "log",link_shape = "log"),
                                sample_prior = "only",
                                seed = seed3,
                                data = t_long_CNSubset20red1_sub3_ord, warmup = 500, iter = 1000, chains = 4, core = 4,
                                threads = 2,#threading(8, grainsize = 45),#8
                                backend = "cmdstanr",
                                control = list(adapt_delta = 0.80, max_treedepth = 12,refresh = 10))
    save(brms_20_gamma_mdl4_6a, file = paste(wkDir,str0, "_",mStr,"_brms_20_gamma_mdl4_6a", ordStr, ".RData", sep=""))
   

See here for the data and problem details: https://discourse.mc-stan.org/t/problems-converging-using-custom-gamma2-distribution/14684/36

Is it different if you change the threads argument to threads = threading(2)?

Just did a bit of digging, this looks like a legitimate bug.

@wds15 Got some brms threading code here that throws a template instantiation depth error when compiling.

To reproduce:

t_long_CNSubset20red1_sub3_ord = read.csv("https://discourse.mc-stan.org/uploads/short-url/byihhT2MGE2yXgTZB1pg93wPxio.csv")

t = brm(bf(y ~ mo(g_size) * mo(g_noise) * g_shape * g_interps),
        data = t_long_CNSubset20red1_sub3_ord, chains = 2,
        threads = threading(2), backend = "cmdstanr")

I think it might legitimately be a template depth error since it compiles fine if you remove one of the interactions, but that’s just a guess.


I also believe there is a memory leak somewhere. Every so often I have to run gc() manually and restart the R session. This first became evident when benchmarking in a loop, and then again after running several models.

Ok, this indeed throws an error… but the generated function is totally crazy:

  real partial_log_lik(int[] seq, int start, int end, vector Y, matrix Xc, vector b, real Intercept, vector Csp_1, vector Csp_2, vector Csp_3, vector Csp_4, vector Csp_5, vector Csp_6, vector Csp_7, vector Csp_8, vector Csp_9, vector Csp_10, vector Csp_11, vector Csp_12, vector Csp_13, vector Csp_14, vector Csp_15, vector Csp_16, vector Csp_17, vector Csp_18, vector Csp_19, vector Csp_20, vector Csp_21, vector Csp_22, vector Csp_23, vector Csp_24, vector Csp_25, vector Csp_26, vector Csp_27, vector Csp_28, vector Csp_29, vector Csp_30, vector Csp_31, vector Csp_32, vector Csp_33, vector Csp_34, vector Csp_35, vector Csp_36, vector Csp_37, vector Csp_38, vector Csp_39, vector Csp_40, vector Csp_41, vector Csp_42, vector Csp_43, vector Csp_44, vector Csp_45, vector Csp_46, vector Csp_47, vector Csp_48, vector Csp_49, vector Csp_50, vector Csp_51, vector bsp, int[] Xmo_1, vector simo_1, int[] Xmo_2, vector simo_2, int[] Xmo_3, vector simo_3, int[] Xmo_4, vector simo_4, int[] Xmo_5, vector simo_5, int[] Xmo_6, vector simo_6, int[] Xmo_7, vector simo_7, int[] Xmo_8, vector simo_8, int[] Xmo_9, vector simo_9, int[] Xmo_10, vector simo_10, int[] Xmo_11, vector simo_11, int[] Xmo_12, vector simo_12, int[] Xmo_13, vector simo_13, int[] Xmo_14, vector simo_14, int[] Xmo_15, vector simo_15, int[] Xmo_16, vector simo_16, int[] Xmo_17, vector simo_17, int[] Xmo_18, vector simo_18, int[] Xmo_19, vector simo_19, int[] Xmo_20, vector simo_20, int[] Xmo_21, vector simo_21, int[] Xmo_22, vector simo_22, int[] Xmo_23, vector simo_23, int[] Xmo_24, vector simo_24, int[] Xmo_25, vector simo_25, int[] Xmo_26, vector simo_26, int[] Xmo_27, vector simo_27, int[] Xmo_28, vector simo_28, int[] Xmo_29, vector simo_29, int[] Xmo_30, vector simo_30, int[] Xmo_31, vector simo_31, int[] Xmo_32, vector simo_32, int[] Xmo_33, vector simo_33, int[] Xmo_34, vector simo_34, int[] Xmo_35, vector simo_35, int[] Xmo_36, vector simo_36, int[] Xmo_37, vector simo_37, int[] Xmo_38, vector simo_38, int[] 
Xmo_39, vector simo_39, int[] Xmo_40, vector simo_40, int[] Xmo_41, vector simo_41, int[] Xmo_42, vector simo_42, int[] Xmo_43, vector simo_43, int[] Xmo_44, vector simo_44, int[] Xmo_45, vector simo_45, int[] Xmo_46, vector simo_46, int[] Xmo_47, vector simo_47, int[] Xmo_48, vector simo_48, int[] Xmo_49, vector simo_49, int[] Xmo_50, vector simo_50, int[] Xmo_51, vector simo_51, int[] Xmo_52, vector simo_52, int[] Xmo_53, vector simo_53, int[] Xmo_54, vector simo_54, int[] Xmo_55, vector simo_55, int[] Xmo_56, vector simo_56, int[] Xmo_57, vector simo_57, int[] Xmo_58, vector simo_58, int[] Xmo_59, vector simo_59, int[] Xmo_60, vector simo_60, int[] Xmo_61, vector simo_61, int[] Xmo_62, vector simo_62, int[] Xmo_63, vector simo_63, int[] Xmo_64, vector simo_64, int[] Xmo_65, vector simo_65, int[] Xmo_66, vector simo_66, int[] Xmo_67, vector simo_67, int[] Xmo_68, vector simo_68, int[] Xmo_69, vector simo_69, int[] Xmo_70, vector simo_70, int[] Xmo_71, vector simo_71, int[] Xmo_72, vector simo_72, real sigma) {
...
}

I’m not sure what the maximum number of arguments supported by stanc3 is… @rok_cesnovar? This is probably a compiler issue rather than a stan-math one. If we want to track it, it should be filed as an issue.
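The size of the generated signature can be checked without compiling anything. The sketch below counts the arguments of partial_log_lik in a string of Stan code. Note that count_stan_args is a hypothetical helper (not part of brms), and it assumes the signature contains no parentheses and no multi-dimensional array types with commas inside the brackets. The commented usage assumes make_stancode() accepts the same formula, data, and threads arguments as brm().

```r
# Hypothetical helper (not part of brms): count the arguments in the
# signature of a Stan function found in a string of Stan code.
count_stan_args <- function(stan_code, fn = "partial_log_lik") {
  # Collapse newlines so a signature wrapped over several lines still matches
  flat <- gsub("[\r\n]+", " ", as.character(stan_code))
  # Capture everything between "fn...(" and the first ")";
  # "\\w*" allows for a suffix such as "_lpmf" after the function name
  pattern <- paste0(fn, "\\w*\\s*\\(([^)]*)\\)")
  m <- regmatches(flat, regexec(pattern, flat))[[1]]
  if (length(m) < 2) return(NA_integer_)
  args <- trimws(strsplit(m[2], ",", fixed = TRUE)[[1]])
  sum(nzchar(args))
}

# Toy example with a short, made-up signature
toy_code <- "functions {
  real partial_log_lik(int[] seq, int start, int end,
                       vector Y, real sigma) {
    return 0;
  }
}"
print(count_stan_args(toy_code))

# Usage with brms (assumes the data frame from the reproduction above):
# library(brms)
# code <- make_stancode(bf(y ~ mo(g_size) * mo(g_noise) * g_shape * g_interps),
#                       data = t_long_CNSubset20red1_sub3_ord,
#                       threads = threading(2))
# count_stan_args(code)
```

Comparing the count with and without one of the interactions gives a rough idea of how quickly the signature grows with each added mo() term.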

Are you sure? R is terrible at cleaning up memory. What happens if you run the same benchmark without threading turned on?

This is not a stanc3 error. reduce_sum and the variadic ODE functions work with any number of arguments. It was a good test that both stanc3 and the math side work with 200+ args though :) So great job us! :)

As @andrjohns correctly guessed, it’s a template depth issue. Increasing the limit fixes it. The error I saw was:

stan/lib/stan_math/lib/tbb_2019_U8/include/tbb/parallel_reduce.h:274:11:   required from here
/usr/include/c++/9/type_traits:579:12: fatal error: template instantiation depth exceeds maximum of 900 (use ‘-ftemplate-depth=’ to increase the maximum)
  579 |     struct is_reference
      |            ^~~~~~~~~~~~
compilation terminated.

@samantha.zambo To solve this, you can run

cmdstanr::cmdstan_make_local(cpp_options = "CXXFLAGS += -ftemplate-depth=2048", append = TRUE)
cmdstanr::rebuild_cmdstan() # probably not needed, but just in case

and then try again. Be aware that the model takes quite some time to compile.
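Before recompiling, it can help to confirm that the flag actually ended up in make/local. This is a minimal sketch, assuming that cmdstan_make_local() called without cpp_options returns the current lines of make/local (or NULL if the file is empty or missing). has_template_depth is a hypothetical helper, not part of cmdstanr.

```r
# Hypothetical helper (not part of cmdstanr): TRUE if any line of
# make/local sets -ftemplate-depth.
has_template_depth <- function(make_local_lines) {
  any(grepl("-ftemplate-depth=", make_local_lines, fixed = TRUE))
}

# Usage (requires cmdstanr and an installed CmdStan):
# has_template_depth(cmdstanr::cmdstan_make_local())
```

If this returns FALSE after running the command above, the flag was not written and the compile will hit the same depth limit again.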
