Segfault in log_sum_exp function with rstan v2.19.2 for a rank-ordered logit model

Sorry for cross-posting, but my company would be really grateful if someone could help us with this issue that prevents us from upgrading to rstan v2.19.2: https://github.com/stan-dev/rstan/issues/689

We’re happy to make a donation to the Stan Project if someone can help us fix this.

This is a weird one! I think it’s caused by your indexing passing an empty vector to log_sum_exp which causes a segfault when Eigen tries to find the maximum coefficient (a known bug).

If we (temporarily) replace the Eigen code, we can investigate this a little better.

Firstly, find the log_sum_exp header file using:

system.file("include", "stan", "math", "prim", "mat", "fun", 
            "log_sum_exp.hpp", 
            package = "StanHeaders")

When you open that up, on line 27 you’ll see:

const double max = x.maxCoeff();

Replace that with:

  double max = -std::numeric_limits<double>::infinity();
  for (int i = 0; i < x.size(); ++i)
    if (x(i) > max)
      max = x(i);

Now the program will run without segfaulting, and we can investigate the issue a bit better.

First, add a print statement prior to the log_sum_exp call, like below:

print((m + 1));
log_phi_dot_omega = log_sum_exp(others[log_sum_exp(others[combinations[i, 2:(m + 1)]])]);

When you start sampling, you’ll see values like below:

3
3
4
1
2
2

You’ll notice that some of the printed values are 1, which means that your indexing is evaluating to combinations[i, 2:1] which returns an empty vector.

You can see this in action if you change the print statement to:

print(others[combinations[i, 2:(m + 1)]]);

In the printed output while the program is running, the printed output in the console will then look like the below:

[]
[3.8259]
[0.106469]
[3.8259,0.106469]
[-1.35698]

You can see the [], which indicates an empty vector is being passed

So if you fix the indexing, you should no longer see this segfault. The reason that it works in older versions of stan is because the maxCoeff call was introduced in 2.19.

PS. Don’t forget to revert the changes to log_sum_exp.hpp!!

3 Likes

If you try to compile this model with the latest cmdstan i(t was an older version of cmdstan with a non-fresh version of stanc3) it wont even compile with:

/home/rok/Desktop/rol/rol.hpp:492:12: error: could not convert ‘combinations’ from ‘vector<vector<int>>’ to ‘vector<vector<double>>’
     return combinations;
            ^~~~~~~~~~~~
/home/rok/Desktop/rol/rol.hpp: In instantiation of ‘std::vector<typename boost::math::tools::promote_args<T>::type> rol_model_namespace::next_combination(const std::vector<T>&, std::ostream*) [with T0__ = int; typename boost::math::tools::promote_args<T>::type = double; std::ostream = std::basic_ostream<char>]’:
/home/rok/Desktop/rol/rol.hpp:486:54:   required from ‘std::vector<std::vector<typename boost::math::tools::promote_args<T>::type> > rol_model_namespace::generate_combinations(const T0__&, std::ostream*) [with T0__ = int; typename boost::math::tools::promote_args<T>::type = double; std::ostream = std::basic_ostream<char>]’
/home/rok/Desktop/rol/rol.hpp:641:86:   required from here
/home/rok/Desktop/rol/rol.hpp:397:53: warning: overflow in conversion from ‘double’ to ‘int’ changes value from ‘+QNaN’ to ‘0’ [-Woverflow]
     n_items = std::numeric_limits<double>::quiet_NaN();
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/rok/Desktop/rol/rol.hpp:414:12: error: could not convert ‘next_comb’ from ‘vector<int>’ to ‘vector<double>’
     return next_comb;
            ^~~~~~~~~

But obviously this works for older versions. Any idea @andrjohns?

1 Like

What if you add size-zero checking to log_sum_exp? Like:

template <int R, int C>
double log_sum_exp(const Eigen::Matrix<double, R, C>& x) {
  if (size_zero(x))
    return 0;
  const double max = x.maxCoeff();
  return max + std::log((x.array() - max).exp().sum());
}
1 Like

Or should that be check_nonempty rather than size_zero? Depends on whether it should silently fail or throw, I guess

I spoke too quickly. Ignore my above post, that was using an older version of stanc3, not a fresh develop.

With cmdstan 2.21 it compiles and throws out when I start sampling.

rol: stan/lib/stan_math/lib/eigen_3.3.3/Eigen/src/Core/Redux.h:413: typename Eigen::internal::traits<T>::Scalar Eigen::DenseBase<Derived>::redux(const Func&) const [with BinaryOp = Eigen::internal::scalar_max_op<double, double>; Derived = Eigen::Matrix<double, -1, 1>; typename Eigen::internal::traits<T>::Scalar = double]: Assertion this->rows()>0 && this->cols()>0 && “you are using an empty matrix”’ failed.`

I added that plus

template <int R, int C>
inline var log_sum_exp(const Eigen::Matrix<var, R, C>& x) {
  if (size_zero(x))
    return var(0.0);
  return var(new internal::log_sum_exp_matrix_vari(x));
}

to the rev/log_sum_exp (also needed to include the headers but that is it) and this now samples. Great idea @andrjohns ! I guess returning var(0.0) is fine in this case?

I have no experience with Stan Headers but I imagine @mwmclean can modify the StanHeaders with this fix?

The other option would be to use cmdstanr with the following script and cmdstan with the fix:

    library("cmdstanr")
    set_cmdstan_path("~/cmdstan/") 
    my_stan_program <- file.path("./rol.stan")
    mod <- cmdstan_model(stan_file = my_stan_program)
    load(file.path("./mdSegfault.RData"))
    mod$compile()
    fit <- mod$sample(data = stan.dat)
    fit$summary()

I forgot about the rev header, good catch!

@mwmclean if the indexing is intentional (or just working as intended), you can make the following changes to your StanHeaders and the program will run:

First, find the log_sum_exp header file that handles primitive types (double):

system.file("include", "stan", "math", "prim", "mat", "fun", 
            "log_sum_exp.hpp", 
            package = "StanHeaders")

Below line 26, add the following:

  if (size_zero(x))
    return 0;

The function should now look like:

template <int R, int C>
double log_sum_exp(const Eigen::Matrix<double, R, C>& x) {
  if (size_zero(x))
    return 0;
  const double max = x.maxCoeff();
  return max + std::log((x.array() - max).exp().sum());
}

At the top of the file, add the following with the other #include statements:

#include <stan/math/prim/scal/fun/size_zero.hpp>

Next, find the log_sum_exp header file that handles autodiff variables:

system.file("include", "stan", "math", "rev", "mat", "fun", 
            "log_sum_exp.hpp", 
            package = "StanHeaders")

At the bottom of the file, below line 53, add:

  if (size_zero(x))
    return 0;

The function should now look like:

template <int R, int C>
inline var log_sum_exp(const Eigen::Matrix<var, R, C>& x) {
  if (size_zero(x))
    return var(0.0);
  return var(new internal::log_sum_exp_matrix_vari(x));
}

Next, add the #include <stan/math/prim/scal/fun/size_zero.hpp> statement add the top of the file

et voila! (Hopefully) working!

1 Like

Thanks @rok_cesnovar and @andrjohns for all the effort in resolving this. A few observations:
it should IMHO be possible to work around the issue completely at the Stan side, without handling C++. I imagine something along the lines of:

  if(m  >= 1) { //make sure the sum is non empty
      log_phi_dot_omega = log_sum_exp(others[combinations[i, 2:(m + 1)]]);
      sum_combinations += (-1) ^ m / (1 + exp(log_phi_dot_omega - xb[y[2]]));
   } else {
      sum_combinations += (-1) ^ m;  //because exp(log(0) - whatever) is 0
   }

(code not tested, please also double check my logic)

Also I believe that log_sum_exp should return negative infinity (and zero gradient) when summing an empty array as sum of zero elements is 0 and log(0) = -Inf.

Does that make sense to you?

4 Likes

Thanks for the quick replies, guys! We just sent along a donation. I agree with @martinmodrak that we should have been handling this m = 0 case in our stan code.

Will try initializing sum_combinations to be 1 and starting the for loop at 2.

4 Likes