Error when implementing custom distribution - Undefined reference to empty_broadcast_array


I have been working on learning how to implement my own distribution that uses some optimised c++ code I already have. I have worked through the process of implementing a copy of the normal distribution successfully and have looked at the multi normal cholesky for some guidance on other things. I have gotten to the point where I can call my optimised c++ function (which takes pointers to doubles and returns the log likelihood and gradients via pointers as well). However, when I try to assign the computed gradients to ops_partials.edge1_.partials_ I get an error about undefined reference to empty broadcast array. If I don’t set anything to ops_partials then it compiles fine, so i think its something wrong with how I am setting those values.

Here is the code for my new custom distribution, I have actually commented out the bits that would


#include <stan/math/prim/meta.hpp>
#include <stan/math/prim/err.hpp>
#include <stan/math/prim/fun/as_value_column_vector_or_scalar.hpp>
#include <stan/math/prim/fun/constants.hpp>
#include <stan/math/prim/fun/dot_self.hpp>
#include <stan/math/prim/fun/log.hpp>
#include <stan/math/prim/fun/max_size_mvt.hpp>
#include <stan/math/prim/fun/mdivide_left_tri.hpp>
#include <stan/math/prim/fun/mdivide_right_tri.hpp>
#include <stan/math/prim/fun/size_mvt.hpp>
#include <stan/math/prim/fun/sum.hpp>
#include <stan/math/prim/fun/transpose.hpp>
#include <stan/math/prim/fun/vector_seq_view.hpp>
#include <stan/math/prim/functor/operands_and_partials.hpp>

#include "ll_dll_normal.hpp"

namespace stan {
namespace math {

/** \ingroup multivar_dists
 * The log of the matrix normal density for the given y, mu, and LSigma
 * where L is the lower cholesky factor of a covariance matrix such that LSigma*LSigma' = Sigma
 * @tparam T_y type of scalar
 * @tparam T_mu type of location
 * @tparam T_LSigma type of LSigma
 * @param y An mxn matrix.
 * @param mu The mean matrix.
 * @param LSigma The nxn the lower cholesky factor of the covariance matrix Sigma
 * @return The log of the matrix normal density.
 * @throw std::domain_error if Sigma is not square
template <bool propto, typename T_y, typename T_mu, typename T_LSigma,
          require_all_matrix_t<T_y, T_mu, T_LSigma >* = nullptr>
return_type_t<T_y, T_mu, T_LSigma> matrix_normal_cholesky_lpdf(
    const T_y& y, const T_mu& mu, const T_LSigma& LSigma) {
    static const char* function = "matrix_normal_cholesky_lpdf";
    check_finite(function, "LSigma", LSigma);
    check_size_match(function, "Rows of random variable", y.rows(),
                   "Rows of location parameter", mu.rows());
    check_size_match(function, "Columns of random variable", y.cols(),
                   "Columns of location parameter", mu.cols());
    check_size_match(function, "Rows of random variable", y.rows(),
                   "Rows of LSigma", LSigma.rows());
    check_finite(function, "Location parameter", mu);
    check_finite(function, "Random variable", y);

    using T_covar_elem = typename scalar_type<T_LSigma>::type;
    using T_return = return_type_t<T_y, T_mu, T_LSigma>;
    using T_partials_return = partials_return_t<T_y, T_mu, T_LSigma>;
    using matrix_partials_t
      = Eigen::Matrix<T_partials_return, Eigen::Dynamic, Eigen::Dynamic>;
    using T_y_ref = ref_type_t<T_y>;
    using T_mu_ref = ref_type_t<T_mu>;
    using T_L_ref = ref_type_t<T_LSigma>;

    T_y_ref y_ref = y;
    T_mu_ref mu_ref = mu;
    T_L_ref L_ref = LSigma;

    decltype(auto) y_val = to_ref(value_of(y_ref));
    decltype(auto) mu_val = to_ref(value_of(mu_ref));
    decltype(auto) L_val = to_ref(value_of(L_ref));

    // prepare data for external call
    int p = y.rows();
    int n = y.cols();
    double *mu_data = const_cast<double*>(;
    double *y_data = const_cast<double*>(;
    double *L_data = const_cast<double*>(;

    // get only the triangular bits of L_data that we want
    int ntheta = p*(p+1)/2;
    double * theta = (double *)malloc(ntheta*sizeof(double));
    int count = 0;
    for (size_t i = 0; i < p; i++){
        for (size_t j = 0; j < i+1; j++){
            theta[count] = L_data[i,j];
    double * dLL = (double *)malloc(p*n+p*n+ntheta*sizeof(double));
    double LL = 0;

    blas_maths::ll_and_dll(p, n, y_data, mu_data, theta, &LL, dLL);

	int ione  = 1;
	int itmp = n*p;

    matrix_partials_t dLLdy(p, n);
//    dLLdy = matrix_partials_t::Zero(p, n);

    matrix_partials_t dLLdmu(p, n);
    dLLdmu = matrix_partials_t::Zero(p, n);
//    dcopy_(&itmp,dLL+n*p,&ione,,&ione);

    matrix_partials_t dLLdLSigma(p, p);
     dLLdLSigma = matrix_partials_t::Zero(p, p);    // zero this as we will only then set lower triangle
    count = 0;
    for (size_t i = 0; i < p; i++){
        for (size_t j = 0; j < i+1; j++){
            dLLdLSigma[i, j] = dLL[n*p+n*p+count];

    operands_and_partials<T_y_ref, T_mu_ref, T_L_ref> ops_partials(y_ref, mu_ref, L_ref);
    T_partials_return logp(0);

  matrix_partials_t inv_L_val = mdivide_left_tri<Eigen::Lower>(value_of(L_ref));

    logp = LL;
    ops_partials.edge1_.partials_ = dLLdy;
    ops_partials.edge2_.partials_ = dLLdmu;
    ops_partials.edge3_.partials_ = dLLdLSigma;


template <typename T_y, typename T_mu, typename T_LSigma,
          require_all_matrix_t<T_y, T_mu, T_LSigma>* = nullptr>
return_type_t<T_y, T_mu, T_LSigma> matrix_normal_cholesky_lpdf(
    const T_y& y, const T_mu& mu, const T_LSigma& LSigma) {
  return matrix_normal_cholesky_lpdf<false>(y, mu, LSigma);

}  // namespace math
}  // namespace stan

The error I am getting is

/usr/bin/ld: ../models/lpv_v01_MNC.o: in function `stan::return_type<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1> >::type stan::math::matrix_normal_cholesky_lpdf<false, Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, (void*)0>(Eigen::Matrix<double, -1, -1, 0, -1, -1> const&, Eigen::Matrix<double, -1, -1, 0, -1, -1> const&, Eigen::Matrix<double, -1, -1, 0, -1, -1> const&)':
lpv_v01_MNC.hpp:(.text._ZN4stan4math27matrix_normal_cholesky_lpdfILb0EN5Eigen6MatrixIdLin1ELin1ELi0ELin1ELin1EEES4_S4_LPv0EEENS_11return_typeIJT0_T1_T2_EE4typeERKS7_RKS8_RKS9_[_ZN4stan4math27matrix_normal_cholesky_lpdfILb0EN5Eigen6MatrixIdLin1ELin1ELi0ELin1ELin1EEES4_S4_LPv0EEENS_11return_typeIJT0_T1_T2_EE4typeERKS7_RKS8_RKS9_]+0x868): undefined reference to `stan::math::internal::empty_broadcast_array<double, Eigen::Matrix<double, -1, -1, 0, -1, -1>, void>::operator+=(Eigen::Matrix<double, -1, -1, 0, -1, -1>)'
/usr/bin/ld: ../models/lpv_v01_MNC.o: in function `stan::return_type<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1> >::type stan::math::matrix_normal_cholesky_lpdf<true, Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1>, (void*)0>(Eigen::Matrix<double, -1, -1, 0, -1, -1> const&, Eigen::Matrix<double, -1, -1, 0, -1, -1> const&, Eigen::Matrix<double, -1, -1, 0, -1, -1> const&)':
lpv_v01_MNC.hpp:(.text._ZN4stan4math27matrix_normal_cholesky_lpdfILb1EN5Eigen6MatrixIdLin1ELin1ELi0ELin1ELin1EEES4_S4_LPv0EEENS_11return_typeIJT0_T1_T2_EE4typeERKS7_RKS8_RKS9_[_ZN4stan4math27matrix_normal_cholesky_lpdfILb1EN5Eigen6MatrixIdLin1ELin1ELi0ELin1ELin1EEES4_S4_LPv0EEENS_11return_typeIJT0_T1_T2_EE4typeERKS7_RKS8_RKS9_]+0x868): undefined reference to `stan::math::internal::empty_broadcast_array<double, Eigen::Matrix<double, -1, -1, 0, -1, -1>, void>::operator+=(Eigen::Matrix<double, -1, -1, 0, -1, -1>)'
collect2: error: ld returned 1 exit status

Here is the external c++ code in case that helps


#include <string.h>
#include "math.h"

namespace blas_maths {

void ll_and_dll(int p,
                int N,
                double *y,
                double *yh,
                double *theta,
                double *LL,
                double *dLL)
	double *pe, *Pi, *R, *S, dtmp;
	int k, i, j, itmp;
    // Extract Gamma and Pi from theta
    Pi    = theta;

	int ione  = 1;
	int izero = 0;
	double done  =  1.0;
	double dzero =  0.0;
	double dmone = -1.0;
    char LOW   = 'L';
    char UPP   = 'U';
    char TRN   = 'T';
    char NO    = 'N';
    char LEFT  = 'L';
    char RIGHT = 'R';
    //Create memory and zero it
    pe   = (double *)malloc(p*N*sizeof(double));
    R    = (double *)malloc(p*p*sizeof(double));
    S    = (double *)malloc(p*p*sizeof(double));
    dtmp = 0.0;
    itmp = p*p;
	/* Clear R and set it up */
    k = 0;
            R[j + i*p]  = Pi[k];
    //Form error:  pe=y-yh
    itmp = p*N;
    dtmp = 0.0;
    //Scale error by inverse transpose Cholesky factor:  pe = R' \ pe

    //Form S = (R\pe)*(R\pe)'
    dsyrk_(&UPP, &NO, &p, &N, &done, pe, &p, &dzero, S, &p);
			S[i+j*p] = S[j+i*p];

    //Compute LL
    LL[0] = 0.0;
        LL[0] = LL[0] - (0.5*N)*log(R[i+i*p]*R[i+i*p]) - 0.5*S[i+i*p];
    //Compute gradient
    double *dLLdy, *dLLdyh, *dLLdR;
    dLLdy  = dLL;
    i      = p*N;
    dLLdyh = &dLL[i];
    i      = i+p*N;
    dLLdR = &dLL[i];

    //pe = R \ (R' \ (y-yh))
    itmp = p*N;

    // 2*vech(diag(Z.Ny./diag(R)) - (S/R')
        S[i+i*p] = -N/R[i+i*p] + S[i+i*p];
            dLLdR[k] = S[j+i*p];



//void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
//	double *y, *yh, *theta, *LL, *dLL, *yout;
//	int p, N, ntheta;
//    // n, m, p, na, nb, N, smoothing, y, u, a, b, theta, P1, X1
//    y         = mxGetPr(prhs[0]);
//    p         = mxGetM(prhs[0]);
//    N         = mxGetN(prhs[0]);
//    yh        = mxGetPr(prhs[1]);
//    theta     = mxGetPr(prhs[2]);
//    ntheta    = mxGetM(prhs[2]);
//    plhs[0]   = mxCreateDoubleMatrix(1,1,mxREAL);
//    LL        = mxGetPr(plhs[0]);
//    plhs[1]   = mxCreateDoubleMatrix(p*N+p*N+ntheta,1,mxREAL);
//    dLL       = mxGetPr(plhs[1]);
//    ll_and_dll(p,N,y,yh,theta,LL,dLL);

Thanks in advance for any help. I am feeling very stuck on this. Also, if there is any advice on better ways to access the data as point to double and to set it back into eigen matrices then that would be great.

How are you including/compiling this? I’m able to compile without issue as a standalone file.

As for moving between pointers and eigen matrices, try out the Map functionality, which allows you to map a segment in memory and treat it as an Eigen object: Eigen: Interfacing with raw buffers: the Map class


Sounds like there must be something else a little less obvious I am doing wrong :S

I included matrix_normal_cholesky.hpp into prob.hpp.

Then in stanc3 I added the signature for the distribution to /middle/ as

; ([Lpdf], "matrix_normal_cholesky", [DMatrix; DMatrix; DMatrix], AoS)

and then recompiled stanc3

The new distribution is used in the model file lpv_v01.stan given below

data {
    int<lower=0> N;
    row_vector[N] u;
    row_vector[N] y;
    real x0;
    real<lower=1e-8> P0_diag;
    matrix[2, N] p;
    matrix[2, N] q;
//    row_vector[2] C;

parameters {
    row_vector[N+1] x;
    row_vector[2] A;
    row_vector[2] B;
    row_vector[2] C;
    row_vector[2] D;
    real<lower=1e-8> sigma_v;
    real<lower=1e-8> sigma_w;
    cholesky_factor_cov[2] LPI;

transformed parameters {
    row_vector[N] mu;
    row_vector[N] mu_y;
    matrix[2, N] mu_c;
    // TODO: Actually implement a kronecker product in stan
    mu = (A * p) .* x[1:N] + (B * q) .* u;
    mu_y = (C * p) .* x[1:N] + (D * q) .* u;
    // combine
    mu_c[1, :] = mu;
    mu_c[2, :] = mu_y;

model {

    // initial state prior
    x[1] ~ normal(x0, P0_diag);      //

    // combined distribution
    append_row(x[2:N+1],y) ~ matrix_normal_cholesky(mu_c, LPI);

generated quantities {
    cov_matrix[2] PI;
    row_vector[N] yhat;
    PI = LPI * LPI';
    yhat = mu_y + normal_rng(0, sqrt(PI[2,2]));

//    loglikelihood = matrix_normal_lpdf(append_row(h,y) | mu_c, diag_pre_multiply(tau,Lcorr));


I create the hpp file from this using my compiled stanc3

./_build/default/src/stanc/stanc.exe lpv_v01.stan 

and compile it using cmdstan

make lpv_v01

And some python code I used to simulate some data for this model is

import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as linalg
import json
import pandas as pd

N = 100    # Number of data samples

nx = 1      # state dim
nu = 1      # input dim
ny = 1      # output dim
nv = 2      # time varying signal size??

 # true system
A1 = 0.9
B1 = 1
C1 = 0.5
D1 = 0

A2 = 0.6
B2 = 1
C2 = 0.7
D2 = 0

A = np.hstack([A1, A2])
B = np.hstack([B1, B2])
C = np.hstack([C1, C2])
D = np.hstack([D1, D2])
# Q = np.eye(nx) / 10
# R = np.eye(ny) / 10
Q = np.eye(nx) / 10
R = np.eye(ny) / 10

# exo signals
u = np.random.randn(nu,N)
p = (np.sin(2*np.pi*np.linspace(0,3,N))+1)/2
p = np.vstack([p, 1-p])
q = p

w = linalg.sqrtm(Q)*np.random.randn(nx,N)
v = linalg.sqrtm(R)*np.random.randn(ny,N)

x = np.zeros((nx, N+1))
y = np.zeros((ny, N))
y_test = np.zeros((ny, N))

for k in range(N):
    x[:, k+1] = A @ np.kron(p[:, k], x[:, k]) + B @ np.kron(q[:, k], u[:, k]) + w[:, k]
    y[:, k] = C @ np.kron(p[:, k], x[:, k]) + D @ np.kron(q[:, k], u[:, k]) + v[:, k]

PI = np.array([[Q[0,0], 0],[0, R[0,0]]])
# uncomment below stuff to include initialisation
stan_init = {
    # 'sigma_v':np.sqrt(R[0,0]),
    # 'sigma_w':np.sqrt(Q[0,0]),
    'LPI': linalg.cholesky(PI).tolist()

with open('models/lpv_init.json', 'w') as outfile:
    json.dump(stan_init, outfile)

# save data for inference in stan
stan_data = {
    'N': int(N),
    'u': u.flatten().tolist(),
    'y': y.flatten().tolist(),
    'x0': 0.0,

with open('models/', 'w') as outfile:
    json.dump(stan_data, outfile)

Ah, figured it out. Alright, you’ve got two issues here (both easy fixes!).

Firstly, on Line 110 of your lpdf, you’re indexing a matrix with the bracket operator:

dLLdLSigma[i, j] = dLL[n*p+n*p+count];

Eigen matrices only use the bracket operator for single-index access with vectors, so you just need to change those to parentheses:

dLLdLSigma(i, j) = dLL[n*p+n*p+count];

As for the broadcast_array errors, those are being caused because you’re accumulating gradients for primitive inputs (i.e., doubles) which don’t need them:

    ops_partials.edge1_.partials_ = dLLdy;
    ops_partials.edge2_.partials_ = dLLdmu;
    ops_partials.edge3_.partials_ = dLLdLSigma;

This is easily fixed by just wrapping the assignment in an if statement that only accumulates if the input is an autodiff type:

    if (!is_constant_all<T_y>::value) {
      ops_partials.edge1_.partials_ = dLLdy;
    if (!is_constant_all<T_mu>::value) {
      ops_partials.edge2_.partials_ = dLLdmu;
    if (!is_constant_all<T_LSigma>::value) {
      ops_partials.edge3_.partials_ = dLLdLSigma;
1 Like

Also, I don’t seem to be able to get your lpdf to run.

I created a file testing.cpp with the contents:

#include <stan/math.hpp>
#include <stan/math/prim/prob/matrix_normal_cholesky_lpdf.hpp>
#include <iostream>

int main() {
  using Eigen::MatrixXd;

  MatrixXd y = MatrixXd::Ones(3, 5);
  MatrixXd mu = MatrixXd::Zero(3, 5);
  MatrixXd Sigma = MatrixXd::Identity(3, 3);

  double a = stan::math::matrix_normal_cholesky_lpdf(y, mu, Sigma);
  std::cout << a << std::endl;

  return 0;

And compiled with:

make -f make/standalone testing

But it errors when running:

andrew@HomeLinux:~/math$ ./testing
double free or corruption (out)
Aborted (core dumped)

After some trial-and-error, the culprit appears to start from the dcopy call on L89 of ll_dll_normal:


But I’m not familiar enough with BLAS to know what the error here is. Hope that helps!


Thanks so much for your answers. I have got it working now! The ‘double free or corruption’ error is fixed by using ‘new’ and ‘delete’ rather than ‘malloc’ and ‘free’ inside the matrix_normal_cholesky function. Probably should change this inside the ll_and_dll_normal function as well, however it does not seem necessary.

On a quick test I ran (n=1000, p=2) I get already get a decent improvement compared to adding the custom function within the stan model. With the version using my new distribution taking only 75% of the time per 10 leapfrog steps. So I am quite happy about this result.

I realize that my implementation could definitely not be added to stan core due to its reliance on blas (though maybe it could with some changes as eigen does use blas). But maybe something like it could in the future.

For reference, the custom function can be added to stan as

    real matrix_normal_lpdf(matrix y, matrix mu, matrix LSigma){
        int pdims[2] = dims(y);
        matrix[pdims[1],pdims[2]] error_sc = mdivide_left_tri_low(LSigma, y - mu);
        real p1 = -0.5*pdims[2]*(pdims[1]*log(2*pi()) + 2*sum(log(diagonal(LSigma))));
        real p2 = -0.5*sum(error_sc .* error_sc);
        return p1+p2;


1 Like

So it was all working really well. Happily compiling and running tests and then something changed. Now I have the issue that when I try to compile I get the following issue

-o ../models/lpv_v01_MNC
/usr/bin/ld: src/cmdstan/main.o: in function `blas_maths::ll_and_dll(int, int, double*, double*, double*, double*, double*)':
main.cpp:(.text+0x38f0): multiple definition of `blas_maths::ll_and_dll(int, int, double*, double*, double*, double*, double*)'; ../models/lpv_v01_MNC.o:lpv_v01_MNC.hpp:(.text+0x810): first defined here
collect2: error: ld returned 1 exit status
make: *** [make/program:59: ../models/lpv_v01_MNC] Error 1

for some reason it is defining the ll_and_dll function inside main.o and the model file .o

the only place the file for this extenal function is included is within the new distribution file (matrix_normal_cholesky.hpp)

so this was a rather dumb mistake by me. Fixed by declaring the function as inline. However, this does seem to have made the code run a lot slower…

I have actually narrowed down the performance issue to something entirely unrelated in the stan model file. So I am going to move that to a separate discussion.