Help with custom probability distribution: Multivariate Power Exponential


I am looking for assistance in coding a custom probability distribution. In particular, I am trying to use the multivariate generalization of the power exponential distribution defined in Gomez-Villegas and Marin (1998) - " A Multivariate generalization of the power exponential family of distributions." in Communications in Statistics. This is a generalization of Subbotin’s (1923) univariate power exponential. The density functions is f(x;\mu, \Sigma, \beta) = k|\Sigma|^{1/2}exp{(-1/2)[(x-\mu)^{'}\Sigma^{-1}(x-\mu)]^{\beta}} where, k = (n\Gamma(n/2)) / (\pi^{n/2}\Gamma(1 + (n/2\beta)2^{1+(n/2\beta)}). I include a snippet from the article describing this pdf for clarity. Specifically, I am looking for advice on how to code this in Stan with a cholesky parameterization for \Sigma for efficiency. (If possible?).

Hope this is all clear!


Hi, sorry for not getting to you earlier.

Just to be clear where the issue is: if you were to re-implement some simple univariate distribution in Stan without using the built-in distributions (say Gamma), would you know where to start? I.e. is the problem implementing distributions in general or this distribution in particular?

If I understand it correctly, the Cholesky parametrization of the normal distribution in Stan is used because you can use the Cholesky to cheaply find both the determinant and get the result of multiplication by inverse (via mdivide_left_tri_low and related functions), so that seems directly applicable to your case as well.

Best of luck with the model!

Hi @martinmodrak!

Thank you for your response! In regards to your first question, I’d say it is a little bit of both. I know Stan implements the log of probability/mass functions. I’ve gone through the User guide and some tutorials about writing custom PDFs (i.e. triangle distribution, exponential), but I’m struggling through scaling this up to a multivariate PDF (especially when accounting for the covariance structure and related mathematical functions). I DO specifically want to use the MPE described in the post - it is part of a family of multivariate generalizations of a Gaussian (MGGD) whose tails are higher or lower than the “normal” Gaussian - domain expertise suggests this to be the case. I have actually written a model with a Gaussian (see code below), and now I want to compare those results to those obtained from a generalization of the Gaussian presented here.

In relation to your second part, that is exactly what I’m looking to do - use the “cheap” trick to efficiently find the determinant and multiplication by inverse. In the attached code, I’ve used such parameterizations and I was curious if I need to do anything in my custom function to also use the Cholesky parameterization. Honestly, I think it just goes back to my confusion with coding the covariance structure efficiently and is exacerbated by the fact I am rather new Stan user and am still wrapping my head around syntax and the pre-defined mathematical functions in Stan (coming from an R user).

  //multivariate outcomes
  int N; // # of individuals
  int K; // # of responses
  vector[K] y[N]; // vector of responses per individual
  real X[N]; //age
  vector<lower=0>[K] a;
  vector<lower=0>[K] r;
  vector[K] b;
  vector<lower=0>[K] s_scale;
  vector<lower=0>[K] kappa;
  //cholesky factors for rho
  cholesky_factor_corr[K] L_Omega;
transformed parameters{
  vector[K] mu[N];
  vector[K] s[N];
  matrix[K,K] Sigma[N];
  for(i in 1:N){
    for(k in 1:K){
      mu[i,k] = a[k]*X[i]^(r[k])+b[k];
      s[i,k] = s_scale[k]*(1 + kappa[k]*X[i]);
  for(i in 1:N){
    Sigma[i] = diag_pre_multiply(s[i],L_Omega);  

  a ~ gamma(4,1);
	r ~ gamma(4,1);
	b ~ normal(0,10);
	kappa ~ gamma(4,1);
	s_scale ~ gamma(4,1);
  L_Omega ~ lkj_corr_cholesky(1);
	for(i in 1:N){
	 y[i] ~ multi_normal_cholesky(mu[i], Sigma[i]); 
generated quantities{
  vector[K] Y_mean[N];
  for(i in 1:N){
    for(k in 1:K){
      Y_mean[i,k] = a[k]*X[i]^r[k] +b[k];

I should that my (somewhat limited) experience with a bit more exotic univariate distributions is that those extra parameters tend to be pretty hard to fit and get very low improvements when you manage to fit them, so I would also definitely consider the multivariate student-t as this is already available.

Another trick that I got from @avehtari is that instead of using some exotic distribution you have a scale mixture of two distributions with the same mean (and here probably also the correlation matrix) but different deviation. Despite having more parameters (the mixture probability + second standard deviations), those also tend to be relatively easy to fit - at least in the univariate case - while allowing for somewhat fatter tails.

So the Cholesky trick is mostly an efficiency thing, so if you feel unsure about getting it right (I honestly am not 100% sure about all of this), I would start by writing the distribution without the Cholesky decompostion with explicit inverse and log_determinant calls and see if you can fit it. If yes, then we can optimize - with the added bonus that we have the less efficient parametrization to check that we get the same results after the optimization.

So if I read the math correctly, you would have something like (code not tested, please double check):

functions {
real mpe_lpdf(vector x, vector mu, matrix Sigma, real beta) {
  int n = num_elements(x);
  real n_half = n * 0.5;
  real one_p_n_half_beta = 1 + n_half / beta;
  real log_k = log(n) + lgamma(n_half) - n_half * log(pi()) 
     - lgamma(one_p_n_half_beta ) - one_p_n_half_beta * log(2);

  vector x_mu = x - mu;
  real product = x_mu' * inverse(Sigma) * x_mu;
  return log_k - 0.5 * log_determinant(Sigma) - 0.5 * (product ^ beta);

Hope that gets you started.