I’ve long wished for the ability to model multivariate data in a manner that flexibly accommodates heavy-tailedness to varying degrees per dimension, in contrast with the multivariate student-t, whose degrees-of-freedom parameter (nu in the Stan spec) is common to all dimensions. The solution was given to me long ago, but I only recently came to understand it and validate a full implementation, so I thought I’d share it more widely. Technically speaking it’s no longer a “multivariate student-t” but rather a multivariate distribution with student-t marginals, hence the thread title. And as shown below, the extra flexibility comes at a computational cost: one auxiliary parameter per dimension per multivariate sample.
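The construction rests on the classic scale-mixture representation of the student-t: if z ~ normal(0,1) and w ~ chi_square(nu), then z*sqrt(nu/w) ~ student_t(nu). Here’s a quick simulation sketch of that fact (Python/numpy/scipy, outside the Stan model; toy sample size and nu are my own choices):

```python
import numpy as np
from scipy import stats

# Scale-mixture check: normal(0,1) * sqrt(nu / chi_square(nu)) ~ student_t(nu).
rng = np.random.default_rng(0)
n, nu = 100_000, 5.0
z = rng.standard_normal(n)         # analogue of P_stdnorms
w = rng.chisquare(nu, n)           # analogue of nu_chi
t_draws = z * np.sqrt(nu / w)      # the transform used in the models below
# compare the draws against the student-t(nu) CDF
stat, _ = stats.kstest(t_draws, stats.t(df=nu).cdf)
```

With 100k draws the Kolmogorov-Smirnov statistic lands well under 0.01, consistent with the draws being student-t(nu).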
Here’s the hierarchical case:
data{
  int N ; // number of multivariate samples
  int K ; // number of dimensions of the multivariate
  int M ; // number of total observations
  vector[M] Y ; // observation vector
  array[M] int<lower=1,upper=N> N_for_Y ; // index of which level of N is associated with each observation in Y
  array[M] int<lower=1,upper=K> K_for_Y ; // index of which level of K is associated with each observation in Y
}
transformed data{
  // derive an index into to_vector(P) for each observation in Y
  // (to_vector flattens column-major, so element [k,n] of the
  //  K-by-N matrix P lands at position (n-1)*K + k)
  array[M] int Y_index_in_flattened_P ;
  for(i in 1:M){
    Y_index_in_flattened_P[i] = (N_for_Y[i]-1)*K + K_for_Y[i] ;
  }
}
parameters{
  // standard mvn parameters
  vector[K] mu ;
  vector<lower=0>[K] sigma ;
  cholesky_factor_corr[K] cholcor ;
  matrix[K,N] P_stdnorms ;
  // parameters for marginal-student-t
  vector<lower=0,upper=1>[K] tau ; // nu = 2+28*tau
  matrix<lower=0>[K,N] nu_chi ;
  // observation-level noise
  real<lower=0> noise ;
}
transformed parameters{
  vector[K] nu = 2+28*tau ;
  matrix[K,N] P = (
    // standard mvn transforms for scale & correlations
    (
      diag_pre_multiply(sigma,cholcor)
      * P_stdnorms
    )
    // transform for marginal-student-t
    .* sqrt(
      rep_matrix(nu,N)
      ./ nu_chi
    )
    // standard mvn shift for the mean
    + rep_matrix(mu,N)
  ) ;
}
model{
  // priors for standard multivariate parameters
  mu ~ std_normal() ;
  sigma ~ weibull(2,1) ;
  to_vector(P_stdnorms) ~ std_normal() ;
  cholcor ~ lkj_corr_cholesky(1) ;
  // priors for marginal-student-t parameters
  tau ~ beta(2,2) ;
  for(i_K in 1:K){
    nu_chi[i_K] ~ chi_square(nu[i_K]) ;
  }
  // prior for observation-level noise
  noise ~ weibull(2,1) ;
  // likelihood
  Y ~ normal(
    to_vector(P)[Y_index_in_flattened_P]
    , noise
  ) ;
}
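For the indexing trick in transformed data above: Stan’s to_vector flattens a matrix column-major, so element [k,n] of the K-by-N matrix P sits at position (n-1)*K + k, which is exactly what the loop computes. A quick numpy check of that arithmetic (toy sizes of my own choosing):

```python
import numpy as np

K, N = 3, 4
P = np.arange(K * N).reshape(K, N)   # toy K-by-N matrix
flat = P.flatten(order="F")          # column-major, matching Stan's to_vector
for n in range(1, N + 1):            # 1-based indices, as in the Stan loop
    for k in range(1, K + 1):
        pos = (n - 1) * K + k        # the Y_index_in_flattened_P formula
        assert flat[pos - 1] == P[k - 1, n - 1]
ok = True
```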
And the non-hierarchical case:
data{
  int N ; // number of multivariate samples
  int K ; // number of dimensions of the multivariate
  matrix[K,N] Y ; // observations
}
transformed data{
  vector[K] K_zeroes = rep_vector(0,K) ;
}
parameters{
  // standard mvn parameters
  vector[K] mu ;
  vector<lower=0>[K] sigma ;
  corr_matrix[K] cor ;
  // parameters for marginal-student-t
  vector<lower=0,upper=1>[K] tau ; // nu = 2+28*tau
  matrix<lower=0>[K,N] nu_chi ;
}
transformed parameters{
  vector[K] nu = 2+28*tau ;
}
model{
  // priors for standard multivariate parameters
  mu ~ std_normal() ;
  sigma ~ weibull(2,1) ;
  cor ~ lkj_corr(1) ;
  // priors for marginal-student-t parameters
  tau ~ beta(2,2) ;
  for(i_K in 1:K){
    nu_chi[i_K] ~ chi_square(nu[i_K]) ;
  }
  // likelihood
  (
    Y
    // subtracting the mean
    - rep_matrix(mu,N)
    // dividing by the student-t helper
    ./ sqrt(
      rep_matrix(nu,N)
      ./ nu_chi
    )
    // yielding a variable that is ~ multi_normal(0,quad_form_diag(cor,sigma))
  ) ~ multi_normal(
    K_zeroes
    , quad_form_diag(cor,sigma)
  ) ;
}
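To convince myself that the subtract-and-divide likelihood does what the comment claims, I find it useful to simulate: generate Y by the forward construction, apply the same transform, and check that the result has covariance quad_form_diag(cor, sigma). A numpy sketch (all parameter values here are made-up toy settings):

```python
import numpy as np

rng = np.random.default_rng(3)
K, N = 2, 200_000
mu = np.array([0.5, -1.0])
sigma = np.array([1.5, 0.8])
nu = np.array([4.0, 15.0])                 # different tail-heaviness per dimension
C = np.array([[1.0, -0.4], [-0.4, 1.0]])   # "cor" in the Stan model
L = np.linalg.cholesky(C)

# forward construction: correlated normals, scaled per-element to t marginals
chi = rng.chisquare(nu[:, None], (K, N))   # "nu_chi"
scale = np.sqrt(nu[:, None] / chi)
Y = (sigma[:, None] * (L @ rng.standard_normal((K, N)))) * scale + mu[:, None]

# undo the shift and the per-element t scale, as in the model block
Z = (Y - mu[:, None]) / scale
Sigma = np.diag(sigma) @ C @ np.diag(sigma)  # quad_form_diag(cor, sigma)
emp = np.cov(Z)
```

The empirical covariance of the transformed data matches quad_form_diag(cor, sigma) to within Monte Carlo error.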
Notes:
- The non-hierarchical case involves a likelihood expression that includes a transform of the data by parameters. The subtraction of the mean has unit Jacobian, so it needs no correction; the element-wise division by the parameter-dependent scale sqrt(nu/nu_chi), however, does in principle call for a Jacobian adjustment (target += -sum(log(sqrt(rep_matrix(nu,N) ./ nu_chi)))), since that scale involves parameters.
- Given that the student-t begins to approximate the normal by nu=30, and that nu<=2 has some pathological properties (the marginal variance is nu/(nu-2), whose derivation involves division by nu-2), I thought a generic concave prior over (2,30), achieved by tau ~ beta(2,2) ; nu = 2+28*tau, made sense.
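For intuition on what that implies: the beta(2,2) density is concave on (0,1) with its mode at 0.5, so the induced prior on nu is concave on (2,30) with mode and mean at 16. A quick numpy check of the induced prior (sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
tau = rng.beta(2, 2, 500_000)
nu = 2 + 28 * tau   # induced prior: concave on (2, 30), centered at 16
```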