Log-Simplex Constraints

Thanks, @spinkney.

> The issue is that by setting the last value to 0 all the other values are forced to vary more strongly to accommodate the simplex constraint. The sampler struggles when the curvature of the region varies strongly and different parameterizations induce stronger or weaker curvature for the sampler to explore. The expanded softmax and ILR approaches are meant to distribute the simplex constraint evenly across every value to make sampling easier.

That part makes sense.

> When converting the simplex distribution to a log-simplex distribution there will be an extra -log_theta[N] term. Fortunately, this cancels out with the log_theta[N] term from the mapping of theta from the unconstrained Euclidean vector space to the simplex space.

I obviously haven’t done the derivations for the ExpandedSoftmax or ILR approaches, so I’ll take your word for the Jacobian adjustment there :) But I still don’t understand where the log_theta[N] is coming from for the log-simplex form of the Dirichlet distribution. Here’s my derivation of the Jacobian correction for it:

The transform and inverse transform relative to a Dirichlet-distributed simplex \mathbf{x} are, respectively:

\begin{align} f(\mathbf{x}) &= \ln{\mathbf{x}} = \mathbf{y} \\ f^{-1}(\mathbf{y}) &= \text{e}^{\mathbf{y}} = \mathbf{x}. \end{align}
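To make the pair concrete, here's a small numeric sketch (the simplex point is an arbitrary example, not from the post): \ln maps a simplex point into \mathbb{R}^K, and \exp maps it back.

```python
import numpy as np

# Example simplex point (sums to 1); values are illustrative only.
x = np.array([0.2, 0.3, 0.5])
y = np.log(x)        # forward transform f(x)
x_back = np.exp(y)   # inverse transform f^{-1}(y)

# Roundtrip recovers x; note y itself is unconstrained, and exp(y)
# sums to 1 only because x did.
print(np.allclose(x_back, x))  # True
```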

The Jacobian correction of the inverse function can be calculated as follows:

\begin{align} J_{f^{-1}}(\mathbf{y}) = \begin{bmatrix} \frac{\partial x_1}{\partial y_1} & \dotsm & \frac{\partial x_1}{\partial y_K} \\ \vdots & \ddots & \vdots \\ \frac{\partial x_K}{\partial y_1} & \dotsm & \frac{\partial x_K}{\partial y_K} \end{bmatrix} = \begin{bmatrix} \text{e}^{y_1} & 0 & \dotsm & 0 \\ 0 & \text{e}^{y_2} & \dotsm & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dotsm & \text{e}^{y_K} \end{bmatrix} \end{align}
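One can sanity-check that the Jacobian really is diagonal with entries e^{y_k} by comparing against central finite differences (the point and step size below are arbitrary choices for illustration):

```python
import numpy as np

def inverse_transform(y):
    # x = exp(y), the inverse transform from above
    return np.exp(y)

y = np.log(np.array([0.2, 0.3, 0.5]))  # example point, y = ln(x)
analytic_J = np.diag(np.exp(y))        # claimed Jacobian: diag(e^{y_k})

# Build the numeric Jacobian column by column via central differences.
eps = 1e-6
K = y.size
numeric_J = np.empty((K, K))
for j in range(K):
    e = np.zeros(K)
    e[j] = eps
    numeric_J[:, j] = (inverse_transform(y + e) - inverse_transform(y - e)) / (2 * eps)

print(np.max(np.abs(numeric_J - analytic_J)) < 1e-6)  # True
```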

The determinant of a diagonal matrix is just the product of the diagonal elements, so the Jacobian adjustment is

\begin{align} \left|\text{det }J_{f^{-1}}(\mathbf{y})\right| = \prod_{k=1}^K\text{e}^{y_k} \end{align}
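Numerically, for the example point used above, the determinant identity checks out (and, equivalently, the log-Jacobian adjustment is just \sum_k y_k):

```python
import numpy as np

y = np.log(np.array([0.2, 0.3, 0.5]))  # example point
J = np.diag(np.exp(y))                 # diagonal Jacobian from above
det_J = np.linalg.det(J)

# det of a diagonal matrix = product of the diagonal entries
print(np.isclose(det_J, np.prod(np.exp(y))))  # True
# on the log scale, the adjustment is simply sum(y)
print(np.isclose(np.log(det_J), y.sum()))     # True
```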

Putting everything together, the probability density for the exponential-Dirichlet distribution is

\begin{align} P_{\mathbf{y}}(\mathbf{y} | \boldsymbol{\alpha}) = P_{\mathbf{x}}(\text{e}^\mathbf{y} | \boldsymbol{\alpha}) \prod_{k=1}^K\text{e}^{y_k}, \end{align}

which, on the log scale, gives us

\begin{align} \ln{\left(P_{\mathbf{y}}(\mathbf{y} | \boldsymbol{\alpha})\right)} &= \ln{\left(P_{\mathbf{x}}(\text{e}^\mathbf{y} | \boldsymbol{\alpha}) \prod_{k=1}^K\text{e}^{y_k}\right)} \\ &= \ln{\left(\frac{1}{B(\boldsymbol{\alpha})} \prod_{k=1}^{K} (\text{e}^{y_k})^{\alpha_k - 1}\right)} + \ln{\left(\prod_{k=1}^K\text{e}^{y_k}\right)} \\ &= \sum_{k=1}^K\ln{\left(\text{e}^{y_k(\alpha_k - 1)}\right)} - \ln{(B(\boldsymbol{\alpha}))} + \sum_{k=1}^K \ln{\text{e}^{y_k}} \\ &= \sum_{k=1}^K \left(y_k\alpha_k - y_k + y_k\right) - \ln{(B(\boldsymbol{\alpha}))} \\ &= \sum_{k=1}^K y_k\alpha_k - \ln{(B(\boldsymbol{\alpha}))} \end{align}
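For what it's worth, the algebra above checks out numerically: a quick sketch comparing the change-of-variables form \ln P_{\mathbf{x}}(\text{e}^{\mathbf{y}}) + \sum_k y_k against the derived closed form \sum_k y_k \alpha_k - \ln B(\boldsymbol{\alpha}) (the \boldsymbol{\alpha} and \mathbf{x} values are arbitrary examples):

```python
import numpy as np
from math import lgamma

alpha = np.array([1.5, 2.0, 3.0])  # example concentration parameters
x = np.array([0.2, 0.3, 0.5])      # example simplex point
y = np.log(x)

# log B(alpha) = sum(lgamma(alpha_k)) - lgamma(sum(alpha))
log_B = sum(lgamma(a) for a in alpha) - lgamma(alpha.sum())

# Dirichlet log-density at x, then the change-of-variables correction sum(y)
log_p_x = ((alpha - 1) * np.log(x)).sum() - log_B
lhs = log_p_x + y.sum()

# Derived closed form: sum_k y_k * alpha_k - log B(alpha)
rhs = (y * alpha).sum() - log_B

print(np.isclose(lhs, rhs))  # True
```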

I’m not seeing where the -log_theta[N] term comes from here.