EDIT: I'm also kinda learning this right now; this is also why I took my time with this post. Please don't treat this post as authoritative. If all is correct and you learn something… all the better! :)
Yes, that's what I meant: the function you transform y_1 with (let's say f_1(\cdot)=\log(\cdot)) only transforms y_1, and the function you transform y_2 with (f_2(\cdot)=\log(\cdot)) only transforms y_2. (That notation is a bit clunky, but I think you get the point.) Another way to think about it would be this (let each of y_1, \ldots, y_k be a vector with N observations):
f_1(y_1,y_2,\ldots,y_k) =\log y_1 \\
f_2(y_1,y_2,\ldots,y_k) =\log y_2 \\
\vdots \\
f_k(y_1,y_2,\ldots,y_k) =\log y_k \\
with partial derivatives:
\dfrac{\partial f_1(y_1,y_2,\ldots,y_k)}{\partial y_1} =1/y_1 \\
\vdots \\
\dfrac{\partial f_k(y_1,y_2,\ldots,y_k)}{\partial y_k} =1/y_k \\
But these are not all the partial derivatives… think of \partial f_1/\partial y_2 and so on. However, all those others are kinda trivial… they are just 0. So the Jacobian looks like this:
\begin{bmatrix}
\frac{\partial f_1}{\partial y_1} & \frac{\partial f_1}{\partial y_2} & \cdots & \frac{\partial f_1}{\partial y_k} \\
\frac{\partial f_2}{\partial y_1} & \frac{\partial f_2}{\partial y_2} & \cdots & \frac{\partial f_2}{\partial y_k} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_k}{\partial y_1} & \frac{\partial f_k}{\partial y_2} & \cdots & \frac{\partial f_k}{\partial y_k} \\
\end{bmatrix} =
\begin{bmatrix}
1/y_1 & 0 & \cdots & 0 \\
0 & 1/y_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1/y_k \\
\end{bmatrix}
and the determinant of this is just 1/y_1 \times 1/y_2 \times \ldots \times 1/y_k. (Since we know that all y are positive, we don't need to take the absolute value here.) Now we only have to take the log of this determinant: \log(1/y_1) + \log(1/y_2) + \dots + \log(1/y_k). This can be rewritten as:
-\log(y_1)-\log(y_2)-\ldots-\log(y_k)
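(Quick sanity check with toy numbers of my own, not from the thread: take K=2, N=1, y_1=2 and y_2=5.)

\det \begin{bmatrix} 1/2 & 0 \\ 0 & 1/5 \end{bmatrix} = \tfrac{1}{2} \times \tfrac{1}{5} = \tfrac{1}{10}, \qquad \log \tfrac{1}{10} = -\log 2 - \log 5 \approx -2.30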
In Stan (with vector[N] log_y[K]) this would be:

for (k in 1:K)
  target += -log_y[k];

(I wonder if you could do for (k in 1:K) target -= log_y[k]; instead… looks weird, haha.)
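To show where that adjustment could live in a full program, here is a minimal sketch of my own (the data names, priors, and the normal-on-the-log-scale likelihood are just assumptions for illustration, not the model from this thread):

data {
  int<lower=1> N;               // observations per outcome
  int<lower=1> K;               // number of positive outcomes
  vector<lower=0>[N] y[K];      // newer Stan versions write this as array[K] vector<lower=0>[N] y;
}
transformed data {
  vector[N] log_y[K];
  for (k in 1:K)
    log_y[k] = log(y[k]);
}
parameters {
  vector[K] mu;
  vector<lower=0>[K] sigma;
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2.5);
  for (k in 1:K) {
    // likelihood on the log scale
    target += normal_lpdf(log_y[k] | mu[k], sigma[k]);
    // Jacobian adjustment, so that target is the log density of y itself
    target += -log_y[k];
  }
}

If I got it right, with the adjustment this should match writing target += lognormal_lpdf(y[k] | mu[k], sigma[k]); directly, which is a nice way to double-check.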
And as for the one-to-one transformation… it's one input and one output, as opposed to, let's say, the mean or sum as a many-to-one transformation. A one-to-many transformation… err, maybe sqrt, if you count both the positive and the negative root? But, yeah, that's the point…
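To make those three cases a bit more concrete (toy notation of my own):

\begin{aligned}
\text{one-to-one:}\quad & f(x) = \log x &\quad& \text{(one input, one output, invertible)} \\
\text{many-to-one:}\quad & f(x_1, \ldots, x_N) = \tfrac{1}{N}\textstyle\sum_{n=1}^{N} x_n &\quad& \text{(you can't recover the inputs from the mean)} \\
\text{one-to-many:}\quad & f(x) = \pm\sqrt{x} &\quad& \text{(two outputs per input, so not even a function in the strict sense)}
\end{aligned}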
All of the above you can find in this pretty neat paper (it's more like a look-up thing to me). The context is different (Deep Learning) but it covers Jacobians and one-to-one, many-to-one, etc. transformations nicely (IMO).