Jacobian adjustments are usually not what you want here; instead, you should build a generative model.
So put your prior not on the p_i but on \mu and \sigma.
Think about it this way: the fact that you know the t_i implies a very strong relationship between the different p_i, a relationship that generic draws from a Dirichlet do not obey. How, then, is \alpha supposed to be consistent with your prior information?
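For concreteness, here is a minimal Stan sketch of that generative approach, under the assumption that the raw data are counts of guesses per bracket (the names `counts` and `t` and the prior scales are illustrative, not taken from the original model):

```stan
data {
  int<lower=2> K;                  // number of brackets
  ordered[K - 1] t;                // interior cutpoints t_1 < ... < t_{K-1}
  array[K] int<lower=0> counts;    // aggregate guesses per bracket
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  vector[K] p;                     // bracket probabilities implied by (mu, sigma)
  p[1] = Phi((t[1] - mu) / sigma);
  for (k in 2:(K - 1))
    p[k] = Phi((t[k] - mu) / sigma) - Phi((t[k - 1] - mu) / sigma);
  p[K] = 1 - Phi((t[K - 1] - mu) / sigma);

  mu ~ normal(500, 200);           // weak prior from the problem statement
  sigma ~ normal(0, 200);          // illustrative weak (half-normal) prior
  counts ~ multinomial(p);         // likelihood on observed counts
}
```

Because the p_k are deterministic functions of (\mu, \sigma) and the likelihood is over the observed counts, no Jacobian adjustment arises in this formulation.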
I'm interested in solving a type of problem similar to this:
Suppose there's a bowl with a random number of jelly beans (non-integer values allowed) that's normally distributed with mean \mu and standard deviation \sigma. The number of jelly beans can be put into K brackets: [-\infty, t_1], [t_1, t_2], [t_2, t_3], \dots, [t_{K-1}, \infty]. Ten thousand people with varying degrees of information about the process guess which bracket the number of jelly beans will fall in. The information I get is the aggregate guesses of these individuals, the bracket cutpoints, and weak priors on \mu and \sigma (e.g., \mu \sim \text{Normal}(500, 200)).
In this context, does the model I chose seem appropriate, or is there something that makes more sense?
The given data are the \alpha's and the t_i's. So we have priors on the CDF values, which are functions of the parameters \mu and \sigma, and we try to use those priors to obtain parameter estimates of \mu and \sigma.
Not without adding the log determinant. In the 2-D case (i.e., when p \in \mathbb{R}^2) the determinant is easy to calculate, and then with a weak prior on \mu the model converges.
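For reference, here is one way that determinant works out, assuming the map is (\mu, \sigma) \mapsto (\Phi(z_1), \Phi(z_2)) with z_i = (t_i - \mu)/\sigma and t_1 < t_2 (my reconstruction of the 2-D case, not code from the model):

\[
J = \begin{pmatrix}
-\phi(z_1)/\sigma & -z_1\,\phi(z_1)/\sigma \\
-\phi(z_2)/\sigma & -z_2\,\phi(z_2)/\sigma
\end{pmatrix},
\qquad
|\det J| = \frac{\phi(z_1)\,\phi(z_2)\,(z_2 - z_1)}{\sigma^2},
\]

so the log adjustment is \log\phi(z_1) + \log\phi(z_2) + \log(z_2 - z_1) - 2\log\sigma. The further linear map (\Phi(z_1), \Phi(z_2)) \mapsto (p_1, p_2) = (\Phi(z_1), \Phi(z_2) - \Phi(z_1)) has unit determinant and contributes nothing.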
I guess you assume the map (\mu, \sigma) \to (p_1, p_2) is a change of local coordinates, but I am not sure that is correct. Because p_1 + p_2 = 1, the target space has dimension one; that is, the map goes from a higher-dimensional (i.e., 2) space to a lower-dimensional (i.e., 1) space, and that seems incorrect to me.
I am interested in this topic because my model may also require a Jacobian adjustment, but I am not sure what I should do for my model.
You're right that the map as I stated it goes to dimension 1. I misspoke; what I meant to say is that when p \in \mathbb{R}^3 there are two free coordinates, so the Jacobian is a 2 \times 2 matrix, and then the model converges well.
My current approach is to investigate the behaviour of the Dirichlet under different assumptions. If you look at the density of the Dirichlet (https://en.wikipedia.org/wiki/Dirichlet_distribution), the different p_i's are dependent on each other through the constraint that \sum_i p_i = 1. However, if the p_i's each correspond to a distinct part of an exhaustive partition of the support, then that constraint is automatically satisfied, and the joint pdf of p can be broken into the product of individual factors for the p_i's, each with a Beta(\alpha_i, 1) distribution. I'm not sure how correct/rigorous this is, but I'm trying whatever half makes sense at this point. Another approach is to model each p_i solely by its marginal, which is Beta(\alpha_i, (\sum_j \alpha_j) - \alpha_i). In either of these instances we just have multiple maps (\mu, \sigma) \rightarrow (\Phi_i, \Phi_{i+1}) with \Phi_i = \Phi\!\left(\frac{t_i - \mu}{\sigma}\right). Then the map from (\Phi_i, \Phi_{i+1}) to p_i is a linear transformation of those two, so the log Jacobian of that transform would be 0.
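In Stan, the second (marginal-Beta) idea would look roughly like this sketch, with the Jacobian term omitted exactly as described (the data names `alpha` and `t`, and the prior scales, are assumptions for illustration):

```stan
model {
  vector[K + 1] Phi_t;             // CDF at the cutpoints, padded with 0 and 1
  Phi_t[1] = 0;
  for (i in 2:K)
    Phi_t[i] = Phi((t[i - 1] - mu) / sigma);
  Phi_t[K + 1] = 1;

  mu ~ normal(500, 200);
  sigma ~ normal(0, 200);
  for (i in 1:K)                   // each p_i modeled by its Dirichlet marginal
    target += beta_lpdf(Phi_t[i + 1] - Phi_t[i] | alpha[i], sum(alpha) - alpha[i]);
}
```

Whether the omitted Jacobian term matters in practice is exactly the open question in this thread.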
Also, as a side note, my model without any Jacobian adjustment does appear to converge well and provides predictions/estimates that make sense and work well. Perhaps yours would too without the Jacobian adjustment.
Let me go back to the topic of the title, namely the Jacobian for a low-to-high dimensional map.
In the following, I want to show that such a case never occurs.
To simplify the argument, consider the following situation. Let (\Omega, \mathcal{F}) be a measurable space, let X: \Omega \to \mathbb{R} be a random variable, and let \varphi: \mathbb{R} \to \mathbb{R}^2 be a continuous (or differentiable) map. Then consider the model

\varphi(X) \sim \text{Normal}(\overrightarrow{\mu}, \Sigma),

where \text{Normal}(\overrightarrow{\mu}, \Sigma) denotes the two-dimensional normal distribution with mean vector \overrightarrow{\mu} and covariance matrix \Sigma. In this situation, a Jacobian adjustment from low to high dimension would seem to be required. However, this modeling is itself wrong: the image \{\varphi(X(\omega)) \mid \omega \in \Omega\} is at most a one-dimensional manifold, so the set \mathbb{R}^2 \setminus \{\varphi(X(\omega)) \mid \omega \in \Omega\} is non-empty, and data are never observed on it, even though under the normal model they should be. For instance, if \varphi(x) = (x, 0), every observation lies on the horizontal axis, yet any nondegenerate bivariate normal puts positive probability off that axis. Thus the modeling itself is wrong, and similarly, a Jacobian adjustment for a low-to-high dimensional mapping will never arise.
But as with the Dirichlet distribution: if the range of the random variable (i.e., the image \{X(\omega) \mid \omega \in \Omega\}) is a submanifold (e.g., \{p \mid \sum_i p_i = 1\}) of a higher-dimensional ambient space, then the low-to-high situation apparently does occur. To make sense of a Jacobian adjustment in that case, we have to take a system of local coordinates on the submanifold, as sketched below.
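For instance (a standard chart, not specific to this thread), on the open simplex one can drop the last coordinate:

(p_1, \dots, p_K) \longleftrightarrow (p_1, \dots, p_{K-1}), \qquad p_K = 1 - \sum_{i=1}^{K-1} p_i,

so the Dirichlet density is really a density in these K - 1 free coordinates, and a Jacobian for (\mu, \sigma) \to p should be computed for the map into (p_1, \dots, p_{K-1}); with K = 3 this is exactly the square 2 \times 2 case discussed above.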
My apologies, I assumed the notion of a manifold; I did not want to rely on such a notion … sorry.
As mentioned in @spinkney's link, there is no such thing as a Jacobian for a low-to-high dimensional mapping. You have to first work out a bijective (one-to-one) transformation with a well-defined Jacobian and then marginalize out the excess dimension in a separate step. See https://arxiv.org/abs/1010.3436 for how to do this for the simplex.
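As a side note, Stan's `simplex` type implements exactly this recipe: internally it uses a stick-breaking bijection from \mathbb{R}^{K-1} to the simplex and adds the corresponding log Jacobian automatically, so a model written directly on the simplex needs no manual adjustment:

```stan
parameters {
  simplex[K] p;     // bijection from R^{K-1} plus its log Jacobian, handled by Stan
}
model {
  p ~ dirichlet(alpha);
}
```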
Perhaps more relevant to the application mentioned here: see https://betanalpha.github.io/assets/case_studies/ordinal_regression.html for an example of using domain information about integrated probabilities (in the form of a Dirichlet density function) to inform a prior model for latent parameters.