(wrote this before seeing other replies, and purposefully not reading them to not have the answer spoiled ;])
Your intuitions are correct: you need a Jacobian adjustment. The exceptions and warnings appear because you haven't been explicit about bounds, and being explicit about them also removes any need for special initialization.
The model is well specified: you have a joint probability distribution Pr(x, y, w, V, mean_x, mean_y, sd_x, sd_y), and having observed w and V, you want to sample from the target distribution Pr(x, y, mean_x, mean_y, sd_x, sd_y | w, V). The model is:
mean_x ~ normal(0, 1)
mean_y ~ normal(0, 1)
sd_x ~ half-normal(0, 1)
sd_y ~ half-normal(0, 1)
x_i ~ normal(mean_x, sd_x)
y_i ~ normal(mean_y, sd_y)
w_i ~ uniform(0, 1)
V_i = w_i * exp(x_i) - (1 - w_i) * exp(y_i)
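For concreteness, here's a minimal sketch of simulating data from this generative model in Python/NumPy (the seed and n = 100 are arbitrary choices of mine):

import numpy as np

rng = np.random.default_rng(1)
n = 100

# hyperparameters
mean_x = rng.normal(0, 1)
mean_y = rng.normal(0, 1)
sd_x = abs(rng.normal(0, 1))  # half-normal(0, 1) as the absolute value of a normal
sd_y = abs(rng.normal(0, 1))

# latent variables and observations
x = rng.normal(mean_x, sd_x, n)
y = rng.normal(mean_y, sd_y, n)
w = rng.uniform(0, 1, n)
V = w * np.exp(x) - (1 - w) * np.exp(y)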
We can do some algebra to express x in terms of w, V, and y:
V_i = w_i * exp(x_i) - (1-w_i) * exp(y_i)
w_i * exp(x_i) = V_i + (1 - w_i) * exp(y_i)
exp(x_i) = ( V_i + ( 1 - w_i ) * exp( y_i ) ) / w_i
x_i = log( ( V_i + ( 1 - w_i ) * exp( y_i ) ) / w_i )
to go in the transformed parameters block as:
vector[n] x = log( ( V + ( 1 - w ) .* exp( y ) ) ./ w );
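As a quick sanity check on that algebra (continuing the Python sketch above; this just confirms the inversion and is not part of the Stan program):

# recover x from the observed w, V and the latent y
x_rec = np.log((V + (1 - w) * np.exp(y)) / w)
# the argument is always positive here, since V + (1 - w)*exp(y) = w*exp(x)
assert np.allclose(x_rec, x)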
As we'll be putting a normal() on x, we need a Jacobian adjustment. Because the transformation is element-wise, the Jacobian of the transform is diagonal, which means the determinant of the Jacobian is the product of its diagonal elements. With w and V observed, these diagonal elements are
dx_i/dy_i = d/dy_i log( ( V_i + ( 1 - w_i ) * exp( y_i ) ) / w_i )
where V_i and w_i are constants.
This can be rewritten as
d/dy_i log( V_i / w_i + ( 1 - w_i ) / w_i * exp( y_i ) )
and we can specify new constants to be:
A_i = V_i / w_i
B_i = ( 1 - w_i ) / w_i
to get
d/dy_i log( A_i + B_i * exp( y_i ) )
We can take this derivative with the chain rule: dx/dy = dx/du * du/dy, where u_i = A_i + B_i * exp( y_i ) and x_i = log( u_i ).
So
dx_i / du_i = d / du_i ( log (u_i ) ) = 1 / u_i
and
du_i / dy_i = d / dy_i ( A_i + B_i * exp( y_i ) )
= 0 + B_i * exp( y_i )
= B_i * exp( y_i )
multiplying dx_i / du_i * du_i / dy_i, we get:
dx_i / dy_i = 1 / u_i * B_i * exp( y_i )
substituting back in for u, we get:
dx_i / dy_i = 1 / ( A_i + B_i * exp( y_i ) ) * B_i * exp( y_i )
we can rearrange this to get:
= B_i * exp( y_i ) / ( A_i + B_i * exp( y_i ) )
= exp( y_i ) / ( A_i / B_i + exp( y_i ) )    (dividing numerator and denominator by B_i)
and then simplify A_i / B_i:
A_i / B_i = ( V_i / w_i ) / ( (1 - w_i ) / w_i )
= V_i / ( 1 - w_i )
substituting this back in, we at long last obtain:
dx_i / dy_i = exp ( y_i ) / ( V_i / ( 1 - w_i ) + exp( y_i ) )
The product of these across all i in 1:n is our determinant, but since for numerical stability we're working with the log density of the target distribution, we add the sum of the logs:
log( exp( y_i ) / ( V_i / ( 1 - w_i ) + exp( y_i ) ) )
which simplifies to:
y_i - log( V_i / ( 1 - w_i ) + exp( y_i ) )
so our Jacobian adjustment is:
target += sum( y - log( V ./ ( 1 - w ) + exp( y ) ) );
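If you want to double-check the calculus and the simplification numerically (continuing the Python sketch; eps is just my choice of finite-difference step):

# analytic derivative and its log, per the derivation above
deriv = np.exp(y) / (V / (1 - w) + np.exp(y))
log_jac = y - np.log(V / (1 - w) + np.exp(y))

# central finite difference of x_i = log((V_i + (1 - w_i)*exp(y_i)) / w_i)
eps = 1e-6
def trans(yy):
    return np.log((V + (1 - w) * np.exp(yy)) / w)
fd = (trans(y + eps) - trans(y - eps)) / (2 * eps)

assert np.allclose(deriv, fd, atol=1e-5)
assert np.allclose(log_jac, np.log(deriv))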
We're not done yet, though! Recall that:
x_i = log( ( V_i + ( 1 - w_i ) * exp( y_i ) ) / w_i )
You're seeing warnings about Stan trying to take the log of a negative number. This is because your y are allowed to take on values that allow the argument of the log above:
( V_i + ( 1 - w_i ) * exp( y_i ) ) / w_i
to be negative. So you need to put a lower bound on each y_i so that the expression above is always positive, enforcing
( V_i + ( 1 - w_i ) * exp( y_i ) ) / w_i > 0
We can do a bit more algebra here:
V_i + ( 1 - w_i ) * exp( y_i ) > 0    (multiplying both sides by w_i, which is positive)
( 1 - w_i ) * exp( y_i ) > -V_i
exp( y_i ) > -V_i / ( 1 - w_i )    (dividing by 1 - w_i, also positive since w_i < 1)
So the lower bound of each y_i is
log( -V_i / ( 1 - w_i ) )
(or, where this doesn't exist because V_i is positive, y_i simply has no lower bound)
You could put this in the transformed data block like this:
transformed data {
  vector[n] lb_y;
  for (i in 1:n) {
    lb_y[i] = V[i] > 0 ? negative_infinity() : log( -V[i] / ( 1 - w[i] ) );
  }
}
and when declaring y as a parameter, write:
vector<lower=lb_y>[n] y;
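Continuing the Python sketch, you can confirm the bound logic; data simulated from the model always satisfy it, since V_i + ( 1 - w_i ) * exp( y_i ) = w_i * exp( x_i ) > 0:

lb_y = np.full(n, -np.inf)  # no lower bound where V_i > 0
neg = V < 0
lb_y[neg] = np.log(-V[neg] / (1 - w[neg]))
assert np.all(y > lb_y)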
Putting it all together, we get:
data {
  int<lower=0> n;
  vector<lower=0, upper=1>[n] w;
  vector[n] V;
}
transformed data {
  vector[n] lb_y;
  for (i in 1:n) {
    lb_y[i] = V[i] > 0 ? negative_infinity() : log( -V[i] / ( 1 - w[i] ) );
  }
}
parameters {
  real mean_x;
  real mean_y;
  real<lower=0> sd_x;
  real<lower=0> sd_y;
  vector<lower=lb_y>[n] y;
}
transformed parameters {
  vector[n] x = log( ( V + ( 1 - w ) .* exp( y ) ) ./ w );
}
model {
  // priors on hyperparameters
  mean_x ~ std_normal();
  sd_x ~ std_normal();
  mean_y ~ std_normal();
  sd_y ~ std_normal();
  // centered distributions on latent variables
  x ~ normal(mean_x, sd_x);
  y ~ normal(mean_y, sd_y);
  // Jacobian adjustment
  target += sum( y - log( V ./ ( 1 - w ) + exp( y ) ) );
}
which appears to reliably recover your mean_x, mean_y, sd_x, and sd_y when I run it. At n = 100 and adapt_delta = 0.9, Stan does flag a couple of divergent iterations, so non-centering would help (will leave that to you :p), and increasing n to 1000 (which tightens up the target geometry) samples fine with no errors or warnings. The centered model with both n = 100 and n = 1000 fits on my old macbook in a few seconds for 2E3 iterations, and in under a minute with 1E4 iterations. So there are lots of avenues to improve things there.
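In case it helps, here's a sketch of driving the fit from Python with cmdstanpy, reusing the simulated data above (the filename jacobian_model.stan is my own choice; save the Stan program above there first):

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="jacobian_model.stan")
fit = model.sample(data={"n": n, "w": w, "V": V}, adapt_delta=0.9)
# compare posterior summaries to the simulated hyperparameter values
print(fit.summary().loc[["mean_x", "mean_y", "sd_x", "sd_y"]])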