# Looking for help on a few Jacobians

Hoping I can impose (yet again) on the good will of the more mathy folks here to help me work out the Jacobian adjustments needed for a couple of scenarios.

First:

```stan
data {
  int n;
}
transformed data {
  real rz_se = 1 / sqrt(n - 3);
}
parameters {
  vector[n] x;
  vector[n] y;
  real sigma_x;
  real sigma_y;
  real rhoz;
}
model {
  ... // priors on the parameters
  real rz = atanh(dot_product(x, y) / (n * sigma_x * sigma_y));
  rz ~ normal(rhoz, rz_se); // needs jacobian?
}
```
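For context on the transform involved: `rz` is Fisher's z-transform, atanh(r), whose derivative with respect to r is 1/(1 - r²), so a one-to-one change of variables in r alone would contribute -log(1 - r²) to the target. A quick numeric check of that derivative (Python, with an illustrative value of r chosen here, not taken from the model):

```python
import math

# Fisher's z-transform rz = atanh(r); its derivative w.r.t. r is 1/(1 - r^2),
# so a one-to-one change of variables in r alone would add -log(1 - r^2)
# to the target. Check the derivative by central finite differences.
r, eps = 0.3, 1e-6
numeric = (math.atanh(r + eps) - math.atanh(r - eps)) / (2 * eps)
analytic = 1.0 / (1.0 - r * r)
assert abs(numeric - analytic) < 1e-6
```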


And then there’s this:

```stan
data {
  int n;
  vector[n] y;
}
transformed data {
  real sqrt_n = sqrt(n);
  real var_gamma_par = (n - 1) / 2.0; // 2.0 avoids integer division
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  ... // priors on parameters

  vector[n] resid = (y - mu) / sigma; // computation involves data and multiple parameters

  real m = mean(resid) / (sigma / sqrt_n);
  m ~ std_normal(); // needs jacobian?

  real v = variance(resid);
  v ~ gamma(var_gamma_par, var_gamma_par); // needs jacobian?
}
```
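As a sanity check on the `gamma(var_gamma_par, var_gamma_par)` expectation: if y_i ~ normal(mu, sigma), then v = variance((y - mu)/sigma) satisfies (n-1)·v ~ chi²(n-1), i.e. v ~ Gamma((n-1)/2, (n-1)/2), which has mean 1 and variance 2/(n-1). A Monte Carlo sketch (Python; n, mu, sigma are illustrative values, not from the thread):

```python
import random, statistics

random.seed(1)
n, mu, sigma = 50, 3.0, 2.0
draws = []
for _ in range(20000):
    y = [random.gauss(mu, sigma) for _ in range(n)]
    resid = [(yi - mu) / sigma for yi in y]
    draws.append(statistics.variance(resid))  # n-1 divisor, matching Stan's variance()

# v ~ Gamma(a, a) with a = (n - 1)/2 has mean 1 and variance 2/(n - 1)
m = sum(draws) / len(draws)
v = statistics.variance(draws)
assert abs(m - 1.0) < 0.02
assert abs(v - 2.0 / (n - 1)) < 0.01
```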


(The first pertains to applying my “pairwise” multivariate structure to hierarchical models; the second may speed up all models with Gaussian likelihoods.)


I’ll give it a try. Someone should check my math below. However, on 1: since you have 1 function but 4 parameters, the Jacobian is 1 x 4, which has no determinant. You can still put the prior on rz; it then acts as a weighting function in the high-density region of the prior. For more, see Once more about Stan and Jacobian warnings | Models Of Reality.

For 2,

\begin{aligned} m &= \frac{\sqrt{n} \sum_{i=1}^{n} \frac{y_i - \mu}{\sigma}}{n \sigma} \\ &= \frac{\sum_{i=1}^{n} (y_i - \mu) }{\sqrt{n} \sigma^2} \\ \frac{\partial m}{\partial y_i} &= \frac{1}{\sqrt{n}\sigma^2} \end{aligned}

the log absolute value of which is -(0.5 \log(n) + 2 \log(\sigma)). You can drop the first term since it’s constant, so you just need

```stan
target += -2 * log(sigma);
```
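That adjustment can be sanity-checked numerically: for fixed mu and sigma, m is linear in each y_i with slope 1/(√n · σ²), and the log absolute slope minus the constant -0.5·log(n) is -2·log(σ). A finite-difference sketch (Python; n, mu, sigma are illustrative values):

```python
import math, random

random.seed(0)
n, mu, sigma = 20, 1.0, 1.5
y = [random.gauss(mu, sigma) for _ in range(n)]

def m_of(y):
    resid = [(yi - mu) / sigma for yi in y]
    return (sum(resid) / n) / (sigma / math.sqrt(n))

# partial of m w.r.t. a single y_i, by central finite differences
eps = 1e-6
yp = list(y); yp[0] += eps
ym = list(y); ym[0] -= eps
numeric = (m_of(yp) - m_of(ym)) / (2 * eps)
analytic = 1.0 / (math.sqrt(n) * sigma ** 2)
assert abs(numeric - analytic) < 1e-6

# log |dm/dy_i| = -(0.5*log(n) + 2*log(sigma)); dropping the constant leaves -2*log(sigma)
assert abs(math.log(abs(analytic)) + 0.5 * math.log(n) + 2 * math.log(sigma)) < 1e-12
```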


For the second part, check my math.

\begin{aligned} r_i &= \frac{y_i - \mu}{\sigma} \\ v &= \frac{1}{n} \sum_{i=1}^{n} (r_i - \mu_r)^2 \\ \sum_{i=1}^{n} \frac{\partial v}{\partial y_i} &= \frac{2 \sum_{i=1}^{n} (r_i - \mu_r) }{n \sigma} \\ &= 0 \end{aligned}
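That cancellation can be checked numerically: each partial of v with respect to y_i is proportional to (r_i - μ_r), so the partials sum to zero. A finite-difference sketch (Python; n, mu, sigma are illustrative values; the 1/n divisor matches the derivation above, though Stan's `variance()` uses n-1, and the cancellation holds either way):

```python
import random

random.seed(0)
n, mu, sigma = 20, 1.0, 1.5
y = [random.gauss(mu, sigma) for _ in range(n)]

def v_of(y):
    resid = [(yi - mu) / sigma for yi in y]
    rbar = sum(resid) / n
    return sum((r - rbar) ** 2 for r in resid) / n  # 1/n divisor as in the derivation

# gradient of v w.r.t. each y_i by central finite differences
eps = 1e-6
grad = []
for i in range(n):
    yp = list(y); yp[i] += eps
    ym = list(y); ym[i] -= eps
    grad.append((v_of(yp) - v_of(ym)) / (2 * eps))

# each partial is 2*(r_i - rbar)/(n*sigma); these are nonzero individually
# but sum to zero because the deviations from the mean sum to zero
assert abs(sum(grad)) < 1e-6
```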

Update: thinking about this more, I think the second part doesn’t need an adjustment, because \sum_{i=1}^{n} (r_i - \mu_r) is identically 0. It just boils down to

```r
y <- rnorm(1000, mean = 3, sd = 2)
mike_mod <- cmdstan_model("mike_jacobian.stan")
fit <- mike_mod$sample(
  data = list(n = length(y), y = y),
  parallel_chains = 2,
  chains = 2,
  seed = 12312,
  adapt_delta = 0.8,
  max_treedepth = 10,
  iter_sampling = 500,
  iter_warmup = 500
)
fit$summary()
```


where the Stan code is

```stan
data {
  int n;
  vector[n] y;
}
transformed data {
  real sqrt_n = sqrt(n);
  real var_gamma_par = (n - 1) / 2.0; // 2.0 avoids integer division
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  vector[n] resid = (y - mu) / sigma; // computation involves data and multiple parameters
  real m = mean(resid) / (sigma / sqrt_n);
  real v = variance(resid);

  m ~ std_normal(); // needs jacobian?
  v ~ gamma(var_gamma_par, var_gamma_par);
  target += -2 * log(sigma);
}
```


The output is

```
Both chains finished successfully.
Mean chain execution time: 0.3 seconds.
Total execution time: 0.5 seconds.
> fit$summary()
# A tibble: 3 x 10
  variable    mean  median     sd    mad      q5     q95  rhat ess_bulk ess_tail
  <chr>      <dbl>   <dbl>  <dbl>  <dbl>   <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -501.   -500.   0.983  0.661  -503.   -500.    1.00     468.     667.
2 mu          2.97    2.97 0.128  0.127     2.77    3.18  1.00     935.     712.
3 sigma       2.01    2.01 0.0445 0.0457    1.94    2.09  1.00     692.     681.
```


Thank you!! And not to pull a foot-in-the-door persuasion tactic, but it just occurred to me that while the solution you provided for the second scenario covers the use case where y is data, there’s also a use case in hierarchical models where y is itself a parameter vector:

```stan
data {
  int n;
  ...
}
transformed data {
  real sqrt_n = sqrt(n);
  real var_gamma_par = (n - 1) / 2.0; // 2.0 avoids integer division
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[n] y;
  ... // other parameters inc. stuff for final observation-level likelihood
}
model {
  ... // priors here

  vector[n] resid = (y - mu) / sigma;
  real m = mean(resid) / (sigma / sqrt_n);
  real v = variance(resid);

  m ~ std_normal();
  v ~ gamma(var_gamma_par, var_gamma_par);

  // jacobians for m~ & v~ here

  ... // other computations here, eventually leading to the final observation-level likelihood
}
```


Do the Jacobians need to be different when y is a parameter vector as compared to when it is a data vector?

@spinkney Here’s a less abbreviated example of the hierarchical scenario:


```stan
data {
  int n; // number of subjects
  int k; // observations per subject
  matrix[n, k] z; // observations
}
transformed data {
  real sqrt_n = sqrt(n);
  real var_gamma_par = (n - 1) / 2.0; // 2.0 avoids integer division
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[n] y; // latent subject means
  real<lower=0> noise; // observation-level measurement noise
}
model {
  // priors
  mu ~ std_normal();
  sigma ~ weibull(2, 1);
  noise ~ weibull(2, 1);

  // observation-level likelihood
  for (i_n in 1:n) {
    z[i_n] ~ normal(y[i_n], noise);
  }

  // normally we'd express the hierarchical structure of y
  // as an intermediary between mu/sigma and the
  // observations as:
  //    y ~ normal(mu, sigma); // "centered" parameterization (for example)

  // but it should also work to instead get the scaled
  // residuals of y with respect to mu & sigma and then
  // express the sampling-distribution expectations for the
  // mean & variance of these scaled residuals:

  vector[n] scaled_resids = (y - mu) / sigma;
  real m = mean(scaled_resids) / (sigma / sqrt_n);
  real v = variance(scaled_resids);

  m ~ std_normal(); // alt: mean(y) ~ normal(mu, sigma/sqrt_n);
  v ~ gamma(var_gamma_par, var_gamma_par); // alt: variance(y) ~ gamma(var_gamma_par, var_gamma_par / pow(sigma, 2));

  // jacobians for m~ & v~ here
}
```
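As a sanity check on the alternative statement in the comment above, variance(y) ~ gamma(var_gamma_par, var_gamma_par / sigma²): with a = (n-1)/2, a Gamma(a, a/σ²) distribution has mean σ² and variance σ⁴/a, which should match the sampling distribution of the sample variance of n normal draws. A Monte Carlo sketch (Python; n, mu, sigma are illustrative values):

```python
import random, statistics

random.seed(2)
n, mu, sigma = 50, 0.0, 2.0
a = (n - 1) / 2  # var_gamma_par
draws = [statistics.variance([random.gauss(mu, sigma) for _ in range(n)])
         for _ in range(20000)]

# Gamma(shape = a, rate = a / sigma^2) has mean sigma^2 and variance sigma^4 / a
mean_d = sum(draws) / len(draws)
var_d = statistics.variance(draws)
assert abs(mean_d - sigma ** 2) < 0.05
assert abs(var_d - sigma ** 4 / a) < 0.08
```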