# Looking for help on a few Jacobians

Hoping I can impose (yet again) on the good will of the more mathy folks here to help me work out the Jacobian adjustments needed for a couple of scenarios.

First:

```stan
data {
  int n;
}
transformed data {
  real rz_se = 1 / sqrt(n - 3);
}
parameters {
  vector[n] x;
  vector[n] y;
  real sigma_x;
  real sigma_y;
  real rhoz;
}
model {
  ... // priors on the parameters
  real rz = atanh(dot_product(x, y) / (n * sigma_x * sigma_y));
  rz ~ normal(rhoz, rz_se); // needs jacobian?
}
```
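For context on the transform involved: `rz` is Fisher's z-transform, atanh(r), whose derivative with respect to r is 1/(1 - r²), so a one-to-one change of variables in r alone would contribute -log(1 - r²) to the target. A quick numeric check of that derivative (Python, with an illustrative value of r chosen here, not taken from the model):

```python
import math

# Fisher's z-transform rz = atanh(r); its derivative w.r.t. r is 1/(1 - r^2),
# so a one-to-one change of variables in r alone would add -log(1 - r^2)
# to the target. Check the derivative by central finite differences.
r, eps = 0.3, 1e-6
numeric = (math.atanh(r + eps) - math.atanh(r - eps)) / (2 * eps)
analytic = 1.0 / (1.0 - r * r)
assert abs(numeric - analytic) < 1e-6
```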


And then there’s this:

```stan
data {
  int n;
  vector[n] y;
}
transformed data {
  real sqrt_n = sqrt(n);
  real var_gamma_par = (n - 1) / 2.0; // 2.0 avoids integer division
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  ... // priors on parameters

  vector[n] resid = (y - mu) / sigma; // computation involves data and multiple parameters

  real m = mean(resid) / (sigma / sqrt_n);
  m ~ std_normal(); // needs jacobian?

  real v = variance(resid);
  v ~ gamma(var_gamma_par, var_gamma_par); // needs jacobian?
}
```
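As a sanity check on the `gamma(var_gamma_par, var_gamma_par)` expectation: if y_i ~ normal(mu, sigma), then v = variance((y - mu)/sigma) satisfies (n-1)·v ~ chi²(n-1), i.e. v ~ Gamma((n-1)/2, (n-1)/2), which has mean 1 and variance 2/(n-1). A Monte Carlo sketch (Python; n, mu, sigma are illustrative values, not from the thread):

```python
import random, statistics

random.seed(1)
n, mu, sigma = 50, 3.0, 2.0
draws = []
for _ in range(20000):
    y = [random.gauss(mu, sigma) for _ in range(n)]
    resid = [(yi - mu) / sigma for yi in y]
    draws.append(statistics.variance(resid))  # n-1 divisor, matching Stan's variance()

# v ~ Gamma(a, a) with a = (n - 1)/2 has mean 1 and variance 2/(n - 1)
m = sum(draws) / len(draws)
v = statistics.variance(draws)
assert abs(m - 1.0) < 0.02
assert abs(v - 2.0 / (n - 1)) < 0.01
```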


(The first pertains to applying my “pairwise” multivariate structure to hierarchical models; the second may speed up all models with Gaussian likelihoods.)


I’ll give it a try. Someone should check my math below. However, on 1: since you have 1 function but 4 parameters, the Jacobian is 1 x 4, which has no determinant. You can still put the prior on rz; it then acts as a weighting function in the high-density region of the prior. For more, see Once more about Stan and Jacobian warnings | Models Of Reality.

For 2,

\begin{aligned} m &= \frac{\sqrt{n} \sum_{i=1}^{n} \frac{y_i - \mu}{\sigma}}{n \sigma} \\ &= \frac{\sum_{i=1}^{n} (y_i - \mu) }{\sqrt{n} \sigma^2} \\ \frac{\partial m}{\partial y_i} &= \frac{1}{\sqrt{n}\sigma^2} \end{aligned}

the log absolute value of which is -(0.5 \log(n) + 2 \log(\sigma)). You can drop the first term since it’s constant, so you just need

```stan
target += -2 * log(sigma);
```
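That adjustment can be sanity-checked numerically: for fixed mu and sigma, m is linear in each y_i with slope 1/(√n · σ²), and the log absolute slope minus the constant -0.5·log(n) is -2·log(σ). A finite-difference sketch (Python; n, mu, sigma are illustrative values):

```python
import math, random

random.seed(0)
n, mu, sigma = 20, 1.0, 1.5
y = [random.gauss(mu, sigma) for _ in range(n)]

def m_of(y):
    resid = [(yi - mu) / sigma for yi in y]
    return (sum(resid) / n) / (sigma / math.sqrt(n))

# partial of m w.r.t. a single y_i, by central finite differences
eps = 1e-6
yp = list(y); yp[0] += eps
ym = list(y); ym[0] -= eps
numeric = (m_of(yp) - m_of(ym)) / (2 * eps)
analytic = 1.0 / (math.sqrt(n) * sigma ** 2)
assert abs(numeric - analytic) < 1e-6

# log |dm/dy_i| = -(0.5*log(n) + 2*log(sigma)); dropping the constant leaves -2*log(sigma)
assert abs(math.log(abs(analytic)) + 0.5 * math.log(n) + 2 * math.log(sigma)) < 1e-12
```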


For the second part, check my math.

\begin{aligned} r_i &= \frac{y_i - \mu}{\sigma} \\ v &= \frac{1}{n} \sum_{i=1}^{n} (r_i - \mu_r)^2 \\ \sum_{i=1}^{n} \frac{\partial v}{\partial y_i} &= \frac{2 \sum_{i=1}^{n} (r_i - \mu_r) }{n \sigma} \\ &= 0 \end{aligned}
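That cancellation can be checked numerically: each partial of v with respect to y_i is proportional to (r_i - μ_r), so the partials sum to zero. A finite-difference sketch (Python; n, mu, sigma are illustrative values; the 1/n divisor matches the derivation above, though Stan's `variance()` uses n-1, and the cancellation holds either way):

```python
import random

random.seed(0)
n, mu, sigma = 20, 1.0, 1.5
y = [random.gauss(mu, sigma) for _ in range(n)]

def v_of(y):
    resid = [(yi - mu) / sigma for yi in y]
    rbar = sum(resid) / n
    return sum((r - rbar) ** 2 for r in resid) / n  # 1/n divisor as in the derivation

# gradient of v w.r.t. each y_i by central finite differences
eps = 1e-6
grad = []
for i in range(n):
    yp = list(y); yp[i] += eps
    ym = list(y); ym[i] -= eps
    grad.append((v_of(yp) - v_of(ym)) / (2 * eps))

# each partial is 2*(r_i - rbar)/(n*sigma); these are nonzero individually
# but sum to zero because the deviations from the mean sum to zero
assert abs(sum(grad)) < 1e-6
```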

Update: thinking about this more, I think the second part doesn’t need an adjustment, because \sum_{i=1}^{n} (r_i - \mu_r) is identically 0. It just boils down to

```r
y <- rnorm(1000, mean = 3, sd = 2)
mike_mod <- cmdstan_model("mike_jacobian.stan")
fit <- mike_mod$sample(
  data = list(n = length(y), y = y),
  parallel_chains = 2,
  chains = 2,
  seed = 12312,
  adapt_delta = 0.8,
  max_treedepth = 10,
  iter_sampling = 500,
  iter_warmup = 500
)
fit$summary()
```


where the Stan code is

```stan
data {
  int n;
  vector[n] y;
}
transformed data {
  real sqrt_n = sqrt(n);
  real var_gamma_par = (n - 1) / 2.0; // 2.0 avoids integer division
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  vector[n] resid = (y - mu) / sigma; // computation involves data and multiple parameters
  real m = mean(resid) / (sigma / sqrt_n);
  real v = variance(resid);

  m ~ std_normal(); // needs jacobian?
  v ~ gamma(var_gamma_par, var_gamma_par);
  target += -2 * log(sigma);
}
```


The output is

```
Both chains finished successfully.
Mean chain execution time: 0.3 seconds.
Total execution time: 0.5 seconds.
> fit$summary()
# A tibble: 3 x 10
  variable    mean  median     sd    mad      q5     q95  rhat ess_bulk ess_tail
  <chr>      <dbl>   <dbl>  <dbl>  <dbl>   <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -501.   -500.   0.983  0.661  -503.   -500.    1.00     468.     667.
2 mu          2.97    2.97 0.128  0.127     2.77    3.18  1.00     935.     712.
3 sigma       2.01    2.01 0.0445 0.0457    1.94    2.09  1.00     692.     681.
```


Thank you!! And not to pull a foot-in-the-door persuasion tactic, but it just occurred to me that while the solution you provided for the second scenario covers the use case where y is data, there’s also a use case in hierarchical models where y is itself a parameter vector:

```stan
data {
  int n;
  ...
}
transformed data {
  real sqrt_n = sqrt(n);
  real var_gamma_par = (n - 1) / 2.0; // 2.0 avoids integer division
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[n] y;
  ... // other parameters inc. stuff for final observation-level likelihood
}
model {
  ... // priors here

  vector[n] resid = (y - mu) / sigma;
  real m = mean(resid) / (sigma / sqrt_n);
  real v = variance(resid);

  m ~ std_normal();
  v ~ gamma(var_gamma_par, var_gamma_par);

  // jacobians for m~ & v~ here

  ... // other computations here, eventually leading to the final observation-level likelihood
}
```


Do the Jacobians need to be different when y is a parameter vector as compared to when it is a data vector?

@spinkney Here’s a less abbreviated example of the hierarchical scenario:


```stan
data {
  int n; // number of subjects
  int k; // observations per subject
  matrix[n, k] z; // observations
}
transformed data {
  real sqrt_n = sqrt(n);
  real var_gamma_par = (n - 1) / 2.0; // 2.0 avoids integer division
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[n] y; // latent subject means
  real<lower=0> noise; // observation-level measurement noise
}
model {
  // priors
  mu ~ std_normal();
  sigma ~ weibull(2, 1);
  noise ~ weibull(2, 1);

  // observation-level likelihood
  for (i_n in 1:n) {
    z[i_n] ~ normal(y[i_n], noise);
  }

  // normally we'd express the hierarchical structure of y
  // as an intermediary between mu/sigma and the
  // observations as:
  //    y ~ normal(mu, sigma); // "centered" parameterization (for example)

  // but it should also work to instead get the scaled
  // residuals of y with respect to mu & sigma and then
  // express the sampling-distribution expectations for the
  // mean & variance of these scaled residuals:

  vector[n] scaled_resids = (y - mu) / sigma;
  real m = mean(scaled_resids) / (sigma / sqrt_n);
  real v = variance(scaled_resids);

  m ~ std_normal(); // alt: mean(y) ~ normal(mu, sigma/sqrt_n);
  v ~ gamma(var_gamma_par, var_gamma_par); // alt: variance(y) ~ gamma(var_gamma_par, var_gamma_par / pow(sigma, 2));

  // jacobians for m~ & v~ here
}
```
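As a sanity check on the alternative statement in the comment above, variance(y) ~ gamma(var_gamma_par, var_gamma_par / sigma²): with a = (n-1)/2, a Gamma(a, a/σ²) distribution has mean σ² and variance σ⁴/a, which should match the sampling distribution of the sample variance of n normal draws. A Monte Carlo sketch (Python; n, mu, sigma are illustrative values):

```python
import random, statistics

random.seed(2)
n, mu, sigma = 50, 0.0, 2.0
a = (n - 1) / 2  # var_gamma_par
draws = [statistics.variance([random.gauss(mu, sigma) for _ in range(n)])
         for _ in range(20000)]

# Gamma(shape = a, rate = a / sigma^2) has mean sigma^2 and variance sigma^4 / a
mean_d = sum(draws) / len(draws)
var_d = statistics.variance(draws)
assert abs(mean_d - sigma ** 2) < 0.05
assert abs(var_d - sigma ** 4 / a) < 0.08
```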