Jacobian for prior on P-spline transformation



I’m working on a discrete choice model for schools where the probability of a child’s family choosing a school is a monotonic function of distance (i.e. you never like a school more just because it is further away). Its a multinomial logit. My data has D demographic subgroups (cross-classification of ethnicity, socio-economic status etc.) and I want the model to include a flexible B-spline function of d_{ij} that varies by these demographics. So a vector \beta_d for each d \in 1,\dots,D.

Because of sparsity and the monotonicity condition, there are three types of structure on the spline coefficients: hierarchical structure to take account of demographics; a smoothness penalty on second differences; and a hard constraint on first differences (\nabla \beta \leq 0) to enforce monotonicity.

So, at the moment it seems most natural to build each vector from main effects \beta_d = \beta_0 + \beta_{eth(d)} + \beta_{SES(d)} + \dots + b_d to provide hierarchical shrinkage. The components (\beta_0,\beta_{eth(d)},b_d etc.) are raw parameters, and \beta_d are transformed parameters.

The problem is that the first and second differences of \beta_d are transformations of these already-transformed parameters), and it’s those I would like to put priors on, in addition to the priors on the components.

So, my question is, is there a way to put priors on the structure and then also put priors on the first and second differences of the \beta_d? If it entails putting priors on the transformed parameters, do I need a Jacobian adjustment? I’m not sure, but I hope maybe not because the first and second diffs are a linear transformation? Any help would be appreciated.

Below you can see what I’m doing now. I’ve taken out irrelevant parts of the model:

data {
  int<lower=1> N;                                // Number of students
  int<lower=2> J;                                 // Number of schools
  int<lower=1> D;                                // Number of demographics
  int<lower=1> n_eth;                          // number of ethnic groups
  int<lower=1> n_coef;                        // Number of spline coefficients
  int<lower=1,upper=J> y[N];              //Dependent variable taking values in {1,...,J} ie. NOT binary
  int<lower=1,upper=D> demog[N];    // which demographic group is pupil in?
  matrix[J, n_coef] X[N];                      // Matrix of regressors (spline coefficients)
  int<lower=1,upper=n_eth> eth[D];    // which ethnicity does demographic represent?
  int<lower=1,upper=2> ses[D];           // which SES group does demographic represent?
  matrix[n_coef,n_coef] DD;                // structure matrix of spline coefficients
   vector[n_coef] zeroes;

transformed data{
    int<lower=1> n_sigmas;    
    n_sigmas = 3;

parameters {
    vector[n_coef] betas;                          // mean spline coefficients
    vector[n_coef] beta_e[n_eth];             // ethnic variations of spline coefficients
    vector[n_coef] beta_p[2];                    // SES variations of spline coefficients
    vector[n_coef] b_d[D];                         // Demographic variations of spline coefficients (residual)
    real<lower=0> sigmas[n_sigmas];       // variances of all params
transformed parameters{
    vector[n_coef] beta_d[D];
    for(d in 1:D){
        beta_d[d]  = betas + sigmas[1]*beta_e[eth[d]] + sigmas[2]*beta_p[puppri[d]] + sigmas[3]*b_d[d];

model {

  betas ~ multi_normal_prec(zeroes,DD);
  for(i in 1:n_eth){
        beta_e[i] ~ multi_normal_prec(zeroes,DD);
    for(i in 1:2){
        beta_p[i] ~ multi_normal_prec(zeroes,DD);

      for(i in 1:D){
        b_d[i] ~ multi_normal_prec(zeroes,DD);
  for(i in 1:n_sigmas){
      sigmas[i] ~ student_t(5,0,5);

  for(n in 1:N){
    y[n] ~ categorical_logit( X[n]*beta_d[demog[n]] );

NB. the structure matrix is the p-spline formulation DD = tau*crossprod(diff(diag(n_coef),differences=k)) plus a little bit of diag(n_coef)*1e-4 because DD is not a full rank precision matrix. This transformation allows me to put smoothness priors on the original parameters, but I’m not sure there’d be a way to do it with the monotonicity prior. For the moment I’m fixing tau (the smoothness parameter) rather than estimating it.


You can have multiple changes to the posterior log-kernel for the “same” unknowns. And if the Jacobian determinant is constant, then whether you adjust the posterior log-kernel for it or not will not affect what parameters are proposed or accepted. I would check your logic several times though.


Thanks Ben. In that case, I guess that the transformation can be represented by a matrix, so the Jacobian won’t depend on the parameters? I’ll code something up and share.


Is there any reason why cross-classification isn’t discussed and used more frequently?


My “cell membership” indicators eth and ses are arrays of indices of length D, where, say, the values of eth are the ethnicity codes of the members of cross-classification cell d. For concreteness, ethnicity might be binary, SES might be binary, and then D=4. Then eth might be (1,2,1,2) and ses might be (1,1,2,2). If I replace


vector[D] eth;
vector[D] ses;

could I then vectorise the statement

for(d in 1:D){
beta_d[d] = betas + sigmas[1]*beta_e[eth[d]] + sigmas[2]*beta_p[ses[d]] + sigmas[3]*b_d[d]; 


beta_d = betas + sigmas[1]*beta_e[eth] + sigmas[2]*beta_p[ses] + sigmas[3]*b_d ;

even though each element being indexed is not a scalar but a vector, and I would therefore be using one statement to construct an array of vectors?


No, you cannot index with a vector, but

beta_d = betas + sigmas[1]*beta_e[eth] + sigmas[2]*beta_p[ses] + sigmas[3]*b_d ;

should work if eth and ses are integer arrays.