Jacobian for prior on P-spline transformation specification

#1

I’m working on a discrete choice model for schools in which the probability of a child’s family choosing a school is a monotonic function of distance (i.e. you never like a school more just because it is further away). It’s a multinomial logit. My data has D demographic subgroups (a cross-classification of ethnicity, socio-economic status, etc.), and I want the model to include a flexible B-spline function of the distance d_{ij} that varies by these demographics, i.e. a vector of spline coefficients \beta_d for each d \in 1,\dots,D.
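
To pin down the likelihood: writing B(d_{ij}) for the row vector of B-spline basis functions evaluated at the distance from child i to school j (these become the rows of X[n] in the code below), the choice probabilities are

P(y_i = j) = exp( B(d_{ij}) \beta_{d(i)} ) / \sum_{k=1}^{J} exp( B(d_{ik}) \beta_{d(i)} ),

where d(i) denotes child i's demographic cell.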

Because of sparsity and the monotonicity condition, there are three types of structure on the spline coefficients: a hierarchical structure to take account of demographics; a smoothness penalty on second differences; and a hard constraint on first differences (\Delta \beta_d \leq 0) to enforce monotonicity.
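
In symbols, writing \Delta for the first-difference operator across adjacent spline coefficients, the last two are a smoothness prior \Delta^2 \beta_d \sim N(0, \tau^{-1} I) and an elementwise hard constraint \Delta \beta_d \leq 0 (with \tau the smoothness parameter that reappears below).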

So, at the moment it seems most natural to build each vector from main effects, \beta_d = \beta_0 + \beta_{eth(d)} + \beta_{SES(d)} + \dots + b_d, to provide hierarchical shrinkage. The components (\beta_0, \beta_{eth(d)}, b_d, etc.) are raw parameters, and the \beta_d are transformed parameters.

The problem is that the first and second differences of \beta_d are transformations of these already-transformed parameters, and it’s those differences I would like to put priors on, in addition to the priors on the components.

So, my question is: is there a way to put priors on the hierarchical structure and then also put priors on the first and second differences of the \beta_d? If it entails putting priors on transformed parameters, do I need a Jacobian adjustment? I’m not sure, but I’m hoping not, since the first and second differences are a linear transformation. Any help would be appreciated.

Below you can see what I’m doing now. I’ve taken out irrelevant parts of the model:

data {
  int<lower=1> N;                    // number of students
  int<lower=2> J;                    // number of schools
  int<lower=1> D;                    // number of demographic groups
  int<lower=1> n_eth;                // number of ethnic groups
  int<lower=1> n_coef;               // number of spline coefficients
  int<lower=1,upper=J> y[N];         // school chosen, in {1,...,J}, i.e. NOT binary
  int<lower=1,upper=D> demog[N];     // which demographic group is pupil n in?
  matrix[J, n_coef] X[N];            // spline basis evaluated at pupil n's distances to the J schools
  int<lower=1,upper=n_eth> eth[D];   // which ethnicity does demographic d represent?
  int<lower=1,upper=2> ses[D];       // which SES group does demographic d represent?
  matrix[n_coef, n_coef] DD;         // P-spline structure (precision) matrix
  vector[n_coef] zeroes;             // zero mean vector for the multi-normal priors
}

transformed data {
  int<lower=1> n_sigmas;
  n_sigmas = 3;
}

parameters {
  vector[n_coef] betas;              // mean spline coefficients
  vector[n_coef] beta_e[n_eth];      // ethnic variations of spline coefficients
  vector[n_coef] beta_p[2];          // SES variations of spline coefficients
  vector[n_coef] b_d[D];             // residual demographic variations of spline coefficients
  real<lower=0> sigmas[n_sigmas];    // scales (not variances) of the hierarchical components
}
transformed parameters {
  vector[n_coef] beta_d[D];

  for (d in 1:D) {
    beta_d[d] = betas + sigmas[1] * beta_e[eth[d]] + sigmas[2] * beta_p[ses[d]] + sigmas[3] * b_d[d];
  }
}

model {
  betas ~ multi_normal_prec(zeroes, DD);

  for (i in 1:n_eth) {
    beta_e[i] ~ multi_normal_prec(zeroes, DD);
  }
  for (i in 1:2) {
    beta_p[i] ~ multi_normal_prec(zeroes, DD);
  }
  for (i in 1:D) {
    b_d[i] ~ multi_normal_prec(zeroes, DD);
  }
  for (i in 1:n_sigmas) {
    sigmas[i] ~ student_t(5, 0, 5);
  }

  for (n in 1:N) {
    y[n] ~ categorical_logit(X[n] * beta_d[demog[n]]);
  }
}

NB: the structure matrix is the P-spline formulation DD = tau * crossprod(diff(diag(n_coef), differences = k)) (in R), plus a small ridge diag(n_coef) * 1e-4, because the difference penalty on its own is not a full-rank precision matrix. This formulation lets me put the smoothness prior on the raw parameters, but I’m not sure there’d be a way to do the same for the monotonicity constraint. For the moment I’m fixing tau (the smoothness parameter) rather than estimating it.
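
For reference, the same construction could be done inside Stan rather than in R. This is an untested sketch: it assumes tau is moved into the data block (real<lower=0> tau;), that the matrix[n_coef,n_coef] DD declaration is dropped from data, and that this is merged with the existing transformed data block:

transformed data {
  matrix[n_coef, n_coef] DD;            // replaces the DD currently passed in from R
  {
    // second-difference matrix, i.e. diff(diag(n_coef), differences = 2)
    matrix[n_coef - 2, n_coef] D2;
    D2 = rep_matrix(0, n_coef - 2, n_coef);
    for (i in 1:(n_coef - 2)) {
      D2[i, i]     = 1;
      D2[i, i + 1] = -2;
      D2[i, i + 2] = 1;
    }
    // tau * D2'D2 plus a small ridge so DD is a full-rank precision matrix
    DD = tau * crossprod(D2) + diag_matrix(rep_vector(1e-4, n_coef));
  }
}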


#2

You can have multiple contributions to the posterior log-kernel for the “same” unknowns. And if the Jacobian determinant is constant, then whether or not you adjust the posterior log-kernel for it will not affect which parameters are proposed or accepted. I would check your logic several times though.
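
(To spell the linear case out: if the extra prior is placed on \delta = A \beta for a fixed, square, invertible matrix A, the change-of-variables term is log |det A|, which does not involve the parameters. And when A is a non-square difference matrix, you are not reparameterizing at all; the extra prior is just one more multiplicative term in the kernel, so there is nothing to adjust for.)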


#3

Thanks Ben. In that case, I guess that the transformation can be represented by a matrix, so the Jacobian won’t depend on the parameters? I’ll code something up and share.
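
For anyone following along, here is the kind of thing I have in mind, as an untested sketch. sigma_smooth is a hypothetical new scale (it would be declared in parameters or passed as data), and all the existing priors stay exactly as they are:

model {
  // ... existing priors on betas, beta_e, beta_p, b_d, sigmas as before ...

  // extra smoothness prior on the second differences of the transformed
  // coefficients; the map from raw parameters to these differences is
  // linear and fixed, so no parameter-dependent Jacobian adjustment arises
  for (d in 1:D) {
    vector[n_coef - 2] d2;
    d2 = segment(beta_d[d], 1, n_coef - 2)
         - 2 * segment(beta_d[d], 2, n_coef - 2)
         + segment(beta_d[d], 3, n_coef - 2);
    target += normal_lpdf(d2 | 0, sigma_smooth);
  }

  // ... likelihood as before ...
}

The hard monotonicity constraint is a different story: an inequality on a transformed parameter can’t be expressed as an extra log-kernel term, so I suspect that part would need a reparameterization instead.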


#4

Is there any reason why cross-classification isn’t discussed and used more frequently?


#5

My “cell membership” indicators eth and ses are arrays of indices of length D, where eth[d] is the ethnicity code of cross-classification cell d. For concreteness: if ethnicity is binary and SES is binary, then D = 4, and eth might be (1,2,1,2) while ses is (1,1,2,2). If I replace

int<lower=1,upper=n_eth> eth[D];
int<lower=1,upper=2> ses[D];

with

vector[D] eth;
vector[D] ses;

could I then vectorise the statement

for (d in 1:D) {
  beta_d[d] = betas + sigmas[1] * beta_e[eth[d]] + sigmas[2] * beta_p[ses[d]] + sigmas[3] * b_d[d];
}

as

beta_d = betas + sigmas[1]*beta_e[eth] + sigmas[2]*beta_p[ses] + sigmas[3]*b_d;

even though each element being indexed is not a scalar but a vector, and I would therefore be using one statement to construct an array of vectors?


#6

No, you cannot index with a vector, but

beta_d = betas + sigmas[1]*beta_e[eth] + sigmas[2]*beta_p[ses] + sigmas[3]*b_d;

should work if eth and ses are integer arrays.