I’m working on a discrete choice model for schools where the probability of a child’s family choosing a school is a monotonic function of distance (i.e. you never like a school more just because it is further away). Its a multinomial logit. My data has D demographic subgroups (cross-classification of ethnicity, socio-economic status etc.) and I want the model to include a flexible B-spline function of d_{ij} that varies by these demographics. So a vector \beta_d for each d \in 1,\dots,D.
Because of sparsity and the monotonicity condition, there are three types of structure on the spline coefficients: hierarchical structure to take account of demographics; a smoothness penalty on second differences; and a hard constraint on first differences (\nabla \beta \leq 0) to enforce monotonicity.
So, at the moment it seems most natural to build each vector from main effects \beta_d = \beta_0 + \beta_{eth(d)} + \beta_{SES(d)} + \dots + b_d to provide hierarchical shrinkage. The components (\beta_0,\beta_{eth(d)},b_d etc.) are raw parameters, and \beta_d are transformed parameters.
The problem is that the first and second differences of \beta_d are transformations of these already-transformed parameters), and it’s those I would like to put priors on, in addition to the priors on the components.
So, my question is, is there a way to put priors on the structure and then also put priors on the first and second differences of the \beta_d? If it entails putting priors on the transformed parameters, do I need a Jacobian adjustment? I’m not sure, but I hope maybe not because the first and second diffs are a linear transformation? Any help would be appreciated.
Below you can see what I’m doing now. I’ve taken out irrelevant parts of the model:
data {
int<lower=1> N; // Number of students
int<lower=2> J; // Number of schools
int<lower=1> D; // Number of demographics
int<lower=1> n_eth; // number of ethnic groups
int<lower=1> n_coef; // Number of spline coefficients
int<lower=1,upper=J> y[N]; //Dependent variable taking values in {1,...,J} ie. NOT binary
int<lower=1,upper=D> demog[N]; // which demographic group is pupil in?
matrix[J, n_coef] X[N]; // Matrix of regressors (spline coefficients)
int<lower=1,upper=n_eth> eth[D]; // which ethnicity does demographic represent?
int<lower=1,upper=2> ses[D]; // which SES group does demographic represent?
matrix[n_coef,n_coef] DD; // structure matrix of spline coefficients
vector[n_coef] zeroes;
}
transformed data{
int<lower=1> n_sigmas;
n_sigmas = 3;
}
parameters {
vector[n_coef] betas; // mean spline coefficients
vector[n_coef] beta_e[n_eth]; // ethnic variations of spline coefficients
vector[n_coef] beta_p[2]; // SES variations of spline coefficients
vector[n_coef] b_d[D]; // Demographic variations of spline coefficients (residual)
real<lower=0> sigmas[n_sigmas]; // variances of all params
}
transformed parameters{
vector[n_coef] beta_d[D];
for(d in 1:D){
beta_d[d] = betas + sigmas[1]*beta_e[eth[d]] + sigmas[2]*beta_p[puppri[d]] + sigmas[3]*b_d[d];
}
}
model {
betas ~ multi_normal_prec(zeroes,DD);
for(i in 1:n_eth){
beta_e[i] ~ multi_normal_prec(zeroes,DD);
}
for(i in 1:2){
beta_p[i] ~ multi_normal_prec(zeroes,DD);
}
for(i in 1:D){
b_d[i] ~ multi_normal_prec(zeroes,DD);
}
for(i in 1:n_sigmas){
sigmas[i] ~ student_t(5,0,5);
}
for(n in 1:N){
y[n] ~ categorical_logit( X[n]*beta_d[demog[n]] );
}
}
NB. the structure matrix is the p-spline formulation DD = tau*crossprod(diff(diag(n_coef),differences=k))
plus a little bit of diag(n_coef)*1e-4
because DD
is not a full rank precision matrix. This transformation allows me to put smoothness priors on the original parameters, but I’m not sure there’d be a way to do it with the monotonicity prior. For the moment I’m fixing tau (the smoothness parameter) rather than estimating it.