Hi guys. I am fairly new to Stan, so this might be a really elementary question. Is it possible to write a custom density function that contains discontinuities and use it to model a likelihood? I have searched the forum and did not find any relevant questions.
For my problem I have checked to ensure my custom likelihood is grammatically correct in the sense that it can correctly calculate the density values of given data points when using fix parameters. Now I am trying to see if I assign some parameters to be unknown and feed my density the simulated data, it can give me back the correct values of the parameters. However, when I try to run RStan it gives me the error of
Chain 1: Rejecting initial value:
Chain 1: Gradient evaluated at the initial value is not finite.
Chain 1: Stan can't start sampling from this initial value.
I know this is probably due to the reason that my density contains discontinuities, so do I have to write my own analytical gradient function? I can paste my code if necessary. Thank you in advance for your help!
What’s the source of discontinuities? The answer is problem dependent. By default the integrator won’t play nice with step functions, could be ok with discontinuities in the gradient sometimes, but solutions are problem specific.
I am doing simultaneous quantile regression by modelling the conditional quantile function piece-wise over L intervals separated by breakpoints 0<\kappa_1<\kappa_2<...<\kappa_L =1, i.e. a fixed basis expansion that takes the form
q(\tau|\mathbf{x}_i) = \sum_{l=1}^{L}\left\{\alpha_{0l}+\alpha_{1l}x_i\right\}B_{l}(\tau)
where B_{l}(\tau) is a function of the quantile function of a base density. This leads to my density on y being
f_Y(y|\mathbf{x})=\sum_{l=1}^L\frac{I\big[q(\kappa_{l-1}|\mathbf{x})<y\le q(\kappa_{l}|\mathbf{x})\big]}{\alpha_{0l}+\alpha_{1l}x}f_0\left(\frac{y-I(l>1)\left[q(\kappa_{l-1}|\mathbf{x})-(\alpha_{0l}+\alpha_{1l}x)q_0(\kappa_{l-1})\right]}{\alpha_{0l}+\alpha_{1l}x}\right)
functions{
matrix thresh(matrix alpha_star, real eps){
int M = cols(alpha_star);
int N = rows(alpha_star);
matrix[N,M] alpha_temp;
for(l in 1:M){
real WC = alpha_star[1,l]-sum(fabs(alpha_star[2:N,l]));
if(WC < eps){
alpha_temp[1,l] = eps;
alpha_temp[2:N,l] = rep_vector(0,N-1);
}else{
alpha_temp[,l] = alpha_star[,l];
}
}
return alpha_temp;
}
int which_less(row_vector x, real a){
int loc = 1;
while((loc<=num_elements(x))&&(x[loc]<a)){
loc += 1;
}
return loc-1;
}
real bqrnn_lpdf(real y, row_vector u_vec, matrix alpha_star, matrix basis_m, real q_base){
real den;
real lden;
real u_a;
matrix[rows(alpha_star),cols(alpha_star)] alpha;
row_vector[cols(basis_m)] q_model;
alpha = thresh(alpha_star,0.1);
q_model = (u_vec*alpha)*basis_m;
if(y<=q_model[1]){
u_a = u_vec*alpha[,1];
den = 1/u_a*exp(normal_lpdf((y/u_a)|0,1));
}else{
int idx = which_less(q_model,y);
u_a = u_vec*alpha[,idx+1];
den = 1/u_a*exp(normal_lpdf(((y-q_model[idx]+u_a*q_base)/u_a)|0,1));
}
lden = log(den);
print(lden);
return lden;
}
}
The thresh
function is a constraint function to ensure the monotonicity of quantile function. u_vec
represents \begin{pmatrix}1 & x_i\end{pmatrix}.
Are the alphas and x
's parameters here or just the alphas?
Given that the alphas are only creating discontinuities by changing the membership of a given y in a given l I’m pretty sure this can be written as a mixture (in terms of Pr[y \in l = 1, ..., L] rather than in terms of indicator functions. In that case it’s not discontinuous in \alpha and this should be fine. Have you gone down that road yet?
I haven’t. I will give it a try. Thanks for the advice!
If I am understanding correctly, my goal is to write f_Y(y|\mathbf{x}) as \sum_{l=1}^L\text{Pr}(q(\kappa_{l-1}|\mathbf{x})<y\le q(\kappa_l|\mathbf{x}))f_l(y|\mathbf{x}) ? The weights in my problem are actually just \kappa_1,\kappa_2-\kappa_1,\cdots, 1-\kappa_{l-1}. But is there a way to build direct connection between f_l(y|\mathbf{x}) and the original pdf?
From your equation I see that alpha contributes to q so I’m not sure how kappa alone can play that role.
Integrating the above density over y gives the CDF
F_Y(y|\mathbf{x})=\begin{cases}
F_0\left(\frac{y}{\alpha_{01}+\alpha_{11}x}\right), \ q(\kappa_0|\mathbf{x})<y\le q(\kappa_1|\mathbf{x})\\\\
F_0\left(\frac{y-q(\kappa_{l-1}|\mathbf{x})+(\alpha_{0l}+\alpha_{1l}x)q_0(\kappa_{l-1})}{\alpha_{0l}+\alpha_{1l}}x\right), \ q(\kappa_{l-1}|\mathbf{x})<y\le q(\kappa_l|\mathbf{x})
\end{cases}
I should mention that
\begin{align*}
B_{1}(\tau) &= \begin{cases}
q_0(\tau), \ \ \ \tau\le \kappa_1 \\
q_0(\kappa_1), \ \tau>\kappa_1
\end{cases} \ \ \ and \\\\
B_{l}(\tau)&= \begin{cases}
0, \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \tau\le\kappa_{l-1} \\
q_0(\tau)-q_0(\kappa_{l-1}), \ \ \kappa_{l-1}<\tau\le\kappa_{l} \\
q_0(\kappa_l)-q_0(\kappa_{l-1}), \ \tau>\kappa_l
\end{cases}
\end{align*}
therefore Pr (y\le q(\kappa_l|\mathbf{x})) for example is F_0\left(\frac{[\alpha_{01}+\alpha_{11}x]q_0(\kappa_1)}{\alpha_{01}+\alpha_{11}x}\right)=F_0(q_0(\kappa_1))=\kappa_1.