Negative value in log when using randomly assign coefficient for B spline and Integration of exponential of cubic B spline question

Hi All,
I currently encounter a question about the negative value occurring in the log. I used the B-spline to estimate the baseline function in the survival model. I write the log-likelihood which is shown at bottom of the code. However, since the coefficient of the spline basis is randomly assigned, so it could be positive or negative and it made the value of “bsl” negative sometimes, however, the “bsl” is the value in the log, it cannot be negative, so I’m not sure how to make the likelihood part work.

Please see my code below. * means multiply.

// B spline used for baseline function
data {
  int<lower=0> N; // number of subject
  int<lower=0> N_row; // number of rows in the dataset
  //int<lower=0> K; // number of observed events
  //int<lower=0> Cn; // number of censored observations
  int ncov; // number of covariates
  int nknot; // number of basis functions
  int<lower=1,upper=N_row> ID[N_row];
  matrix[N_row,ncov] X; // covariate matrix (treatment, sex)
  matrix[N_row, nknot] bv; // basis matrix of cubic B spline
  matrix[N_row, nknot] ibv; // integrated basis matrix of cubic B spline
  vector[N_row] cen;  // cen=1, observed; cen=0, censored 
}

parameters {
  vector[nknot] u;  // coefficients of spline basis 
  real beta1;      // coefficient of treatment
  real beta2;      // coeffcient of sex
  vector[N] w;     // frialty term
  real<lower = 0> sigmaw; // variance parameter for frailty term
}

model {
  vector[N_row] bsl;
  vector[N_row] ibsl;
  real x_beta;
  
  // prior
  for (i in 1:nknot){
    u[i] ~ normal(0,10);                  
  }
  beta1 ~ normal(0,sqrt(1000));           // normal(mu,sigma)
  beta2 ~ normal(0,sqrt(1000));
  w ~ normal(0,sqrt(sigmaw));
  sigmaw ~ inv_gamma(0.001, 0.001);
  //sigmaw ~ normal(0,sqrt(1000));

  // likelihood 
  bsl = bv*u;                       // B spline baseline in first part of likelihood 
  ibsl = ibv*u;                     // integrated B spline baseline in second part of likelihood
   
  for (i in 1:N_row){
    x_beta = beta1*X[i,1] + beta2*X[i,2]+w[ID[i]];
    //target += cen[i]*(log(bsl[i]) + x_beta) - exp(x_beta)*ibsl[i];
    target += cen[i]*log(bsl[i]*exp(x_beta)) - exp(x_beta)*ibsl[i]; 
  }
}

I’m also thinking about putting the baseline into the exponential function to avoid the log function. But I’m not quite familiar with how to integrate the exponential of the B spline function. I googled and found it would make it even more complicated. (I could be wrong.) Please see the code for how to get the “bv” and “ibv”.

# Prepare data for fitting the model
kl<-c(min(pwpgt.data$t2)+1,c(1:5)*(max(pwpgt.data$t2) min(pwpgt.data$t2))/6,max(pwpgt.data$t2)-1) # knots for B-splines
bv<- t(as.matrix(t(bs(pwpgt.data$t2,knots=kl,degree=3, intercept = TRUE))))
ibv<- t(as.matrix(t(splines2::ibs(pwpgt.data$t2,knots=kl,degree=3, intercept = TRUE))))

It would be great if anyone could help with this question and much more highly appreciated!

Thank you very much!

EDIT: @maxbiostat edited this post for syntax highlighting.

1 Like

@hhau , can you spot something off here?

If you are modelling the baseline hazard directly with a B-spline, then the coefficients of the spline must be positive. So u needs a <lower = 0> constraint.

1 Like

Thanks for helping edit the post and direct it to the expert on the topic.

1 Like

Cheers. That’s my job round here.

1 Like

Thanks, @hhau! So you mean in the STAN, I have to make the coefficient to be positive if directly using B-spline for constructing the baseline hazard. I understand the likelihood is written in the log scale, then I can use the target += for loglikelihood (which is adding process). However, I also see the WINBUGS could write the likelihood directly (I mean without using log scale, so the likelihood is written by the multiplication of each part) and assign either positive or negative value for the coefficient, and I think it is more reasonable. So my questions are, 1) is there any way to re-write my current STAN code so that I can also assign negative value in the STAN? 2) Since WINBUGS could assign negative values for the coefficient which I think is more reasonable, does it mean I’d better use WINBUGS than using STAN for B-spline baseline hazard?
Thank you very much if you may offer more help!

1 Like

I think the only way to do this in BUGS is using the zeros trick, which lets you add arbitrary terms to the log-likelihood. Moreover, I think it is extremely challenging to implement a B-spline for the log-hazard inside BUGS; one would need to manually do the numerical integration in the BUGS language, and add the result to the log-likelihood using the zeros trick again (doi.org/10.1214/15-BA954 mention this difficulty in passing).

There seems to be some confusion between the likelihood, log-likelihood, baseline hazard, and log-baseline-hazard. The difference between the likelihood and log-likelihood is purely numeric. That is, the same model can be written for either the probability or log scale. This is distinct from the difference between modelling the baseline-hazard directly, and modelling the log-baseline-hazard. These are two different models and will have different (log-)likelihoods.

Using the notation from http://arxiv.org/abs/2002.09633, choosing to model the log-baseline-hazard results in an

\int_{0}^{T_{i}} \exp\{B(z; \gamma, \kappa, \delta)\}\text{d}z

term (where B(z; \cdot) is a function that returns the appropriate B-spline basis evaluated at z), which is intractable and must be computed numerically (as rstanarm::stan_surv / rstanarm::stan_jm do).

Alternatively, if you model the baseline-hazard directly, you end up with a

\log\left(\int_{0}^{T_{i}} B(z; \gamma, \kappa, \delta)\text{d}z \right)

term in the log-likelihood, which is tractable (and the integral is implemented by splines2::ibs). The constraint here is that the baseline hazard must be positive for all z, so the coefficients of the spline (\gamma) must be constrained to be positive.

2 Likes

@hhau Thank you very much for your explanation and great help! I’m currently trying both methods now. But when I use rstanarm to define the baseline hazard by cubic b-spline, it gets me weird, maybe I’m still not much understanding it. Since I defined 5 inner knots: basehaz_ops = list(degree = 3, knots = kn), where kn has five values, but the final results only provide me 8 coefficients of the b-spline. I’m thinking it should be 5 (inner knots) +2 (boundary knots) + 4 (order) = 11 coefficients. I don’t know what happened here. Hope you may explain a bit if possible.

Thank you!

1 Like