Custom Likelihood for Fitting Markov Model



I am trying to fit a multistate model using Bayesian methods.

Most of the data are for exactly observed transition times. The likelihoods for various observation schemes are on page 9 of

I have coded some of these likelihoods into the function MarkovLH.

Am I properly using these likelihoods in the code below with the target += line?

functions {
  real MarkovLH(int LHtype, real t1, real t2, real t3,
                real lam1, real lam2, real lam3) {
    real L;
    real logL;
    if (LHtype == 1) {
      // exactly observed transitions: likelihood L built here,
      // then logL = log(L)  (body elided in the post)
    } else if (LHtype == 2) {
      // ... (body elided in the post)
    }
    return logL;
  }
}
data {
  int<lower=0> N;
  int<lower=0> LHtype[N];   // observation scheme for each subject
  vector[N] grp;
  vector[N] t1;
  vector[N] t2;
  vector[N] t3;
}
parameters {
  real<lower=0> lambda[3];
  real bet[3];
}
model {
  lambda ~ gamma(0.001, 0.001);
  bet ~ normal(0, 100);
  for (i in 1:N) {
    target += MarkovLH(LHtype[i], t1[i], t2[i], t3[i],
                       lambda[1] * exp(bet[1] * grp[i]),
                       lambda[2] * exp(bet[2] * grp[i]),
                       lambda[3] * exp(bet[3] * grp[i]));
  }
}

I think that syntax is legal, but you should not compute the log likelihood by computing the likelihood and then taking its logarithm. Simplify the log likelihood analytically, and then implement the simplified form directly, as in

functions {
  real MarkovLH(int LHtype, real t1, real t2, real t3,
                real lam1, real lam2, real lam3) {
    return LHtype == 1
      ? -(lam1 + lam2) * t1 + log(lam1) - lam3 * t2 + log(lam3)
      : -lam3 * t1 + log(lam3);
  }
}

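As a quick numerical check of that simplification (a standalone Python sketch, not from the thread; it assumes the exactly observed likelihood is lam1 * exp(-(lam1 + lam2) * t1) * lam3 * exp(-lam3 * t2), with arbitrary test values):

```python
import math

# Arbitrary test values (not from the thread); lam* are hazard rates.
lam1, lam2, lam3 = 0.5, 0.3, 0.7
t1, t2 = 1.2, 2.0

# Likelihood assembled on the natural scale, then logged (the original approach).
L = lam1 * math.exp(-(lam1 + lam2) * t1) * lam3 * math.exp(-lam3 * t2)
log_via_L = math.log(L)

# Analytically simplified log likelihood (the recommended approach).
logL = -(lam1 + lam2) * t1 + math.log(lam1) - lam3 * t2 + math.log(lam3)

assert abs(log_via_L - logL) < 1e-12
```

The two agree here because the values are moderate; the point of simplifying analytically is that the second form still works when the first underflows.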
Also, don’t ever use a gamma(0.001, 0.001) prior, and even normal(0, 100) is dubious.


Thank you.

Does Stan not work properly if run with logs the way I had it at first? Or is it just inefficient?


Liable to overflow
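To make that concrete (a standalone Python sketch, not from the thread): a product of many density terms like lam * exp(-lam * t) underflows to exactly zero long before the summed log density has any trouble.

```python
import math

lam, t = 0.5, 20.0
n = 100  # number of observations

# Per-observation log density, summed on the log scale: no problem.
log_lik = n * (math.log(lam) - lam * t)   # about -1069.3

# The same quantity via the raw likelihood underflows to exactly 0.0,
# so its logarithm is undefined (math.log(0.0) raises ValueError).
lik = (lam * math.exp(-lam * t)) ** n
print(lik)       # 0.0 due to underflow
print(log_lik)   # about -1069.3
```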


Thank you. I got everything to work with the code below, and it agrees with the MLE using (…). I simplified the pdf.

I am using the priors recommended by (…). They recommended normal(0, 10). Can you recommend something less suspect?

functions {
  real MarkovKnownlpdf(real t, real[] lam, real censind, int state, int trans) {
    real lamstay[3];
    real LL;
    lamstay[1] = -(lam[1] + lam[2]);   // negative total exit rate from state 1
    lamstay[2] = -lam[3];              // negative total exit rate from state 2
    LL = lamstay[state] * t + log(lam[trans]) * censind;
    return LL;
  }

  real MarkovUnknownlpdf(real t, real[] lam) {
    real LL;
    LL = -(lam[1] + lam[2]) * t
         + log(lam[2] + (-1 + exp((lam[1] + lam[2] - lam[3]) * t))
                        * lam[1] * lam[3] / (lam[1] + lam[2] - lam[3]));
    return LL;
  }
}
data {
  int<lower=0> Nknown;
  int<lower=0> Nunknown;
  vector[Nknown] tknown;
  vector[Nunknown] tunknown;
  int<lower=0> stateknown[Nknown];
  int<lower=0> transknown[Nknown];
  vector[Nknown] censindknown;
  vector[Nknown] grpknown;
  vector[Nunknown] grpunknown;
}
parameters {
  real llam[3];
  real bet[3];
}
transformed parameters {
  real lam[3];
  for (k in 1:3) { lam[k] = exp(llam[k]); }
}
model {
  real tempknown[3];
  real tempunknown[3];
  llam ~ normal(0, 10);
  bet ~ normal(0, 10);
  for (i in 1:Nknown) {
    tempknown[1] = lam[1] * exp(bet[1] * grpknown[i]);
    tempknown[2] = lam[2] * exp(bet[2] * grpknown[i]);
    tempknown[3] = lam[3] * exp(bet[3] * grpknown[i]);
    target += MarkovKnownlpdf(tknown[i], tempknown, censindknown[i],
                              stateknown[i], transknown[i]);
  }
  for (j in 1:Nunknown) {
    tempunknown[1] = lam[1] * exp(bet[1] * grpunknown[j]);
    tempunknown[2] = lam[2] * exp(bet[2] * grpunknown[j]);
    tempunknown[3] = lam[3] * exp(bet[3] * grpunknown[j]);
    target += MarkovUnknownlpdf(tunknown[j], tempunknown);
  }
}

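As a standalone sanity check on MarkovUnknownlpdf above (Python with arbitrary test rates, not part of the model code): starting from state 1, eventual absorption in state 3 is certain, so the density exp(LL) should integrate to 1 over t in [0, infinity).

```python
import math

def markov_unknown_lpdf(t, lam):
    # Transcription of MarkovUnknownlpdf (0-based indexing): log density of
    # the absorption time when the path (1->3 direct vs. 1->2->3) is unobserved.
    den = lam[0] + lam[1] - lam[2]   # shared subexpression, computed once
    return (-(lam[0] + lam[1]) * t
            + math.log(lam[1] + (-1 + math.exp(den * t)) * lam[0] * lam[2] / den))

lam = (0.5, 0.3, 0.7)   # arbitrary test hazards

# Trapezoidal integration of the density over [0, 60]; the tail beyond is negligible.
h = 0.01
grid = [i * h for i in range(int(60 / h) + 1)]
f = [math.exp(markov_unknown_lpdf(t, lam)) for t in grid]
integral = h * (sum(f) - 0.5 * (f[0] + f[-1]))

assert abs(integral - 1.0) < 1e-3
```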

The _lpdf notation does not change the draws (aside from some numerical instability). You have to change the model to make it more amenable to NUTS estimation.


The optimization agrees with the MLE? Because usually Bayesian posterior means aren’t exactly the same as the MLE.

So that doesn’t sound like an MLE; a penalized MLE, maybe?

We recommend priors that are at least weakly informative in specifying the order of magnitude of the posterior and its tail behavior. Why is normal(0, 10) suspect here?

real LL;
LL = (-(lam[1] + lam[2]))*t + log(lam[2] + ((-1 + exp((lam[1] + lam[2] - lam[3])*t))*lam[1]*lam[3])/(lam[1] + lam[2] - lam[3]));
return LL;

can just be a one-liner with the return.

You probably want to be using log-sum-exp if you’re combining logs and exponents.
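For example (standalone Python; log_sum_exp here mirrors Stan’s built-in of the same name): the naive form overflows as soon as any exponent is large, while the stabilized form does not.

```python
import math

def log_sum_exp(xs):
    # Stable log(sum(exp(x))): factor out the max before exponentiating.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

xs = [800.0, 801.0]

# Naive version overflows: math.exp(800) raises OverflowError.
try:
    naive = math.log(sum(math.exp(x) for x in xs))
except OverflowError:
    naive = None

stable = log_sum_exp(xs)   # 801 + log(1 + e**-1), about 801.3133
assert naive is None
assert abs(stable - 801.3132616875182) < 1e-9
```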

real lam [3];
for (k in 1:3){lam[k]=exp(llam[k]);}

can be coded as

real lam[3] = exp(llam);

But you probably don’t want to be moving off the log scale so soon.

Indentation would help, too.

The variable lamstay is declared with size 3, but only two of its entries are defined, and only one is ever used (the one corresponding to state).

A lot of the rest can be done with matrix operations and more vectorization, like all those tempknown definitions, e.g.

tempknown = lam .* exp(bet * grpknown[i]);   // with tempknown, lam, bet declared as vector[3]

But again, I wouldn’t be jumping off the log scale unless necessary here. Some of these things you’re adding and then exponentiating and then logging again. For arithmetic stability, quantities need to stay on the log scale as long as possible.

Expressions like lam[1] + lam[2] - lam[3] should be computed once and stored rather than being recomputed multiple times in a function.