Rstan results from MLE (optimizing) and MCMC (stan) do not match

Hi, I have a model which basically generates multi-variate gaussians where the mean vector and the covariance matrix are dynamically computed using some custom functions as shown bellow:

functions {
  vector computeWeights(int nterm, int nreg, real alpha, int[] nbranch, real[] reg_profile, real[] epochs, vector theta) {
    int index = 1;
    real y[nterm];
    matrix[nterm, nreg] W;
    for (term in 1:nterm) {
      int nb = nbranch[term];
      for (br in 1:nb) {
        real t = epochs[index] - epochs[index+br-1];
        y[br] = exp(-alpha*t);
      }
      for (br in 1:(nb-1)) {
        y[br] = y[br] - y[br+1];
      }
      for (reg in 1:nreg) {
        real dotp = 0.0;
        for (br in 1:nb) {
          dotp += y[br]*(reg_profile[index+br-1] == reg ? 1 : 0);
        }
        W[term,reg] = dotp;
      }
      index += nb;
    }
    return (W*theta);
  }
  
  matrix computeCovars(int nterm, real alpha, matrix bt, real sigmasq) {
    matrix[nterm, nterm] V;
    for (termi in 1:nterm) {
      for (termj in 1:nterm) {
        real ii = bt[termi, termi];
        real jj = bt[termj, termj];
        real ij = bt[termi, termj];
        real part1 = exp(alpha*(-ii - jj + 2*ij));
        real part2 = (1-exp(-2*alpha*ij));
        V[termi, termj] = part1*part2/(2*alpha);
      }
    }
    return (V*sigmasq);
  }
  
}

// The input data is a vector 'y' of length 'nterm'.
data {
  int<lower=0> nterm;
  int<lower=0> nreg;
  int<lower=0> nbranch[nterm];
  int<lower=0> nentries;
  matrix[nterm,nterm] branch_times;
  real epochs[nentries];
  real reg_profiles[nentries];
  
  vector[nterm] y;
}

parameters {
  real<lower=0, upper=1e100> alpha;
  real<lower=0, upper=1e100> sigmasq;
  vector[nreg] theta;
}

transformed parameters {
  vector[nterm] mu = computeWeights(nterm, nreg, alpha, nbranch, reg_profiles, epochs, theta);
  matrix[nterm, nterm] covar = computeCovars(nterm, alpha, branch_times, sigmasq);
}


model {
  target += multi_normal_lpdf(y | mu, covar);
}

I just wanted to check if the mean of the samples generated using stan function is similar to the MLE computed using optimizing function. It seems there is a huge difference. Interestingly the values from optimizing somewhat matches with my own implementation using R optim function.

fit = stan(file = 'EvoMCMC.stan', data = model_data, init = initf, verbose = TRUE, chains=nchain, iter=10000)
model_obj = stan_model(file='EvoMCMC.stan')
mle = optimizing(model_obj, data=model_data, init=initf)
mymle = myMLE(data, alpha=1, sigmasq=1, theta=rep(prior_theta_mean,1))

cat("\n**** my MLE ****\n")
print(setNames(mymle$par, c("alpha", "sigmasq", "theta")))
cat("\n**** rstan MLE ****\n")
print(mle$par[c('alpha', 'sigmasq', 'theta[1]')])
cat("\n**** rstan MCMC based solution ****\n")
print(setNames(c(mean(extract(fit)$alpha), mean(extract(fit)$sigmasq), mean(extract(fit)$theta)),
               c("alpha", "sigmasq", "theta")))

Gives the following result:

**** my MLE ****
      alpha     sigmasq       theta
  0.5520956 214.2985145  10.0782889

**** rstan MLE ****
     alpha    sigmasq   theta[1]
  0.554084 214.682116  10.098615

**** rstan MCMC based solution ****
       alpha      sigmasq        theta
5.593688e+97 6.614111e+99 7.592474e+00

Could somebody provide me some suggestions on what is going wrong here?

Thank you.

The posterior mean is not always near the maximum likelihood estimate so some mismatch is expected but in this case I think the sampler is failing due to bad parametrization. The declared upper bound 10^{100} creates the expectation that alpha and sigmasq are about 5\times 10^{99} and the sampler does not run long enough to learn the true scale. The starting scale is so large that the finite precision of floating-point arithmetic might stop the sampler completely.

See if removing the bounds fixes the problem.

parameters {
  real<lower=0> alpha;
  real<lower=0> sigmasq;
  vector[nreg] theta;
}

Thank you, Niko, for the quick reply. However, removing upper bound did not help.

**** my MLE ****
      alpha     sigmasq       theta
  0.5520956 214.2985145  10.0782889

**** rstan MLE ****
    alpha   sigmasq  theta[1]
  0.55396 214.67428  10.09869

**** rstan MCMC based solution ****
        alpha       sigmasq         theta
1.016928e+306 1.201854e+308  7.681030e+00

However, I noticed the following warning messages. I will go through the links. However, do you have any suggestions based on the warnings:

Warning messages:
1: There were 3967 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
2: There were 397 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
3: Examine the pairs() plot to diagnose sampling problems

Ok, let’s take another look.
When alpha is large (say >10.0) the covariance matrix from computeCavars is very close to an identity matrix times \frac{\sigma_{sq}}{2\alpha}. If there’s even a small chance that alpha is large then arbitrarily large values of alpha and sigmasq are consistent with the data. Without an explicit prior the implicit prior is improper.
The model needs a weakly-informative prior, e.g.

alpha ~ exponential(0.1);
1 Like

Yes. It helped. With the explicit prior,

**** rstan MLE ****
       alpha      sigmasq     theta[1]
1.636154e-03 1.797284e+02 1.043188e+01

**** rstan MCMC based solution ****
      alpha     sigmasq       theta
  14.566179 4181.655701    8.562174

Any suggestions on the priors for sigmasq? Also parameters of exponential?
Thank you.

The prior depends on the context. For positive values a reasonable starting point is ~ exponential(1.0/L) where L is the largest plausible value.

Are there more warnings?

No. Not a single warning. Also I notice that it has become faster 2 sec instead of 60 sec.