Rstan results from MLE (optimizing) and MCMC (stan) do not match

soumitrakp · October 18, 2020, 10:59am

Hi, I have a model which basically generates multi-variate gaussians where the mean vector and the covariance matrix are dynamically computed using some custom functions as shown bellow:

functions {
  vector computeWeights(int nterm, int nreg, real alpha, int[] nbranch, real[] reg_profile, real[] epochs, vector theta) {
    int index = 1;
    real y[nterm];
    matrix[nterm, nreg] W;
    for (term in 1:nterm) {
      int nb = nbranch[term];
      for (br in 1:nb) {
        real t = epochs[index] - epochs[index+br-1];
        y[br] = exp(-alpha*t);
      }
      for (br in 1:(nb-1)) {
        y[br] = y[br] - y[br+1];
      }
      for (reg in 1:nreg) {
        real dotp = 0.0;
        for (br in 1:nb) {
          dotp += y[br]*(reg_profile[index+br-1] == reg ? 1 : 0);
        }
        W[term,reg] = dotp;
      }
      index += nb;
    }
    return (W*theta);
  }
  
  matrix computeCovars(int nterm, real alpha, matrix bt, real sigmasq) {
    matrix[nterm, nterm] V;
    for (termi in 1:nterm) {
      for (termj in 1:nterm) {
        real ii = bt[termi, termi];
        real jj = bt[termj, termj];
        real ij = bt[termi, termj];
        real part1 = exp(alpha*(-ii - jj + 2*ij));
        real part2 = (1-exp(-2*alpha*ij));
        V[termi, termj] = part1*part2/(2*alpha);
      }
    }
    return (V*sigmasq);
  }
  
}

// The input data is a vector 'y' of length 'nterm'.
data {
  int<lower=0> nterm;
  int<lower=0> nreg;
  int<lower=0> nbranch[nterm];
  int<lower=0> nentries;
  matrix[nterm,nterm] branch_times;
  real epochs[nentries];
  real reg_profiles[nentries];
  
  vector[nterm] y;
}

parameters {
  real<lower=0, upper=1e100> alpha;
  real<lower=0, upper=1e100> sigmasq;
  vector[nreg] theta;
}

transformed parameters {
  vector[nterm] mu = computeWeights(nterm, nreg, alpha, nbranch, reg_profiles, epochs, theta);
  matrix[nterm, nterm] covar = computeCovars(nterm, alpha, branch_times, sigmasq);
}


model {
  target += multi_normal_lpdf(y | mu, covar);
}

I just wanted to check if the mean of the samples generated using stan function is similar to the MLE computed using optimizing function. It seems there is a huge difference. Interestingly the values from optimizing somewhat matches with my own implementation using R optim function.

fit = stan(file = 'EvoMCMC.stan', data = model_data, init = initf, verbose = TRUE, chains=nchain, iter=10000)
model_obj = stan_model(file='EvoMCMC.stan')
mle = optimizing(model_obj, data=model_data, init=initf)
mymle = myMLE(data, alpha=1, sigmasq=1, theta=rep(prior_theta_mean,1))

cat("\n**** my MLE ****\n")
print(setNames(mymle$par, c("alpha", "sigmasq", "theta")))
cat("\n**** rstan MLE ****\n")
print(mle$par[c('alpha', 'sigmasq', 'theta[1]')])
cat("\n**** rstan MCMC based solution ****\n")
print(setNames(c(mean(extract(fit)$alpha), mean(extract(fit)$sigmasq), mean(extract(fit)$theta)),
               c("alpha", "sigmasq", "theta")))

Gives the following result:

**** my MLE ****
      alpha     sigmasq       theta
  0.5520956 214.2985145  10.0782889

**** rstan MLE ****
     alpha    sigmasq   theta[1]
  0.554084 214.682116  10.098615

**** rstan MCMC based solution ****
       alpha      sigmasq        theta
5.593688e+97 6.614111e+99 7.592474e+00

Could somebody provide me some suggestions on what is going wrong here?

Thank you.

nhuurre · October 18, 2020, 11:28am

The posterior mean is not always near the maximum likelihood estimate so some mismatch is expected but in this case I think the sampler is failing due to bad parametrization. The declared upper bound 10^{100} creates the expectation that alpha and sigmasq are about 5\times 10^{99} and the sampler does not run long enough to learn the true scale. The starting scale is so large that the finite precision of floating-point arithmetic might stop the sampler completely.

See if removing the bounds fixes the problem.

parameters {
  real<lower=0> alpha;
  real<lower=0> sigmasq;
  vector[nreg] theta;
}

soumitrakp · October 18, 2020, 12:19pm

Thank you, Niko, for the quick reply. However, removing upper bound did not help.

**** my MLE ****
      alpha     sigmasq       theta
  0.5520956 214.2985145  10.0782889

**** rstan MLE ****
    alpha   sigmasq  theta[1]
  0.55396 214.67428  10.09869

**** rstan MCMC based solution ****
        alpha       sigmasq         theta
1.016928e+306 1.201854e+308  7.681030e+00

soumitrakp · October 18, 2020, 12:27pm

However, I noticed the following warning messages. I will go through the links. However, do you have any suggestions based on the warnings:

Warning messages:
1: There were 3967 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
2: There were 397 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
3: Examine the pairs() plot to diagnose sampling problems

nhuurre · October 18, 2020, 12:46pm

Ok, let’s take another look.
When alpha is large (say >10.0) the covariance matrix from computeCavars is very close to an identity matrix times \frac{\sigma_{sq}}{2\alpha}. If there’s even a small chance that alpha is large then arbitrarily large values of alpha and sigmasq are consistent with the data. Without an explicit prior the implicit prior is improper.
The model needs a weakly-informative prior, e.g.

alpha ~ exponential(0.1);

soumitrakp · October 18, 2020, 1:05pm

Yes. It helped. With the explicit prior,

**** rstan MLE ****
       alpha      sigmasq     theta[1]
1.636154e-03 1.797284e+02 1.043188e+01

**** rstan MCMC based solution ****
      alpha     sigmasq       theta
  14.566179 4181.655701    8.562174

Any suggestions on the priors for sigmasq? Also parameters of exponential?
Thank you.

nhuurre · October 18, 2020, 1:15pm

The prior depends on the context. For positive values a reasonable starting point is ~ exponential(1.0/L) where L is the largest plausible value.

Are there more warnings?

soumitrakp · October 18, 2020, 4:03pm

No. Not a single warning. Also I notice that it has become faster 2 sec instead of 60 sec.

Topic		Replies	Views
Gaussian Process Prediction Modeling rstan , techniques , fitting-issues	2	469	March 22, 2021
Point estimates Modeling	10	2400	September 26, 2017
Generated MVN not what I expect General	11	689	December 10, 2018
What's wrong with my stan code for estimating parameters with multivariate distribution Modeling	2	415	July 31, 2019
Using rstan::optimizing to obtain MLE does not seem to run with large number of parameters Modeling	12	1693	July 19, 2018

Rstan results from MLE (optimizing) and MCMC (stan) do not match

Related Topics