Different results for summary(fit) and monitor(as.array(fit)) in rstan

I’m recieving slightly different estimates from the pooled chain output from the summary() function, summary(fit)$summary, compared to using monitor(as.array(fit), digits_summary = 7).

I might be mistaken in expecting these two outputs to be the same but can’t think of any obvious reason that they wouldn’t be (the monitor() method returns calculated Rhat values so must also be using all chains pooled).

Differences are minor but enough to make me wonder what’s going on. As an example, below is the output from the 8schools model for mu.

> summary(fit)$summary["mu",]
       mean     se_mean          sd        2.5%         25%         50%         75% 
  7.7698073   0.2223414   5.5400738  -3.1682275   4.6139056   7.5949953  11.2904252 
      97.5%       n_eff        Rhat 
 18.1887544 620.8554645   1.0004004 

> monitor(as.array(fit), digits_summary = 7, print = F)["mu",]
       mean     se_mean          sd        2.5%         25%         50%         75% 
  7.6688573   0.2119141   5.1142464  -2.5261489   4.3962322   7.5022248  11.0908054 
      97.5%       n_eff        Rhat 
 17.4937133 582.4295875   1.0058478 

Any ideas why the difference occurs?

Not sure, does monitor(as.array(fit)) include the warmup samples?

Figured it out, right line of thinking Stephen but slightly different issue.

as.array() takes the number of warmups from the stanfit object and drops this number from the start of the chain (no arguments to change this behaviour for as.array). monitor() by default drops the first half of the chains fed to it. So using default arguments (and a warmup of n_samples/2), half the samples are (correctly) droppped by summary(), but 75% of samples are (incorrectly) dropped using monitor(as.array()).

Easy fix is to set the monitor function to remove 0 warmup samples as below. Turns out the as.array() is entirely superflous here and just using monitor directly on the stanfit object will remove the correct number of samples (shown at the bottom).

> summary(fit)$summary["mu",]
       mean     se_mean          sd        2.5%         25%         50%         75% 
  7.7698073   0.2223414   5.5400738  -3.1682275   4.6139056   7.5949953  11.2904252 
      97.5%       n_eff        Rhat 
 18.1887544 620.8554645   1.0004004 

> monitor(as.array(fit), digits_summary = 7, print = F, warmup = 0)["mu",]
       mean     se_mean          sd        2.5%         25%         50%         75% 
  7.7698073   0.2218864   5.5400738  -3.1682275   4.6139056   7.5949953  11.2904252 
      97.5%       n_eff        Rhat 
 18.1887544 623.4044393   1.0004004 

> monitor(fit, digits_summary = 7, print = F)["mu",]
       mean     se_mean          sd        2.5%         25%         50%         75% 
  7.7698073   0.2218864   5.5400738  -3.1682275   4.6139056   7.5949953  11.2904252 
      97.5%       n_eff        Rhat 
 18.1887544 623.4044393   1.0004004 

This gets results very close but with some remaining funny behaviour. As seen above all estimates are equal for monitor() and summary() except se_mean and n_eff which still differ slightly (this is the case for all model parameters).

se difference is at the third decimal place and n_eff difference is similarly small but it would be nice to know where the difference is coming from. Similarly, is there a preferred function for extracting mean, quantiles, etc. from stanfit objects? I’ll default to summary() for the time being as it’s the more widely used.

Looks like the issue has been hit on before here if you look at the output from print() and the m2 output, but without resolution.