FItting a model with random effects and correlated noise - and not doing so well

I have generated some data assuming random effects and correlated noise (error) that happen to be auto-regressive in this case. What I am showing here is a slightly simplified version of what I am actually trying to do, but I am getting similar warnings - namely that ESS is too low and I have poor mixing.

I initially fit the models using brms - specifying the random effects and ar-1 correlation. Things worked out pretty well there. I’ve looked at the stan code and was able to fit a model directly in stan by replicating the code - no surprise there. But I wanted to see if I could develop some code that was a bit more general and might allow me to specify more flexible error covariance patterns. So, I have varied things a bit. In general, I think the estimates are correct, but clearly the approach is not super robust as I often get the warnings (not always, depending on the data set or size of correlation, but often).

Here is my data generating process:

library(simstudy)

### Define data (using simstudy package)

defI <- defData(varname = "ieffect", formula = 0, variance = 3^2)

defc <- defCondition(condition = "period == 0", 
                     formula = "20 + ieffect", dist = "nonrandom")
defc <- defCondition(defc, condition = "period == 1",
                     formula = "50 + ieffect", dist = "nonrandom")
defc <- defCondition(defc, condition = "period == 2",
                     formula = "25 + ieffect", dist = "nonrandom")
defc <- defCondition(defc, condition = "period == 3",
                     formula = "25 + ieffect", dist = "nonrandom")

## Generate data (using simstudy package)

set.seed(38241)

dd <- genData(120, defI)

dd <- trtAssign(dd, grpName = "rx")
dp <- addPeriods(dd, nPeriods = 4)
dp <- addCondition(defc, dp, "mu")
dp[, sigma2 := 6^2]

dx <- addCorGen(dp, "id", nvars = 4, dist = "normal",
                param1 = "mu", param2 = "sigma2", 
                rho = 0.5, corstr = "ar1", cnames = "y")

dx <- genFactor(dx, "period")

Here is the stan code:

data {
  int<lower=1> K;              // number of predictors
  int<lower=0> N;              // number of records
  int<lower=1> I;              // number of ids
  int<lower=0> T;              // number of periods
  int<lower=1,upper=I> ii[N];  // individual id
  int  x[N];                   // period
  vector[T] y[I];              // matrix of outcomes
}

parameters {
  
  real mu[K];   
  vector[I] rani;               // individual level random effects

  real<lower=0> sigmab;        // random effect variance (sd)
  real<lower=0> sigmae;        // individual level varianc (sd)
  real<lower=-1,upper=1> rho;  // correlation

}

transformed parameters{ 
  
  cov_matrix[T] Sigma;
  
  {
    real sig2 = sigmae^2;
    
    for (j in 1:(T-1))
      for (k in (j+1):T) {
        Sigma[j, k] = sig2 * pow(rho, abs(j - k));  
        Sigma[k, j] = Sigma[j, k];
      }
  
    for (i in 1:T)
      Sigma[i, i] = sig2;
    
  }
}

model {
  
  vector[N] yhat;
  vector[T] yhatM[I];
  
  for (i in 1:N)  
    yhat[i] = mu[x[i]] + rani[ii[i]];
  
  for (i in 1:I) {
    int start = i*4 - 3;
    yhatM[i] = yhat[start:(start+3)];
  }
  
  target += normal_lpdf(mu | 0, 30);
  target += student_t_lpdf(sigmab | 3, 0, 15) 
              - 1 * student_t_lccdf(0 | 3, 0, 15);
  target += cauchy_lpdf(sigmae | 0, 2) 
              - 1 * cauchy_lccdf(0 | 0, 2);
  target += uniform_lpdf(rho | -1, 1);
  
  target += normal_lpdf(rani| 0, sigmab);
  target += multi_normal_lpdf(y  | yhatM, Sigma);
  
}

And finally, here is the call to stan:

library(rstan)
options(mc.cores = parallel::detectCores())

x <- dx$period + 1 
K <- length(dx[, unique(period)])
N <- dx[, length(unique(timeID))]
I <- dx[, length(unique(id))]
T <- dx[, length(unique(period))]
ii <- dx[, id]
y <- matrix(dx[, y], nrow = I, ncol = T, byrow = TRUE)

testdat <- list(K=K, N=N, I=I, T=T, ii=ii, x=x, y=y)

rt <- stanc("Programs/RepeatedCorNoRx.stan")
sm <- stan_model(stanc_ret = rt, verbose=FALSE)
fit.norx <- sampling(sm, data=testdat, seed = 821, 
                   iter = 5000, warmup = 1000,
                   control = list(max_treedepth = 15))

print(fit.norx, pars = c("mu", "rho", "sigmae", "sigmab"))

Here are the estimates:

Inference for Stan model: RepeatedCorNoRx.
4 chains, each with iter=5000; warmup=1000; thin=1; 
post-warmup draws per chain=4000, total post-warmup draws=16000.

        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu[1]  20.25    0.01 0.62 19.02 19.84 20.26 20.67 21.44  3672 1.00
mu[2]  49.34    0.01 0.62 48.11 48.93 49.35 49.75 50.55  3171 1.00
mu[3]  24.57    0.01 0.62 23.36 24.16 24.57 24.99 25.77  3102 1.00
mu[4]  24.18    0.01 0.62 22.97 23.76 24.18 24.59 25.37  3182 1.00
rho     0.56    0.00 0.05  0.45  0.53  0.57  0.60  0.65   507 1.01
sigmae  6.46    0.02 0.34  5.77  6.24  6.45  6.68  7.13   385 1.01
sigmab  1.29    0.11 0.86  0.13  0.54  1.17  1.90  3.12    64 1.09

Samples were drawn using NUTS(diag_e) at Fri Sep 13 13:29:00 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

But - this is the warning I get:

Warning messages:
1: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
2: Examine the pairs() plot to diagnose sampling problems
 
3: The largest R-hat is 1.13, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

Clearly, I am doing something funky, and was wondering if there is some obvious mis-step I am making. Any advice would be much appreciated.

I haven’t tried to run your code, but I don’t see explicit priors on sigmab, sigmae, or rho. The implicit flat prior on rho might be OK, but I suspect the flat priors on sigmab and sigmae are problematic. What happens if you include the following in your model block:

sigmab ~ exponential(1);
sigmae ~ exponential(1);
rho ~ uniform(-1, 1);

Kent

Thanks for taking a look. I did specify the priors based on what brms had generated for the standard deviation:

 target += student_t_lpdf(sigmab | 3, 0, 15) 
              - 1 * student_t_lccdf(0 | 3, 0, 15);
 target += cauchy_lpdf(sigmae | 0, 2) 
              - 1 * cauchy_lccdf(0 | 0, 2);
 target += uniform_lpdf(rho | -1, 1);

I went ahead and tried the exponential(1) prior and the results were similar to what I showed above, also with warnings.

  • Keith

Oh, I see now. I don’t usually specify the priors that way, so I missed it. Since I almost never work with target I’m not good with the associated syntax. Are you sure that you need the - 1 * student_t_lccdf(0 | 3, 0, 15) and the corresponding term in the next line? You’ve declared sigmab and sigmae with a lower bound of 0. If you wrote

sigmab ~ student_t(3, 0, 15);
sigmae ~ cauchy(0, 2); 

I don’t think you’d have those terms in target. As I said though, I almost never manipulate target directly. I depend on the sampling statements to do it for me, so I could be completely off base.

Kent

Thanks again. I adopted this particular syntax from the brms-generated code, and it seems to make sense. The fact that when I used the exponential and got the same results leads me to believe that is not the issue.