mydata.txt (193.9 KB)
mypriors.txt (592 Bytes)
I’ve been super impressed with the speedups offered by cmdstan, as well as the fact that it hasn’t crashed on me even once…
But unfortunately, there seems to be something wrong with the way priors work in cmdstan. Basically, they’re not shrinking the posterior means the way they should. My anonymized dataset (attached) has a four-category categorical response y, and 28 predictors $x_{1}, \dots, x_{28}$. Most predictors have N(0, 2.5) priors (see mypriors.txt, attached). The fixed-effect parameters muB_x19B and muC_x24E have maximum-likelihood estimates of negative infinity, because predictors x19B and x24E have sampling zeroes for response categories B and C, respectively. Hence, a frequentist model “converges”, after a large number of iterations, to logits of -14.45 and -10.5 for those two parameters:
TestMod.freq <- nnet::multinom(y ~ ., data = mydata, maxit = 5000)
coef(TestMod.freq)
But the likelihood is quite flat at these estimates because the associated predictor values have few observations. Hence, the N(0, 2.5) prior should shrink the posterior mean to a reasonable single-digit value. And in regular rstan, it indeed does:
library(brms)
TestMod.stan <- brm(y ~ ., family = categorical, prior = mypriors, data = mydata,
  chains = 2, cores = 2, warmup = 1000, iter = 6000, seed = 2022,
  control = list(adapt_delta = 0.90))
summary(TestMod.stan)
^ The posteriors of muB_x19B and muC_x24E are now centered around -2, just with wider spreads than the other parameters. The priors have done their job beautifully.
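To make the shrinkage argument concrete, here is a minimal base-R sketch. It is a toy, not the attached data: it assumes a single logit theta, a hypothetical cell of n_obs = 10 observations with zero events, and reads N(0, 2.5) as a normal prior with standard deviation 2.5. The posterior is approximated on a grid.

```r
# Toy illustration only (hypothetical numbers, not the attached dataset):
# one logit `theta`, n_obs observations, zero events observed.
n_obs <- 10
theta <- seq(-30, 10, by = 0.01)   # grid over the logit

# Log-likelihood of zero events out of n_obs: n_obs * log(1 - plogis(theta))
log_lik <- n_obs * plogis(theta, lower.tail = FALSE, log.p = TRUE)

# Assumed prior: normal with mean 0 and sd 2.5
log_prior <- dnorm(theta, mean = 0, sd = 2.5, log = TRUE)

# Normalised posterior weights on the grid
log_post <- log_lik + log_prior
post <- exp(log_post - max(log_post))
post <- post / sum(post)

post_mean <- sum(theta * post)
post_sd   <- sqrt(sum((theta - post_mean)^2 * post))

# The likelihood keeps increasing as theta decreases, so its maximum
# on any grid sits at the grid's lower edge (no finite MLE).
mle_on_grid <- theta[which.max(log_lik)]

print(c(post_mean = post_mean, post_sd = post_sd, mle_on_grid = mle_on_grid))
```

In this toy setup mle_on_grid lands on the lower edge of the grid however far the grid is extended, while post_mean stays finite and negative. That is the behaviour expected from the prior in the full model.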
BUT, look what happens when I try the same using cmdstanr:
TestMod.cmdstan <- brm(bf(y ~ ., decomp = "QR"), family = categorical, prior = mypriors, data = mydata,
threads = threading(2), chains = 2, cores = 4, warmup = 1000, iter = 3500, seed = 2022, control = list(adapt_delta = 0.90), backend = "cmdstanr")
summary(TestMod.cmdstan)
^ The priors are no longer working as expected (if at all). The posterior means are even farther from zero than in the frequentist model. It is obvious that the N(0, 2.5) prior is imposing very little, if indeed any, shrinkage. This is despite the fact that the priors seem to have been “read” correctly. Calling prior_summary(TestMod.cmdstan) displays the correct N(0, 2.5) priors on the unruly parameters.
There just seems to be something wrong with how cmdstan interprets, or applies, those priors.
This is very unfortunate because I have been otherwise enamored with the speed and stability improvements offered by cmdstan over rstan. Is any quick fix possible?
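One check worth running first: the two brm() calls above differ in more than the backend. The cmdstanr call also wraps the formula in bf(..., decomp = "QR"), turns on threading, and uses fewer iterations. The sketch below refits with the rstan call's settings and changes only the backend. The object name TestMod.cmdstan2 is made up for illustration, and whether the QR decomposition is what changes the effect of the priors is an assumption to verify, not a confirmed diagnosis.

```r
library(brms)

# Same formula, priors, seed and sampler settings as TestMod.stan;
# the only change is backend = "cmdstanr" (no decomp = "QR", no threading).
# TestMod.cmdstan2 is a hypothetical name used only for this comparison.
TestMod.cmdstan2 <- brm(y ~ ., family = categorical, prior = mypriors, data = mydata,
  chains = 2, cores = 2, warmup = 1000, iter = 6000, seed = 2022,
  control = list(adapt_delta = 0.90), backend = "cmdstanr")
summary(TestMod.cmdstan2)

# Print the Stan code brms generated for the QR model, to see which
# parameters the normal priors are actually attached to.
stancode(TestMod.cmdstan)
```

If TestMod.cmdstan2 reproduces the rstan posteriors, the difference comes from the model specification rather than from cmdstan itself.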