Inefficient samples, non-stationary traceplot

sebastiansauer · August 31, 2018, 8:43am

I’m trying to fit a simple linear model using the rethinking package (draws on rstan MCMC).

The model fits, but the sampling is inefficient and Rhat indicates something went wrong. I don’t understand the reason for this fitting problem.

This is the data:

d <- read_csv("https://gist.githubusercontent.com/sebastiansauer/c369e535a536c775fbffc4864bd2ad67/raw/election_data.csv")
d <- as.data.frame(d)
d <- d[, c("afd_prop",  "state_id")]

This is the model:

m13_stan <- map2stan(
  alist(
    afd_prop ~ dnorm(mu, sigma),    
    mu <- alpha + beta0[state_id],
    beta0[state_id] ~ dnorm(a, sigma),
    a ~ dnorm(0, 1),
    sigma ~ dnorm(0, .1),
    alpha ~ dnorm(0, .1)
  ),
  data = d,
  iter = 4000
)

Rhat seems ok (1.01). But the number of effective samples is low (~60).

The traceplot is not stationary.

Sorry, this is probably a beginner’s question - but I’m unsure why the sampling is so ineffective and what I can do about it.

Thanks for any help.

ldeschamps · August 31, 2018, 9:39am

Hello!

You are using the same parameter, “sigma”, to model both the noise (afd_prop ~ dnorm(mu, sigma)) and the standard deviation of your group intercepts (beta0[state_id] ~ dnorm(a, sigma)). However, this is two different kind of variability in your data, the former estimating the residual variability after accounting for the differences between groups.

Moreover, your priors seem pretty restrictive (try curve(dnorm(x, 0, 0.1), from = -1, to=1)). The normal distribution in stan is parametrized by its standard-deviation, not its precision (sd = 1/ precision).

Have a nice day!
Lucas

sebastiansauer · August 31, 2018, 12:10pm

Hi Lucas,

Many thanks for your comments. So right!

Now Rhat seems ok (1.01), but the number of effective samples for the beta[state_id] parameters is still very low (all 19). The traceplot is still not stationary. It seems that there’s still something wrong in the woodshed.

Any thoughts are greatly appreciated.

Many thanks,
Sebastian

ldeschamps · August 31, 2018, 12:11pm

Could you send the new code? It might help understanding problematic behaviors!

Lucas

sebastiansauer · August 31, 2018, 7:30pm

UPDATE: I think i found the issue: beta0 was defined as a unique intercept for each group. In the data frame, the variable east was coded as 0 or 1. Apparently, the 0 caused error. With this updated code, the model seems to work properly:

d$east_cat <- ifelse(d$east == 1, "yes", "no")

m9_stan <- map2stan(
  alist(
    afd_prop ~ dnorm(mu, sigma),
    mu <- alpha + beta0[east_id] +  beta1*for_prop_z + beta2*unemp_prop_z,
    beta0[east_id] ~ dnorm(0, 1),
    alpha ~ dnorm(0, 1),
    beta1 ~ dnorm(0, 1),
    beta2 ~ dnorm(0, 1),
    sigma ~ dnorm(0, 1)
  ),
  data = d)

n_eff is good now, Rhat too, and the traceplot looks more like a fat, hairy caterpillar.

Thanks for your help!

ldeschamps · August 31, 2018, 8:36pm

I tried to fit your model, and I think you do not need the a parameter. Indeed, in this formulation, beta0 represents a deviation around the global intercept, an absence of group effect being represented by beta0 = 0. Thus, it is unnecessary to estimate the mean of group effect’s distribution (it would quite naturally be 0, some group being higher and others being lower), and this is why you observe problematic sampling behaviors.

Writing a model as follow allows to estimate how much the group effects vary compared to an absence of group effect. It has the wonderfull consequence of limiting overfitting, the hierarchical prior preventing group effect to go too far from zero. Doing otherwise would allow model to fit to strongly too your particular dataset, and would reduce its ability to predict new data from the same generating process.

Technically, you will see that the n_eff are much better!

m13_stan <- map2stan(
  alist(
    afd_prop ~ dnorm(mu, sigma),    
    mu <- alpha + beta0[state_id],
    beta0[state_id] ~ dnorm(0, sigma2),
    sigma ~ dnorm(0, 1),
    sigma2 ~ dnorm(0, 1),
    alpha ~ dnorm(0, 1)
  ),
  data = d
)

Because you are modelling proportions, it would be better to use a beta distribution as likelihood. However, you have to use the reparametrization described in classical papers on beta regressions (0<mu<1, 0<phi). I am not sure how to do this with map2stan, I tried something but it did not work well. It is necessary to put constraints on parameters and play with returned ones, but I do not work with map2stan!

EDIT : Reparametrization is not mandatory, but makes the interpretation of parameters far more easier!

m14_stan <- map2stan(
  alist(
    afd_prop ~ dbeta(mu*exp(phi), (1-mu)*exp(phi)),    
    logit(mu) <- alpha + beta0[state_id],
    beta0[state_id] ~ dnorm(0, sigma2),
    phi ~ dnorm(0, 1),
    sigma2 ~ dnorm(0, 1),
    alpha ~ dnorm(0, 1)
  ),
  data = d
)

Do not worry about negative WAIC, nothing constrains it to be positive. The importance is the difference between models!

sebastiansauer · September 3, 2018, 7:39am

You helped me so much - saved me several hours. Many thanks!

ldeschamps · September 3, 2018, 8:28am

You’re welcome!

Topic		Replies	Views
Trace plot and mcmc diagnostics Modeling fitting-issues	4	2769	December 25, 2018
Inefficient sampling with small experimental uncertainties Modeling performance	6	682	February 8, 2018
Shinystan trace plot General	3	584	February 7, 2021
Huge RHATs, Extreme Model Performance Variability Depending on Priors Modeling	10	891	September 23, 2020
Draws from the Posterior Do Not Vary Modeling	2	485	June 16, 2019

Inefficient samples, non-stationary traceplot

Related topics