Can a Bayesian model predict a cumulative quantity?

There is a great difference.

1 Like

Well I guess this is a simple question but I actually have about zero experience. Here is the situation as I understand it though:

  1. Selectively leaving out data points is the path to tricking ourselves about our model fit

  2. If there are outliers, our first instinct should be to think about why those points might be so different from the rest, and to model that

  3. If we do not have the data to model this (extra covariates or whatever that could be driving this), then we’ll just need to model outliers by changing our likelihood to something else. A student-t is the usual thing. Search for “outliers” here: http://sumsar.net/files/posts/2017-01-15-bayesian-computation-with-stan-and-farmer-jons/stan_exercise.html . I think that explanation seems fair.
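For concreteness, here is a minimal sketch of point 3 — swapping a normal likelihood for a Student-t one in a simple regression. The variable names are generic illustrations, not taken from the linked exercise:

```stan
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  real<lower=1> nu;      // degrees of freedom; small nu = heavy tails
}
model {
  nu ~ gamma(2, 0.1);    // a common weakly-informative prior for nu
  // student_t replaces normal(alpha + beta * x, sigma),
  // so extreme observations pull less on alpha and beta
  y ~ student_t(nu, alpha + beta * x, sigma);
}
```

With heavy tails (small `nu`), an extreme point is no longer surprising under the likelihood, so it no longer drags the fit toward itself.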

2 Likes

Regarding outliers, there’s a post on the Gelblog if you’re bored: https://statmodeling.stat.columbia.edu/2020/05/01/hey-you-yeah-you-stop-what-youre-doing-right-now-and-read-this-stigler-article-on-the-history-of-robust-statistics/

It’s on robustness, but whenever outliers come up so does robust regression. The Gelblog can be super confusing to read, but there are links to previous discussions on robustness and whatnot. It can be interesting to skim just to see what people are vaguely talking about.

2 Likes

Thank you very much for your kind guidance.
I am overwhelmed with appreciation and inspired to continue learning, although the road is long.

Just to add a small thing for perspective: there is a relatively large literature on what makes a good response distribution for citation counts, and it mostly agrees that the negative binomial is too concentrated and does a poor job of capturing the rare papers with very large citation counts. This is possibly what is happening with the year 2018 in your data: some paper got cited heavily.

The sad part is that Stan doesn’t have any of those less-known, so-called “overdispersed” distributions implemented (I am actually slowly working on implementing them, but it turns out not to be trivial), so you can’t easily improve your model much. The easiest way to model this overdispersion would be to add a varying intercept for each measurement (e.g., + (1 || year)).

If you want to go deeper, a good reference is, IMHO, the paper on the generalized inverse-Gaussian Poisson distribution (also called the Sichel distribution), which references some previous work and is referenced by others. But this might be just a distraction; your model seems to work well enough for many purposes.

The only package I am aware of that allows fitting models with the Sichel distribution is gamlss.

Best of luck with your Stan adventures! :-)

4 Likes

@martinmodrak, thank you for your advice. I recoded the multilevel model and it seems to work well:

data {
  int<lower=1> N;                // number of observations
  int Y[N];                      // response variable
  real X[N];                     // predictor variable
  int<lower=1> M;                // number of grouping levels
  int<lower=1, upper=M> J[N];    // grouping indicator per observation
}

transformed data {
  real Xc[N];                    // centered predictor
  real means_X = mean(X);
  for (i in 1:N) {
    Xc[i] = X[i] - means_X;
  }
}


parameters {
  real beta;                  // fixed slope
  real Intercept;             // fixed intercept
  vector[M] u;                // varying intercept
  real<lower=0> phi;          // shape parameter
}


model {
  real mu;
  
  target += student_t_lpdf(Intercept | 3, 5, 10);
  target += gamma_lpdf(phi | 0.01, 0.01);
  target += normal_lpdf(u | 0, 1);
  target += normal_lpdf(beta | 0, 10);

  for (n in 1:N) {
    mu = Intercept + u[J[n]] + Xc[n] * beta;
    Y[n] ~ neg_binomial_2_log(mu, phi);
  }
 
}


generated quantities {
// actual population-level intercept
  real b_Intercept = Intercept - means_X * beta;
   
}


and then

stan_dat <- list(
  N = nrow(d_eng),
  Y = d_eng$n_cited,
  X = d_eng$year,
  M = nlevels(as.factor(d_eng$year)),
  J = as.integer(as.factor(d_eng$year))
)


fit <- stan(
  file = "nb_model.stan",
  data = stan_dat,
  iter = 4000,
  chains = 4,
  warmup = 2000,
  cores = 4,
  control = list(
    adapt_delta = 0.99,
    max_treedepth = 15
  )
)

fit

stan_hist(fit, pars = c("b_Intercept", "beta", "phi"))
stan_hist(fit, pars = c("u"))

But now I have two new problems with this Stan model:

  1. I don’t know how to do posterior predictive checks. Like this?
generated quantities {
  int y_rep[N];
  for (n in 1:N) {
    // use the centered predictor Xc, to match mu in the model block
    y_rep[n] = neg_binomial_2_rng(exp(Intercept + u[J[n]] + Xc[n] * beta), phi);
  }
}
  2. In a multilevel model, how do I predict for a new year, for example 2020?

Yes, that looks like a good basis for PP checks.

If year has a varying effect/intercept, the most common option is to sample the new year’s intercept from the hyperprior.

Nevertheless, your model doesn’t really look like a multilevel model, since you don’t have any hyperprior. A classical multilevel model would have a parameter (say tau) that works like:

parameters {
  real<lower=0> tau;   // between-group sd, so it needs a lower bound
  vector[M] u_raw;     // non-centered parameterization
  // ... and the rest as you have it
}

transformed parameters {
  vector[M] u = u_raw * tau;
}

model {
  target += normal_lpdf(tau | 0, 1);   // half-normal, given the constraint
  target += normal_lpdf(u_raw | 0, 1);
  // ... and the rest as you have it
}
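Prediction for a new year can then be done in generated quantities by drawing that year’s intercept from the hyperprior. A minimal sketch, assuming the tau/u setup above together with the means_X, Intercept, beta, and phi from your earlier model:

```stan
generated quantities {
  // intercept for an unobserved year, drawn from the hyperprior
  real u_new = normal_rng(0, tau);
  // predicted citation count for year 2020
  // (center the predictor the same way as in the model block)
  int y_2020 = neg_binomial_2_rng(exp(Intercept + u_new + (2020 - means_X) * beta), phi);
}
```

Because u_new is redrawn each iteration, the posterior of y_2020 automatically reflects both parameter uncertainty and between-year variation.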

Anyway, I am slightly puzzled why you opted to move away from brms, as the model you are using is well within its capabilities. Obviously, if your goal is to get better at Stan, then this is a good approach :-)

1 Like

I think Stan can help me understand Bayes better, although I only know a little

1 Like

I am curious how this works.

It might be more straightforward in the end, but Stan code is fun!

2 Likes

Not sure what you’re trying to say - is there a mistake in my code/explanation? This is well possible, but I can’t see it, so I’d need more details :-)

1 Like

I am just curious to see the difference. I agree with everything you said!

1 Like