Kolmogorov smirnov test on model output

I installed cmdstan. I ran a code related to the truncated Weibull distribution using cmdstan. Could you please tell me how I evaluate the model using the Kolmogorov-Smirnov test?

I used the following code, but it didn’t work.
ks_test_result ← ks.test(data, “ptweibull”, shape = 1.94, scale = 7.89)

Hi @Saima I moved this post to a new topic.

I don’t want to be too presumptive here, and it is possible to use a ks test statistic as a discrepancy function in posterior predictive checking of a Bayesian model fit, but my best guess is that you are trying to do something that doesn’t really make sense in a Bayesian context. In order for us to help you further, can you share:

  • a thorough description of what you are trying to achieve
  • your full R script for doing so (ideally with some fake data simulation so it’s reproducible)
  • comments on what the R script is doing (for example, where did you get shape = 1.94, scale = 7.89 from?)
  • a complete description of the output (pasting the actual output is ok)
  • a clear statement of what part of the output leads you to conclude that “it didn’t work”
1 Like

I am trying to determine which model is better, whether it is the Weibull model or the truncated Weibull model, using the Kolmogorov-Smirnov test. I use the following code:

dataaa <- c(7.40, 6.50, 3.80, 4.3, 4.5, 2, 5.30, 8.20, 9, 6.5, 5, 5.2, 10, 3.2, 3.5, 5.2, 3.1)
length(dataaa)
# estimate Weibull parameters
library('EnvStats')
eweibull(dataaa, method = "mle")
# I find shape = 2.73 and scale = 6.14.
# Fitting truncated Weibull
library(cmdstanr)
library(ggplot2)
set.seed(20220118)
N <- 17       # Number of observations
shape <- 2.73   # shape parameter of Weibull distribution
scale <- 6.14  # scale parameter of Weibull distribution
L <- 5         # truncation point
D <- rweibull(N, shape, scale)
Dt <- D[D > L]
ggplot(data.frame(D = D, trunc = (D <= L))) +
  geom_histogram(aes(x = D, fill = trunc), binwidth = 1, boundary = 0)  
set_cmdstan_path(path = NULL)
cmdstan_path()
model1 <- cmdstan_model("weibull_truncated.stan")
fit1 <- model1$sample(list(N = length(Dt), L = L, Y = Dt),
                      seed = 1,
                      iter_warmup = 2000, iter_sampling = 2000,
                      refresh = 400,
                      chains = 4, parallel_chains = 4)
fit1$summary(c("alpha", "sigma"))

# Using a truncated weibull, I find shape = 2.74, scale =6.21.
#Perform KS test for Weibull test
ks_test_result <- ks.test(dataaa, "pweibull", shape = 2.73, scale = 6.14) #Perform a KS test comparing the data to a Weibull distribution with the estimated parameters

But I couldn’t use KS test for truncated Weibull.
I am looking forward for your help.

Edited by @jsocolar for R syntax highlighting.

I’m still not sure exactly what might help you make progress, but I hope something in the below helps.

  • to perform a ks test against a truncated weibull, you can write down the truncated weibull pdf as a function and pass that to ks.test.
  • this doesn’t look like a good way to compare the two models, for many reasons, including:
    • the models are being fit to different data
    • you are representing the output of the Bayesian model via a point estimate of its parameters rather than integrating over the posterior
    • the ks test statistics and p-values don’t tell you anything about which model has better predictive performance (even if the models were fit to the same data)
    • the differences in the ks test p values don’t indicate that one model fits better than another unless one and only one model has a very low p-value. Even if both models are perfectly well specified and recovered exactly the right parameter values, you’d expect their ks test p values to vary (ideally uniformly on the unit interval) such that the two p-values might be quite different.
    • even neglecting all the above issues, comparing ks test p-values wouldn’t be meaningful since the models are fit to datasets of different sizes.
    • the truncated weibull model will always fit any observed data a little bit better than the untruncated model, provided that the observed data don’t violate the truncation bound (if they do violate the bound, then the untruncated model fits infinitely better).
  • Stan integrates well with a robust ecosystem in R for evaluating the quality of Bayesian model fits and comparing models. If you want to do model comparison to assess the expected predictive performance of two Bayesian model fits on held-out data, I suggest you look into the loo package.

Thanks for your valuable suggestions.
I read articles where the Weibull model and the truncated model are compared in terms of AIC, RMSE, KS, and R square. That’s why I am trying to compare models using KS.
Now coming to your first point, how could I write down the truncated Weibull pdf as a function and pass that to ks. test?

d_trunc_weibull <- function(x, shape, scale, lower = 0, upper = Inf) {
  assertthat::assert_that(upper > lower)
  assertthat::assert_that(lower >= 0)
  dweibull(x, shape, scale) / (pweibull(ub, shape, scale) - pweibull(lb, shape, scale))
}
1 Like