As a personal experiment, I would like to submit this article to the community before I finalize it and send it to an open archive and then to a peer-reviewed journal.

If you are familiar with one or more aspects of the study, feel free to comment here or directly on the document.

I am happy to acknowledge contributors, or include them in the authorship if they wish and (for authorship) if the contribution is reasonably substantial.

I have some comments but need a few days to write them.
EDIT: I didn’t notice this was a 3-month-old post, as it was high in the topic list (@martinmodrak had changed the tag, so it got bumped up). I guess you don’t need comments anymore.

I also would be happy to get Aki’s comments on it. Also a technicality: it turns out changing tags doesn’t bump a post up, but changing categories (which I did here) does. So we don’t need to worry about editing tags on existing posts; it won’t spam the front page :-)

I’ve wondered about that. There’s a lot of posts in developers that I kept moving to interfaces or modeling (people just thought it was the way to contact the devs), but I stopped because I didn’t know what that did to everyone else in the interface. Should I resume?

This is very relevant for me now as I’m working on splice variant differential expression with biological replicates with @schen5. We’ve set up pretty traditional posterior predictive checks and have also found that the variational inference works pretty well.

We show, using a publicly available data set, that nearly 10% of differentially abundant transcripts had their fold change inflated by the presence of outliers.

What are you considering an outlier in this situation? The choice of the 2.5% and 97.5% interval seems arbitrary, especially since with hundreds of data points you expect non-outliers beyond those quantiles anyway.
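To put rough numbers on that last point, here is a quick sketch (the n = 400 is a made-up illustration, not from the paper):

```python
# With a 2.5%-97.5% interval, 5% of perfectly ordinary points fall
# outside it, so with hundreds of points false flags are guaranteed.
n = 400                       # hypothetical number of data points
alpha = 0.05                  # mass outside the central 95% interval
expected_flagged = n * alpha  # ordinary points flagged on average
p_at_least_one = 1 - (1 - alpha) ** n

print(expected_flagged)  # 20.0
print(p_at_least_one)    # essentially 1
```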

I’ve been heavily influenced by @andrewgelman not to think in terms of outliers at all, but in terms of building some kind of measurement error model. I’ve also been influenced by Andrew not to think in terms of FDR, which has this implicit quantization threshold, but in terms of posterior coverage. That is, if the method says the differential expression is in a given interval 50% of the time, how often is it in that interval? This requires setting the intervals to test (50% is nice for power), but not setting a threshold.
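A minimal sketch of what such a coverage check can look like (all numbers simulated for illustration, not from any real method): simulate true effects, fake a method's point estimate and its central 50% posterior interval, and count how often the truth lands inside.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000
truth = rng.normal(0.0, 1.0, n)                    # "true" differential expression
noise_sd = 0.3
post_mean = truth + rng.normal(0.0, noise_sd, n)   # a method's point estimate
half_width = noise_sd * norm.ppf(0.75)             # central 50% interval half-width
inside = np.abs(truth - post_mean) < half_width
coverage = inside.mean()
print(f"empirical 50% coverage: {coverage:.3f}")   # ~0.50 if calibrated
```

A miscalibrated method shows up directly as coverage far from 0.50, with no threshold on any individual hypothesis needed.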

I fear in practice that the biologists don’t care about any of this and are just using all this for ranking hypotheses to take into the wet lab.

one outlier for one randomly selected sample, characterised by a 10-10 quantile distance

I don’t know what that means, either. What’s a quantile distance?

As the amount of draws from the posterior probability distribution needed to define extreme quantiles of a distribution grow exponentially with the number of samples and the false positive rate,

I glanced through the paper and don’t know what is supposed to be growing exponentially as a function of what. The amount of data required to estimate a quantile q in (0, 1) is roughly 1/(min(q, 1 - q)) times the amount of data you need to estimate a median.
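One way to see why the growth is linear rather than exponential: to pin down quantile q you need a fixed number of draws landing in the tail beyond it, so the total draw count scales like 1/min(q, 1 - q). A toy calculation (the target of ~100 tail draws is an arbitrary choice of mine):

```python
def draws_needed(q: float, tail_draws: int = 100) -> int:
    """Posterior draws needed so ~tail_draws of them land beyond quantile q."""
    return round(tail_draws / min(q, 1.0 - q))

print(draws_needed(0.5))    # 200     (the median: cheapest case)
print(draws_needed(0.975))  # 4000
print(draws_needed(0.999))  # 100000
```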

Just regarding outliers: something came up in class the other day, and I was reminded that outliers cause two problems: first, these are data points that the model does not correctly fit; second, if the model is fit to data with outliers, but the model doesn’t account for the outliers, then we can get bad answers because the outliers pollute the rest of the inference. This is well known in statistics (the whole robustness literature) but I think that sometimes people forget about it when thinking about outliers.

@Bob_Carpenter thanks for the feedback! I will reply today later as it is a bit more involved.

Thanks @andrewgelman. We tackle this problem with an iterative strategy (two steps works well, but it could be more): a first inference is done with the outliers included, and they are then excluded with conservative criteria from a second inference, from which we get a probability measure of a data point being an outlier. This second inference uses a truncated distribution based on the CI boundaries we used to define outliers in the first phase.
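Not the paper's implementation — just a minimal numerical sketch of that two-step idea under simplifying assumptions I made up (normal abundances, a fixed 99% flagging interval, truncated-normal refit by maximum likelihood):

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
# simulated gene abundances with two injected outliers
y = np.concatenate([rng.normal(0.0, 1.0, 200), [8.0, -9.0]])

# --- step 1: naive fit, flag points outside a conservative interval ---
mu1, sd1 = y.mean(), y.std(ddof=1)
lo, hi = stats.norm.ppf([0.005, 0.995], mu1, sd1)  # conservative 99% CI
keep = (y >= lo) & (y <= hi)

# --- step 2: refit the retained points with a truncated normal,
#     truncation bounds fixed at the step-1 CI ---
def nll(params, data, lo, hi):
    mu, log_sd = params
    sd = np.exp(log_sd)
    z = stats.norm.cdf(hi, mu, sd) - stats.norm.cdf(lo, mu, sd)
    return -np.sum(stats.norm.logpdf(data, mu, sd)) + len(data) * np.log(z)

res = optimize.minimize(nll, x0=[mu1, np.log(sd1)],
                        args=(y[keep], lo, hi), method="Nelder-Mead")
mu2, sd2 = res.x[0], np.exp(res.x[1])
print(f"naive fit:     mu={mu1:.2f}, sd={sd1:.2f}")  # sd inflated by outliers
print(f"truncated fit: mu={mu2:.2f}, sd={sd2:.2f}")  # close to (0, 1)
```

Fixing the truncation bounds at the step-1 CI keeps the step-2 likelihood consistent with how points were excluded; a plain refit on the retained points would underestimate the scale.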

It is true that there is not much bibliography. I will try to get more into outlier detection and enrich the bibliography. If you could point me to some representative articles, that would be awesome.

We label outliers based on a (calibrated) false positive rate (expressed in %), and this takes into account multiple testing (I know, I opened another can of worms XD).

I would really like to explore this approach more (I guess going beyond p-value thresholds); however, I have to admit I don’t understand it that well. A more concrete numerical example would help a lot.

This is true. I have to say that with thousands of hypothesis tests (for the whole genome, for example), scientists need a really simple way to express the probability of importance in order to make decisions. This is an interesting topic: how to avoid thresholds while keeping the communication of relevance/uncertainty simple.
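On ranking without a hard cut, a hedged sketch (the effect sizes, the 0.5 relevance threshold, and the posterior draws are all invented for illustration): rank genes by the posterior probability that the effect exceeds a practically relevant size, and hand the wet lab an ordering rather than a yes/no list.

```python
import numpy as np

rng = np.random.default_rng(2)
# posterior draws of log-fold-change for three hypothetical genes
# (rows: genes, cols: draws); means 0.1, 0.8, -1.5, posterior sd 0.5
draws = rng.normal(loc=[[0.1], [0.8], [-1.5]], scale=0.5, size=(3, 4000))

# probability the effect exceeds a practically relevant size (here 0.5)
prob_relevant = (np.abs(draws) > 0.5).mean(axis=1)

# ranking for follow-up experiments -- no hard threshold applied
order = np.argsort(-prob_relevant)
print(prob_relevant.round(2), order)
```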

It’s a bit confusing; I will try to explain it better in the article. Basically, in the data simulation for validation I created outliers that lie in the extreme tails of a gene’s distribution (inferred from real biological data). For example, if a gene’s abundance was standard normal, I introduced outliers > 6.25. Does it make sense?
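For concreteness, a tiny sketch of that simulation scheme (the sizes and the exact injected value are placeholders of mine, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(42)
n_genes, n_samples = 5, 8
# baseline abundances: standard normal per gene
# (a stand-in for the distribution inferred from real data)
abundance = rng.normal(0.0, 1.0, size=(n_genes, n_samples))

# inject one outlier per gene into a randomly selected sample,
# beyond the 6.25-sd tail of the gene's distribution
cols = rng.integers(0, n_samples, size=n_genes)
abundance[np.arange(n_genes), cols] = 6.5
```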

Thanks for this. That was a rather generic sentence of mine, for lack of real knowledge. This is the equation I needed: 1/(min(q, 1 - q)) times the amount of data you need to estimate a median. Is this introduced in some specific book/article?