To summarize the current status, here are things that I think have been causing flakiness:
1. My original change to the old jobs, which allows testing against pull requests on forks, has hit a couple of corner cases so far that caused spurious failures.
2. Adding the Linux box back in and figuring out how to use it without it imploding or its network connection dropping. I think it has finally been in a pretty good state for the past week or so.
3. GitHub going down (semi-rare, but I’ve seen it a few times).
4. Two jobs running simultaneously that conflict in ways I don’t totally understand.
There might be more I’m missing - anyone have others?
I think the vast majority of failures have been due to #2, and pipelines are supposed to give us better job isolation (dealing with #4), better robustness in the face of node failure (#2), and tools to add retries etc. to help deal with things like #3. I think the mechanism behind #1 is a little better in pipeline land as well, since it gets its parameters from a plugin with commercial support that seems a little more robust than the old “Github Pull Request Builder” plugin.
Daniel, what kind of stuff could we simplify or coalesce to add robustness or save time, respectively?
I don’t think my experimentation with pipelines has been affecting the other jobs, other than that the pipeline runs are also pull requests being tested and thus add testing load.