Retry flaky e2e tests at most 2 times #31682

kevin940726 · 2021-05-11T02:32:15Z

Description

Related to #33980.

It retries failed tests at most 2 times (3 times counting the initial run) for e2e tests. This is only possible after we migrated to jest-circus.

This could be a controversial one, but IMO it immediately unblocks us from some flaky tests and makes the tests a bit easier to maintain.

If the test failed 3 times in a row, then we should definitely fix it. We should probably still add a mark if a test failed at least one time though, so that we can take a look at them early.

This is only enabled in the CI environment (GitHub Actions), so that in local testing we are still alerted as soon as something fails.
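A minimal sketch of how CI-only retries could be wired up, not necessarily how this PR does it. It assumes a Jest setup file (for example one listed under `setupFilesAfterEnv`) and relies on GitHub Actions setting `CI=true`. The `configureRetries` helper is a hypothetical name introduced here so the logic can be exercised without Jest. `jest.retryTimes` only takes effect with the jest-circus runner.

```javascript
/**
 * Enable retries only when running in CI.
 *
 * @param {{ retryTimes: Function }} jestObject Jest's `jest` global, injected so
 *                                              the helper can be tested without Jest.
 * @param {Object}                   env        Environment variables, e.g. `process.env`.
 * @return {number} The number of retries that was configured.
 */
function configureRetries( jestObject, env ) {
	// GitHub Actions sets CI=true for every workflow run.
	const retries = env.CI === 'true' ? 2 : 0;
	if ( retries > 0 ) {
		// Up to 2 retries, i.e. at most 3 runs counting the initial one.
		jestObject.retryTimes( retries );
	}
	return retries;
}

// Inside a real setup file the `jest` global exists; the guard lets this
// sketch also load in plain Node.
if ( typeof jest !== 'undefined' ) {
	configureRetries( jest, process.env );
}
```

Locally, where `CI` is normally unset, no retries are configured and a failing test is reported on its first failure.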

How has this been tested?

Intentionally update an e2e test to fail intermittently and observe that it runs at most 3 times until it passes.

Types of changes

New feature

Checklist:

My code is tested.
My code follows the WordPress code style.
My code follows the accessibility standards.
I've tested my changes with keyboard and screen readers.
My code has proper inline documentation.
I've included developer documentation if appropriate.
I've updated all React Native files affected by any refactorings/renamings in this PR (please manually search all *.native.js files for terms that need renaming or removal).

github-actions · 2021-05-11T02:42:08Z

Size Change: 0 B

Total Size: 1.31 MB

ℹ️ View Unchanged

Filename	Size	Change
`build/a11y/index.js`	1.12 kB	0 B
`build/annotations/index.js`	2.93 kB	0 B
`build/api-fetch/index.js`	2.42 kB	0 B
`build/autop/index.js`	2.28 kB	0 B
`build/blob/index.js`	673 B	0 B
`build/block-directory/index.js`	6.6 kB	0 B
`build/block-directory/style-rtl.css`	993 B	0 B
`build/block-directory/style.css`	995 B	0 B
`build/block-editor/index.js`	116 kB	0 B
`build/block-editor/style-rtl.css`	13 kB	0 B
`build/block-editor/style.css`	13 kB	0 B
`build/block-library/blocks/archives/editor-rtl.css`	61 B	0 B
`build/block-library/blocks/archives/editor.css`	60 B	0 B
`build/block-library/blocks/audio/editor-rtl.css`	58 B	0 B
`build/block-library/blocks/audio/editor.css`	58 B	0 B
`build/block-library/blocks/audio/style-rtl.css`	112 B	0 B
`build/block-library/blocks/audio/style.css`	112 B	0 B
`build/block-library/blocks/block/editor-rtl.css`	161 B	0 B
`build/block-library/blocks/block/editor.css`	161 B	0 B
`build/block-library/blocks/button/editor-rtl.css`	475 B	0 B
`build/block-library/blocks/button/editor.css`	474 B	0 B
`build/block-library/blocks/button/style-rtl.css`	515 B	0 B
`build/block-library/blocks/button/style.css`	515 B	0 B
`build/block-library/blocks/buttons/editor-rtl.css`	315 B	0 B
`build/block-library/blocks/buttons/editor.css`	315 B	0 B
`build/block-library/blocks/buttons/style-rtl.css`	368 B	0 B
`build/block-library/blocks/buttons/style.css`	368 B	0 B
`build/block-library/blocks/calendar/style-rtl.css`	208 B	0 B
`build/block-library/blocks/calendar/style.css`	208 B	0 B
`build/block-library/blocks/categories/editor-rtl.css`	84 B	0 B
`build/block-library/blocks/categories/editor.css`	83 B	0 B
`build/block-library/blocks/categories/style-rtl.css`	79 B	0 B
`build/block-library/blocks/categories/style.css`	79 B	0 B
`build/block-library/blocks/code/style-rtl.css`	90 B	0 B
`build/block-library/blocks/code/style.css`	90 B	0 B
`build/block-library/blocks/columns/editor-rtl.css`	190 B	0 B
`build/block-library/blocks/columns/editor.css`	190 B	0 B
`build/block-library/blocks/columns/style-rtl.css`	422 B	0 B
`build/block-library/blocks/columns/style.css`	422 B	0 B
`build/block-library/blocks/cover/editor-rtl.css`	643 B	0 B
`build/block-library/blocks/cover/editor.css`	645 B	0 B
`build/block-library/blocks/cover/style-rtl.css`	1.22 kB	0 B
`build/block-library/blocks/cover/style.css`	1.22 kB	0 B
`build/block-library/blocks/embed/editor-rtl.css`	486 B	0 B
`build/block-library/blocks/embed/editor.css`	486 B	0 B
`build/block-library/blocks/embed/style-rtl.css`	401 B	0 B
`build/block-library/blocks/embed/style.css`	400 B	0 B
`build/block-library/blocks/file/editor-rtl.css`	301 B	0 B
`build/block-library/blocks/file/editor.css`	300 B	0 B
`build/block-library/blocks/file/frontend.js`	773 B	0 B
`build/block-library/blocks/file/style-rtl.css`	255 B	0 B
`build/block-library/blocks/file/style.css`	255 B	0 B
`build/block-library/blocks/freeform/editor-rtl.css`	2.45 kB	0 B
`build/block-library/blocks/freeform/editor.css`	2.45 kB	0 B
`build/block-library/blocks/gallery/editor-rtl.css`	704 B	0 B
`build/block-library/blocks/gallery/editor.css`	705 B	0 B
`build/block-library/blocks/gallery/style-rtl.css`	1.06 kB	0 B
`build/block-library/blocks/gallery/style.css`	1.05 kB	0 B
`build/block-library/blocks/group/editor-rtl.css`	160 B	0 B
`build/block-library/blocks/group/editor.css`	160 B	0 B
`build/block-library/blocks/group/style-rtl.css`	57 B	0 B
`build/block-library/blocks/group/style.css`	57 B	0 B
`build/block-library/blocks/heading/editor-rtl.css`	129 B	0 B
`build/block-library/blocks/heading/editor.css`	129 B	0 B
`build/block-library/blocks/heading/style-rtl.css`	76 B	0 B
`build/block-library/blocks/heading/style.css`	76 B	0 B
`build/block-library/blocks/home-link/style-rtl.css`	259 B	0 B
`build/block-library/blocks/home-link/style.css`	259 B	0 B
`build/block-library/blocks/html/editor-rtl.css`	281 B	0 B
`build/block-library/blocks/html/editor.css`	281 B	0 B
`build/block-library/blocks/image/editor-rtl.css`	717 B	0 B
`build/block-library/blocks/image/editor.css`	716 B	0 B
`build/block-library/blocks/image/style-rtl.css`	476 B	0 B
`build/block-library/blocks/image/style.css`	478 B	0 B
`build/block-library/blocks/latest-comments/style-rtl.css`	281 B	0 B
`build/block-library/blocks/latest-comments/style.css`	282 B	0 B
`build/block-library/blocks/latest-posts/editor-rtl.css`	137 B	0 B
`build/block-library/blocks/latest-posts/editor.css`	137 B	0 B
`build/block-library/blocks/latest-posts/style-rtl.css`	523 B	0 B
`build/block-library/blocks/latest-posts/style.css`	522 B	0 B
`build/block-library/blocks/legacy-widget/editor-rtl.css`	557 B	0 B
`build/block-library/blocks/legacy-widget/editor.css`	557 B	0 B
`build/block-library/blocks/list/style-rtl.css`	63 B	0 B
`build/block-library/blocks/list/style.css`	63 B	0 B
`build/block-library/blocks/media-text/editor-rtl.css`	176 B	0 B
`build/block-library/blocks/media-text/editor.css`	176 B	0 B
`build/block-library/blocks/media-text/style-rtl.css`	492 B	0 B
`build/block-library/blocks/media-text/style.css`	489 B	0 B
`build/block-library/blocks/more/editor-rtl.css`	434 B	0 B
`build/block-library/blocks/more/editor.css`	434 B	0 B
`build/block-library/blocks/navigation-link/editor-rtl.css`	617 B	0 B
`build/block-library/blocks/navigation-link/editor.css`	619 B	0 B
`build/block-library/blocks/navigation-link/style-rtl.css`	94 B	0 B
`build/block-library/blocks/navigation-link/style.css`	94 B	0 B
`build/block-library/blocks/navigation/editor-rtl.css`	1.32 kB	0 B
`build/block-library/blocks/navigation/editor.css`	1.31 kB	0 B
`build/block-library/blocks/navigation/style-rtl.css`	1.27 kB	0 B
`build/block-library/blocks/navigation/style.css`	1.27 kB	0 B
`build/block-library/blocks/nextpage/editor-rtl.css`	395 B	0 B
`build/block-library/blocks/nextpage/editor.css`	395 B	0 B
`build/block-library/blocks/page-list/editor-rtl.css`	239 B	0 B
`build/block-library/blocks/page-list/editor.css`	240 B	0 B
`build/block-library/blocks/page-list/style-rtl.css`	167 B	0 B
`build/block-library/blocks/page-list/style.css`	167 B	0 B
`build/block-library/blocks/paragraph/editor-rtl.css`	157 B	0 B
`build/block-library/blocks/paragraph/editor.css`	157 B	0 B
`build/block-library/blocks/paragraph/style-rtl.css`	247 B	0 B
`build/block-library/blocks/paragraph/style.css`	248 B	0 B
`build/block-library/blocks/post-author/editor-rtl.css`	209 B	0 B
`build/block-library/blocks/post-author/editor.css`	209 B	0 B
`build/block-library/blocks/post-author/style-rtl.css`	183 B	0 B
`build/block-library/blocks/post-author/style.css`	184 B	0 B
`build/block-library/blocks/post-comments-form/style-rtl.css`	140 B	0 B
`build/block-library/blocks/post-comments-form/style.css`	140 B	0 B
`build/block-library/blocks/post-comments/style-rtl.css`	362 B	0 B
`build/block-library/blocks/post-comments/style.css`	362 B	0 B
`build/block-library/blocks/post-content/editor-rtl.css`	139 B	0 B
`build/block-library/blocks/post-content/editor.css`	139 B	0 B
`build/block-library/blocks/post-excerpt/editor-rtl.css`	73 B	0 B
`build/block-library/blocks/post-excerpt/editor.css`	73 B	0 B
`build/block-library/blocks/post-excerpt/style-rtl.css`	69 B	0 B
`build/block-library/blocks/post-excerpt/style.css`	69 B	0 B
`build/block-library/blocks/post-featured-image/editor-rtl.css`	338 B	0 B
`build/block-library/blocks/post-featured-image/editor.css`	338 B	0 B
`build/block-library/blocks/post-featured-image/style-rtl.css`	119 B	0 B
`build/block-library/blocks/post-featured-image/style.css`	119 B	0 B
`build/block-library/blocks/post-title/style-rtl.css`	60 B	0 B
`build/block-library/blocks/post-title/style.css`	60 B	0 B
`build/block-library/blocks/preformatted/style-rtl.css`	103 B	0 B
`build/block-library/blocks/preformatted/style.css`	103 B	0 B
`build/block-library/blocks/pullquote/editor-rtl.css`	183 B	0 B
`build/block-library/blocks/pullquote/editor.css`	183 B	0 B
`build/block-library/blocks/pullquote/style-rtl.css`	318 B	0 B
`build/block-library/blocks/pullquote/style.css`	318 B	0 B
`build/block-library/blocks/query-loop/editor-rtl.css`	83 B	0 B
`build/block-library/blocks/query-loop/editor.css`	82 B	0 B
`build/block-library/blocks/query-loop/style-rtl.css`	315 B	0 B
`build/block-library/blocks/query-loop/style.css`	317 B	0 B
`build/block-library/blocks/query-pagination-numbers/editor-rtl.css`	122 B	0 B
`build/block-library/blocks/query-pagination-numbers/editor.css`	121 B	0 B
`build/block-library/blocks/query-pagination/editor-rtl.css`	270 B	0 B
`build/block-library/blocks/query-pagination/editor.css`	262 B	0 B
`build/block-library/blocks/query-pagination/style-rtl.css`	168 B	0 B
`build/block-library/blocks/query-pagination/style.css`	168 B	0 B
`build/block-library/blocks/query-title/editor-rtl.css`	86 B	0 B
`build/block-library/blocks/query-title/editor.css`	86 B	0 B
`build/block-library/blocks/query/editor-rtl.css`	131 B	0 B
`build/block-library/blocks/query/editor.css`	132 B	0 B
`build/block-library/blocks/quote/style-rtl.css`	169 B	0 B
`build/block-library/blocks/quote/style.css`	169 B	0 B
`build/block-library/blocks/rss/editor-rtl.css`	201 B	0 B
`build/block-library/blocks/rss/editor.css`	202 B	0 B
`build/block-library/blocks/rss/style-rtl.css`	290 B	0 B
`build/block-library/blocks/rss/style.css`	290 B	0 B
`build/block-library/blocks/search/editor-rtl.css`	189 B	0 B
`build/block-library/blocks/search/editor.css`	189 B	0 B
`build/block-library/blocks/search/style-rtl.css`	359 B	0 B
`build/block-library/blocks/search/style.css`	362 B	0 B
`build/block-library/blocks/separator/editor-rtl.css`	99 B	0 B
`build/block-library/blocks/separator/editor.css`	99 B	0 B
`build/block-library/blocks/separator/style-rtl.css`	251 B	0 B
`build/block-library/blocks/separator/style.css`	251 B	0 B
`build/block-library/blocks/shortcode/editor-rtl.css`	512 B	0 B
`build/block-library/blocks/shortcode/editor.css`	512 B	0 B
`build/block-library/blocks/site-logo/editor-rtl.css`	440 B	0 B
`build/block-library/blocks/site-logo/editor.css`	441 B	0 B
`build/block-library/blocks/site-logo/style-rtl.css`	154 B	0 B
`build/block-library/blocks/site-logo/style.css`	154 B	0 B
`build/block-library/blocks/social-link/editor-rtl.css`	164 B	0 B
`build/block-library/blocks/social-link/editor.css`	165 B	0 B
`build/block-library/blocks/social-links/editor-rtl.css`	796 B	0 B
`build/block-library/blocks/social-links/editor.css`	795 B	0 B
`build/block-library/blocks/social-links/style-rtl.css`	1.32 kB	0 B
`build/block-library/blocks/social-links/style.css`	1.33 kB	0 B
`build/block-library/blocks/spacer/editor-rtl.css`	308 B	0 B
`build/block-library/blocks/spacer/editor.css`	308 B	0 B
`build/block-library/blocks/spacer/style-rtl.css`	48 B	0 B
`build/block-library/blocks/spacer/style.css`	48 B	0 B
`build/block-library/blocks/table/editor-rtl.css`	478 B	0 B
`build/block-library/blocks/table/editor.css`	478 B	0 B
`build/block-library/blocks/table/style-rtl.css`	485 B	0 B
`build/block-library/blocks/table/style.css`	485 B	0 B
`build/block-library/blocks/tag-cloud/editor-rtl.css`	118 B	0 B
`build/block-library/blocks/tag-cloud/editor.css`	118 B	0 B
`build/block-library/blocks/tag-cloud/style-rtl.css`	94 B	0 B
`build/block-library/blocks/tag-cloud/style.css`	94 B	0 B
`build/block-library/blocks/template-part/editor-rtl.css`	551 B	0 B
`build/block-library/blocks/template-part/editor.css`	550 B	0 B
`build/block-library/blocks/term-description/editor-rtl.css`	90 B	0 B
`build/block-library/blocks/term-description/editor.css`	90 B	0 B
`build/block-library/blocks/text-columns/editor-rtl.css`	95 B	0 B
`build/block-library/blocks/text-columns/editor.css`	95 B	0 B
`build/block-library/blocks/text-columns/style-rtl.css`	166 B	0 B
`build/block-library/blocks/text-columns/style.css`	166 B	0 B
`build/block-library/blocks/verse/style-rtl.css`	87 B	0 B
`build/block-library/blocks/verse/style.css`	87 B	0 B
`build/block-library/blocks/video/editor-rtl.css`	569 B	0 B
`build/block-library/blocks/video/editor.css`	570 B	0 B
`build/block-library/blocks/video/style-rtl.css`	169 B	0 B
`build/block-library/blocks/video/style.css`	169 B	0 B
`build/block-library/common-rtl.css`	1.26 kB	0 B
`build/block-library/common.css`	1.26 kB	0 B
`build/block-library/editor-rtl.css`	9.67 kB	0 B
`build/block-library/editor.css`	9.66 kB	0 B
`build/block-library/index.js`	143 kB	0 B
`build/block-library/reset-rtl.css`	506 B	0 B
`build/block-library/reset.css`	507 B	0 B
`build/block-library/style-rtl.css`	9.69 kB	0 B
`build/block-library/style.css`	9.7 kB	0 B
`build/block-library/theme-rtl.css`	692 B	0 B
`build/block-library/theme.css`	693 B	0 B
`build/block-serialization-default-parser/index.js`	1.3 kB	0 B
`build/block-serialization-spec-parser/index.js`	3.06 kB	0 B
`build/blocks/index.js`	47.1 kB	0 B
`build/components/index.js`	188 kB	0 B
`build/components/style-rtl.css`	16.2 kB	0 B
`build/components/style.css`	16.2 kB	0 B
`build/compose/index.js`	9.93 kB	0 B
`build/core-data/index.js`	12.1 kB	0 B
`build/customize-widgets/index.js`	5.99 kB	0 B
`build/customize-widgets/style-rtl.css`	698 B	0 B
`build/customize-widgets/style.css`	699 B	0 B
`build/data-controls/index.js`	829 B	0 B
`build/data/index.js`	7.22 kB	0 B
`build/date/index.js`	31.8 kB	0 B
`build/deprecated/index.js`	737 B	0 B
`build/dom-ready/index.js`	576 B	0 B
`build/dom/index.js`	4.62 kB	0 B
`build/edit-navigation/index.js`	13.5 kB	0 B
`build/edit-navigation/style-rtl.css`	2.83 kB	0 B
`build/edit-navigation/style.css`	2.83 kB	0 B
`build/edit-post/classic-rtl.css`	454 B	0 B
`build/edit-post/classic.css`	454 B	0 B
`build/edit-post/index.js`	333 kB	0 B
`build/edit-post/style-rtl.css`	6.79 kB	0 B
`build/edit-post/style.css`	6.78 kB	0 B
`build/edit-site/index.js`	26.1 kB	0 B
`build/edit-site/style-rtl.css`	4.79 kB	0 B
`build/edit-site/style.css`	4.78 kB	0 B
`build/edit-widgets/index.js`	12.6 kB	0 B
`build/edit-widgets/style-rtl.css`	3.02 kB	0 B
`build/edit-widgets/style.css`	3.03 kB	0 B
`build/editor/index.js`	60.5 kB	0 B
`build/editor/style-rtl.css`	3.95 kB	0 B
`build/editor/style.css`	3.95 kB	0 B
`build/element/index.js`	3.44 kB	0 B
`build/escape-html/index.js`	739 B	0 B
`build/format-library/index.js`	5.67 kB	0 B
`build/format-library/style-rtl.css`	637 B	0 B
`build/format-library/style.css`	639 B	0 B
`build/hooks/index.js`	1.76 kB	0 B
`build/html-entities/index.js`	628 B	0 B
`build/i18n/index.js`	3.73 kB	0 B
`build/is-shallow-equal/index.js`	710 B	0 B
`build/keyboard-shortcuts/index.js`	1.65 kB	0 B
`build/keycodes/index.js`	1.43 kB	0 B
`build/list-reusable-blocks/index.js`	2.06 kB	0 B
`build/list-reusable-blocks/style-rtl.css`	629 B	0 B
`build/list-reusable-blocks/style.css`	628 B	0 B
`build/media-utils/index.js`	3.08 kB	0 B
`build/notices/index.js`	1.07 kB	0 B
`build/nux/index.js`	2.31 kB	0 B
`build/nux/style-rtl.css`	718 B	0 B
`build/nux/style.css`	716 B	0 B
`build/plugins/index.js`	2 kB	0 B
`build/primitives/index.js`	1.03 kB	0 B
`build/priority-queue/index.js`	791 B	0 B
`build/react-i18n/index.js`	924 B	0 B
`build/redux-routine/index.js`	2.82 kB	0 B
`build/reusable-blocks/index.js`	2.56 kB	0 B
`build/reusable-blocks/style-rtl.css`	225 B	0 B
`build/reusable-blocks/style.css`	225 B	0 B
`build/rich-text/index.js`	11.8 kB	0 B
`build/server-side-render/index.js`	1.64 kB	0 B
`build/shortcode/index.js`	1.68 kB	0 B
`build/token-list/index.js`	848 B	0 B
`build/url/index.js`	1.95 kB	0 B
`build/viewport/index.js`	1.28 kB	0 B
`build/warning/index.js`	1.13 kB	0 B
`build/widgets/index.js`	1.68 kB	0 B
`build/wordcount/index.js`	1.24 kB	0 B

compressed-size-action

ntsekouras · 2021-05-11T08:34:29Z

> This could be a controversial one

😄 - I think this will make our life a bit easier but only in the short term as we'll be increasing technical debt.

ellatrix · 2021-05-11T10:51:56Z

What's the problem with a manual retry? It's good to have a sense of what's breaking sometimes and try to fix it?

youknowriad · 2021-05-11T10:56:46Z

Not strongly against, but I believe we need a more scalable way to track unstable tests first before doing this. Right now we rely too much on pinging folks every time something happens, which may not scale forever.

@gwwar had some good ideas on this subject.

kevin940726 · 2021-05-11T11:32:38Z

Manual retry has to re-run all the e2e tests, which could be very slow, as running them once is already slow enough.

I agree we should still try to alert if something failed so that we can try to fix it properly. But oftentimes such cases are extremely difficult to resolve, and require deep domain knowledge of that specific test.

I'm open to discussions/suggestions on how we can still alert on failing tests with retrying enabled (hence it's only a draft PR for now). I'm thinking maybe we can post a comment on the commit which has an intermittently failing test? We can go a step further and automatically tag the last contributor who worked on those tests to take a look.

gwwar · 2021-05-11T21:00:37Z

I do think we should get retries going eventually (to automatically test/mark flakiness), but I suspect we'll see some benefits from figuring out how to automate a way to see what tests are failing, and testing out some ownership options for fixing them. E.g., an easily digestible dashboard plus some form of notifications (Slack/GitHub pings).

There's some pretty low hanging fruit already by sifting through recent e2e failures on trunk. Any of these are flaky since we can assume that most contributors should be verifying that checks are green on their branch before merge:

https://github.com/WordPress/gutenberg/actions/workflows/end2end-test.yml?query=is%3Afailure+branch%3Atrunk

There was a related blog post by GitHub which was a decent read https://github.blog/2020-12-16-reducing-flaky-builds-by-18x/.

ellatrix · 2021-05-15T15:47:27Z

When an e2e test fails intermittently, it usually means the test is bad and we should fix it. There are lots of cases where we're not appropriately waiting for a selector. Often checking the screenshot artefact gives some good clues about what goes wrong and someone just needs to take the time to fix it.

ellatrix · 2021-05-15T16:24:10Z

Perfect example of a test failing when it runs too fast: Fix intermittent embeds failure.
Perfect example of a test failing when it runs too slow: Fix flaky change detection tests causing intermittent failures.

kevin940726 · 2021-05-17T02:29:12Z

@ellatrix I agree with all of these, but I don't think they're mutually exclusive. We should fix the intermittently failing tests, but we can also add retrying. The current problem is that contributors often get confused when there are failed tests in their PRs, having no idea if they caused those tests to fail. This makes them lose confidence in the checks in the PR, and maybe even ignore the failing tests.

I suggest adding some retrying to the tests, so that we can get those tests to pass in PRs, but we should also add some kind of alert to notify the right people if any of those tests fail intermittently. The latter part is still TBD, hence the reason this is still a draft PR.

ellatrix · 2021-05-17T07:27:12Z

Sure, it seems fine when we have a log somewhere about which tests have failed how many times with artefacts, so the data is not lost. It’s sometimes also important to know when the test started failing. If we keep all this information, I’m ok with it.

draganescu · 2021-05-28T13:28:38Z

I think this idea is a good complementary help, which does not replace the need to fix flaky tests at all. It may obscure this need if we don't surface them anymore.

> alert to notify the right people

I think it is better to have a central place of seeing these problems. In an ideal world once we detect a flaky test, which is flaky (which means it restarted and passed) more than X times we auto-create an issue and label it accordingly. I have no clue if this can be done, but it does not sound impossible.

Notifying people is a system that only creates more notifications.

All in all, the idea to auto-restart is solid and will remove a blocker for all contributors, increase the confidence in the failures (meaning that the computer already "tried again", so it's probably you), and be a solution to the problem at hand, which is flakiness costing time and creating frustration.

kevin940726 · 2021-05-28T14:59:22Z

> I have no clue if this can be done, but it does not sound impossible.

It should be very possible, and probably not very difficult. We can do that via GitHub Actions and automatically create an issue for each flaky test. Whenever a flaky failure is detected, we can add a new comment recording when it happened, on which commit, and the error message of the failed test.
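As a rough sketch of the data such a workflow might assemble, one issue per flaky test and one comment per occurrence: the function name, the field names, and the `[Flaky Test]` title prefix below are all hypothetical, and the actual GitHub API calls are deliberately left out.

```javascript
/**
 * Build the tracking-issue title and the comment body for one flaky occurrence.
 * Every name here is an illustrative assumption, not an existing API.
 *
 * @param {Object} occurrence
 * @param {string} occurrence.testTitle    Full title of the flaky test.
 * @param {string} occurrence.commitSha    Commit the test ran against.
 * @param {string} occurrence.failedAt     ISO timestamp of the failed attempt.
 * @param {string} occurrence.errorMessage Error message of the failed attempt.
 * @return {{ issueTitle: string, commentBody: string }} Text for the issue and comment.
 */
function buildFlakyReport( { testTitle, commitSha, failedAt, errorMessage } ) {
	// One issue per test: the title doubles as the lookup key, so later
	// occurrences become comments on the same issue instead of new issues.
	const issueTitle = `[Flaky Test] ${ testTitle }`;
	const commentBody = [
		`Failed at: ${ failedAt }`,
		`Commit: ${ commitSha }`,
		'Error message:',
		errorMessage,
	].join( '\n' );
	return { issueTitle, commentBody };
}
```

A workflow step could search open issues for `issueTitle`, create the issue if none exists, and otherwise post `commentBody` as a new comment.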

> Notifying people is a system that only creates more notifications.

The idea is to make sure the flaky test is being handled or assigned to at least one person, much like an auto-triaging system. In the GitHub post mentioned above, they recommended tagging only the person who wrote the flaky test, which doesn't seem like a bad idea IMO.

A nice-to-have bonus would be a visual dashboard of all the flaky tests over time, so that we can monitor whether confidence in our tests is improving.

mcsf · 2021-05-28T15:14:52Z

> I think this idea is a good complementary help, which does not replace the need to fix flaky tests at all. It may obscure this need if we don't surface them anymore.
>
> alert to notify the right people
>
> I think it is better to have a central place of seeing these problems. In an ideal world once we detect a flaky test, which is flaky (which means it restarted and passed) more than X times we auto-create an issue and label it accordingly. I have no clue if this can be done, but it does not sound impossible. […]

My worry here is that pushing flaky tests away from the spotlight of a PR's checks — whether by auto-posting a comment in some past commit, by aggregating a list somewhere else, or what have you — is going to: decrease awareness of test flakiness; decrease the perceived severity of it; and foster a bystander effect by which most contributors, novice and seasoned alike, will disregard the issue entirely, "abstracting away" the problem and leaving it up to those most involved or diligent in the core team.

I would prefer that no action be taken than to merge this PR in its current form. That said, what about the following hybrid approach? For every test that fails, we log that failure before letting Jest retry it (twice at most). If the test succeeds after retrying, it will show up as passing. However, at the end of the test suite we add a specific test whose purpose is to fail if any flakiness was logged.

That way, all parties involved in that PR need to confront the failure. But now they are in a better position to diagnose it. If it is a flaky test, they can make a conscious decision to force-merge a PR which has otherwise passing tests. As a consequence, this might put a brake on the proliferation of new flaky tests.

—	Currently	With just `jest.retryTimes`	With the hybrid approach
Effect on overall CI	Tests fail... or not.	Tests blindly pass.	Only the flakiness test fails.
Effect on debugging	Hard to spot if test is flaky or legitimate.	Blissful ignorance.	Flakiness test reports which test(s) is/are flaky.
Effect on deployment	Frustrating time waste. Admin may force merge.	Problem propagates.	Need to confront flakiness. Admin needed to force merge.
Effect on maintainers	May be involved for merging.	Left to solve problem on their own.	Involved for merging, but share burden with fellow contributors.
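The hybrid approach could be sketched roughly as follows. This is only an illustration of the bookkeeping, under the assumption that each test attempt can be reported to a shared log. The `FlakinessLog` class and its method names are hypothetical, and the hook-up to Jest's retry mechanism is not shown.

```javascript
/**
 * Records the outcome of every test attempt so that a final check can fail
 * the run if any test only passed after being retried.
 */
class FlakinessLog {
	constructor() {
		// Maps a test name to the ordered list of its attempt outcomes.
		this.attempts = new Map();
	}

	/** Log one attempt of a test (initial run or retry). */
	record( testName, passed ) {
		const results = this.attempts.get( testName ) ?? [];
		results.push( passed );
		this.attempts.set( testName, results );
	}

	/** Tests that failed at least once but passed on their last attempt. */
	flakyTests() {
		return [ ...this.attempts ]
			.filter(
				( [ , results ] ) =>
					results.includes( false ) &&
					results[ results.length - 1 ] === true
			)
			.map( ( [ testName ] ) => testName );
	}

	/** The dedicated "flakiness test": throws if any flakiness was logged. */
	assertNoFlakiness() {
		const flaky = this.flakyTests();
		if ( flaky.length > 0 ) {
			throw new Error( `Flaky tests detected: ${ flaky.join( ', ' ) }` );
		}
	}
}
```

A test that fails on all three attempts is not reported as flaky here, since it already fails the run on its own. Only tests rescued by a retry trip the final check.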

Thoughts?

draganescu · 2021-06-15T10:30:23Z

I do agree with @mcsf's suggestion that, both by blindly retrying and by creating specific "flaky test" issues, we indirectly create a new problem for the core maintainers. Fixing tests is not "fun", and it "only" solves a generic project-wide problem. So, I can foresee these issues aging there.

On the other hand, it may be that many of these flaky test issues are also good first issues. Also, for example, efforts by folks like @hellofromtonya, to create a more stable and consistent testing team and testing focus, may result in these issues being picked up and solved.

I like @mcsf's proposal because it gives the PR author the opportunity to have a clear description of what they have to fix. Sometimes this fixing will be skipped by force merging, but this action needs a justification. I worry that the PR author will, many times, be very removed in expertise from the flaky test (imagine fixing a typo in a doc and being hit with a flaky e2e from widgets). I am also afraid that we underestimate the number of requests for "force" merges, if that is what we aim for as a best practice.

I don't think either of the solutions will put a brake on the proliferation of flaky tests. These appear because the system that we use to develop tests allows for their flakiness to be invisible to the developer. They "proliferate" because perhaps there is a tension between the complexity we're testing and the simplicity of the tooling.

In conclusion, either of the "don't let it slide" directions (the automated issue creation and/or the flaky tests test) works equally towards nudging people to improve the health of the codebase, but the problem this PR tries to address is that we are wasting probably considerable time manually, blindly, annoyingly, clicking a button: the restart all jobs button. For this problem, automating retries is a good idea and it is better than nothing.

gziolo · 2021-06-15T11:06:11Z

In my opinion, we should start with identifying the tests that are failing, the ratio of the failure vs passes, classify the reasons for the failures. Once we have the full picture of the current state of the e2e tests, we can discuss further steps.
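For the first step, the failure-to-pass ratio per test could be computed from a list of recorded results along these lines. This is a sketch only: the `{ testName, passed }` record shape and the `failureRatios` name are assumptions, and collecting the records from CI is out of scope here.

```javascript
/**
 * Summarize how often each test failed across a set of recorded runs.
 *
 * @param {Array<{ testName: string, passed: boolean }>} runs Hypothetical run records.
 * @return {Array<{ testName: string, failures: number, total: number, ratio: number }>}
 *         One entry per test, sorted with the highest failure ratio first.
 */
function failureRatios( runs ) {
	const totals = new Map();
	for ( const { testName, passed } of runs ) {
		const entry = totals.get( testName ) ?? { failures: 0, total: 0 };
		entry.total += 1;
		if ( ! passed ) {
			entry.failures += 1;
		}
		totals.set( testName, entry );
	}
	return [ ...totals ]
		.map( ( [ testName, { failures, total } ] ) => ( {
			testName,
			failures,
			total,
			ratio: failures / total,
		} ) )
		.sort( ( a, b ) => b.ratio - a.ratio );
}
```

Classifying the reasons for the failures would still need a human to read the error messages and artefacts.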

Trying to pass the same tests 3 times improves the optics for contributors, because they will see all checks green more often, but in practice it won't increase the level of confidence that the changes added in PRs won't cause regressions.

talldan · 2021-07-14T06:09:01Z

> In my opinion, we should start with identifying the tests that are failing, the ratio of the failure vs passes, classify the reasons for the failures. Once we have the full picture of the current state of the e2e tests, we can discuss further steps.

This sounds like a good plan 👍

Though I think it should only be based on the results of the tests that run on commits to trunk.

PR test outcomes are often skewed by the code being a work in progress.

kevin940726 · 2021-09-01T09:36:37Z

For anyone subscribed to this issue: I opened a follow-up draft PR as a proposal in #34432. Feel free to leave your feedback there!

draganescu · 2021-09-13T13:12:55Z

Now that #34432 is merged this becomes more feasible. Right?

gziolo · 2021-09-13T13:21:16Z

> Now that #34432 is merged this becomes more feasible. Right?

Isn't it an alternative approach and the PR can be closed now?

kevin940726 · 2021-09-13T13:47:39Z

Yep, this can be closed now. This PR is included in #34432.

draganescu · 2021-09-13T15:13:19Z

🤦🏻 <- that's all I can say.

Retry flaky e2e tests at most 2 times

8deb0b9

kevin940726 added the [Type] Automated Testing label May 11, 2021

ellatrix requested a review from youknowriad May 11, 2021 10:52

gziolo requested a review from a team May 17, 2021 05:44

swissspidy mentioned this pull request May 17, 2021

E2E Tests: retry flaky tests GoogleForCreators/web-stories-wp#7549

Closed

spacedmonkey mentioned this pull request May 27, 2021

E2E Tests: retry flaky tests GoogleForCreators/web-stories-wp#7758

Merged

9 tasks

kevin940726 mentioned this pull request Aug 10, 2021

Ideas for improving E2E test developer experience #33532

Closed

10 tasks

talldan mentioned this pull request Aug 11, 2021

[Experiment] Rerun failing jobs #33979

Closed

gziolo mentioned this pull request Aug 11, 2021

Look at automatically retrying E2E tests #33980

Closed

kevin940726 mentioned this pull request Sep 1, 2021

Try reporting flaky tests to issues #34432

Merged

7 tasks

kevin940726 closed this Sep 13, 2021

kevin940726 deleted the update/retry-flaky-e2e-tests branch September 13, 2021 13:49
