Retry flaky e2e tests at most 2 times #31682

Closed
wants to merge 1 commit into from

Conversation

kevin940726
Member

@kevin940726 kevin940726 commented May 11, 2021

Description

Related to #33980.

It retries failed tests at most 2 times (3 times counting the initial run) for e2e tests. This is only possible after we migrated to jest-circus.

This could be a controversial one, but IMO it does unblock us from some flaky tests immediately, and makes maintaining the tests a bit easier.

If the test failed 3 times in a row, then we should definitely fix it. We should probably still add a mark if a test failed at least one time though, so that we can take a look at them early.

This is only enabled in CI environment (GitHub actions), so that in local testing we can still be alerted if something failed as soon as possible.
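
As a minimal sketch of the CI-only behaviour described above (not the actual diff of this PR), a Jest setup file could guard the `jest.retryTimes` call behind the `CI` environment variable that GitHub Actions sets to `true`. The helper name `configureRetries` and its parameters are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: enable retries only when running in CI.
// `jestObject` is expected to be Jest's global `jest` object and
// `env` is expected to be `process.env`.
function configureRetries( jestObject, env, maxRetries = 2 ) {
	// GitHub Actions sets CI=true for every workflow run.
	const isCI = env.CI === 'true';
	if ( isCI ) {
		// jest.retryTimes requires the jest-circus runner.
		// 2 retries means at most 3 runs including the initial one.
		jestObject.retryTimes( maxRetries );
	}
	return isCI;
}

// In a Jest setup file this could be called as:
//   configureRetries( jest, process.env );
```

Passing the `jest` object in as an argument keeps the helper testable outside of a Jest run; locally, where `CI` is unset, nothing is retried and failures surface immediately.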

How has this been tested?

Intentionally update an e2e test to fail intermittently and observe that it runs at most 3 times until it passes.

Types of changes

New feature

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • I've tested my changes with keyboard and screen readers.
  • My code has proper inline documentation.
  • I've included developer documentation if appropriate.
  • I've updated all React Native files affected by any refactorings/renamings in this PR (please manually search all *.native.js files for terms that need renaming or removal).

@kevin940726 kevin940726 added the [Type] Automated Testing Testing infrastructure changes impacting the execution of end-to-end (E2E) and/or unit tests. label May 11, 2021
@github-actions

Size Change: 0 B

Total Size: 1.31 MB

ℹ️ View Unchanged
Filename Size Change
build/a11y/index.js 1.12 kB 0 B
build/annotations/index.js 2.93 kB 0 B
build/api-fetch/index.js 2.42 kB 0 B
build/autop/index.js 2.28 kB 0 B
build/blob/index.js 673 B 0 B
build/block-directory/index.js 6.6 kB 0 B
build/block-directory/style-rtl.css 993 B 0 B
build/block-directory/style.css 995 B 0 B
build/block-editor/index.js 116 kB 0 B
build/block-editor/style-rtl.css 13 kB 0 B
build/block-editor/style.css 13 kB 0 B
build/block-library/blocks/archives/editor-rtl.css 61 B 0 B
build/block-library/blocks/archives/editor.css 60 B 0 B
build/block-library/blocks/audio/editor-rtl.css 58 B 0 B
build/block-library/blocks/audio/editor.css 58 B 0 B
build/block-library/blocks/audio/style-rtl.css 112 B 0 B
build/block-library/blocks/audio/style.css 112 B 0 B
build/block-library/blocks/block/editor-rtl.css 161 B 0 B
build/block-library/blocks/block/editor.css 161 B 0 B
build/block-library/blocks/button/editor-rtl.css 475 B 0 B
build/block-library/blocks/button/editor.css 474 B 0 B
build/block-library/blocks/button/style-rtl.css 515 B 0 B
build/block-library/blocks/button/style.css 515 B 0 B
build/block-library/blocks/buttons/editor-rtl.css 315 B 0 B
build/block-library/blocks/buttons/editor.css 315 B 0 B
build/block-library/blocks/buttons/style-rtl.css 368 B 0 B
build/block-library/blocks/buttons/style.css 368 B 0 B
build/block-library/blocks/calendar/style-rtl.css 208 B 0 B
build/block-library/blocks/calendar/style.css 208 B 0 B
build/block-library/blocks/categories/editor-rtl.css 84 B 0 B
build/block-library/blocks/categories/editor.css 83 B 0 B
build/block-library/blocks/categories/style-rtl.css 79 B 0 B
build/block-library/blocks/categories/style.css 79 B 0 B
build/block-library/blocks/code/style-rtl.css 90 B 0 B
build/block-library/blocks/code/style.css 90 B 0 B
build/block-library/blocks/columns/editor-rtl.css 190 B 0 B
build/block-library/blocks/columns/editor.css 190 B 0 B
build/block-library/blocks/columns/style-rtl.css 422 B 0 B
build/block-library/blocks/columns/style.css 422 B 0 B
build/block-library/blocks/cover/editor-rtl.css 643 B 0 B
build/block-library/blocks/cover/editor.css 645 B 0 B
build/block-library/blocks/cover/style-rtl.css 1.22 kB 0 B
build/block-library/blocks/cover/style.css 1.22 kB 0 B
build/block-library/blocks/embed/editor-rtl.css 486 B 0 B
build/block-library/blocks/embed/editor.css 486 B 0 B
build/block-library/blocks/embed/style-rtl.css 401 B 0 B
build/block-library/blocks/embed/style.css 400 B 0 B
build/block-library/blocks/file/editor-rtl.css 301 B 0 B
build/block-library/blocks/file/editor.css 300 B 0 B
build/block-library/blocks/file/frontend.js 773 B 0 B
build/block-library/blocks/file/style-rtl.css 255 B 0 B
build/block-library/blocks/file/style.css 255 B 0 B
build/block-library/blocks/freeform/editor-rtl.css 2.45 kB 0 B
build/block-library/blocks/freeform/editor.css 2.45 kB 0 B
build/block-library/blocks/gallery/editor-rtl.css 704 B 0 B
build/block-library/blocks/gallery/editor.css 705 B 0 B
build/block-library/blocks/gallery/style-rtl.css 1.06 kB 0 B
build/block-library/blocks/gallery/style.css 1.05 kB 0 B
build/block-library/blocks/group/editor-rtl.css 160 B 0 B
build/block-library/blocks/group/editor.css 160 B 0 B
build/block-library/blocks/group/style-rtl.css 57 B 0 B
build/block-library/blocks/group/style.css 57 B 0 B
build/block-library/blocks/heading/editor-rtl.css 129 B 0 B
build/block-library/blocks/heading/editor.css 129 B 0 B
build/block-library/blocks/heading/style-rtl.css 76 B 0 B
build/block-library/blocks/heading/style.css 76 B 0 B
build/block-library/blocks/home-link/style-rtl.css 259 B 0 B
build/block-library/blocks/home-link/style.css 259 B 0 B
build/block-library/blocks/html/editor-rtl.css 281 B 0 B
build/block-library/blocks/html/editor.css 281 B 0 B
build/block-library/blocks/image/editor-rtl.css 717 B 0 B
build/block-library/blocks/image/editor.css 716 B 0 B
build/block-library/blocks/image/style-rtl.css 476 B 0 B
build/block-library/blocks/image/style.css 478 B 0 B
build/block-library/blocks/latest-comments/style-rtl.css 281 B 0 B
build/block-library/blocks/latest-comments/style.css 282 B 0 B
build/block-library/blocks/latest-posts/editor-rtl.css 137 B 0 B
build/block-library/blocks/latest-posts/editor.css 137 B 0 B
build/block-library/blocks/latest-posts/style-rtl.css 523 B 0 B
build/block-library/blocks/latest-posts/style.css 522 B 0 B
build/block-library/blocks/legacy-widget/editor-rtl.css 557 B 0 B
build/block-library/blocks/legacy-widget/editor.css 557 B 0 B
build/block-library/blocks/list/style-rtl.css 63 B 0 B
build/block-library/blocks/list/style.css 63 B 0 B
build/block-library/blocks/media-text/editor-rtl.css 176 B 0 B
build/block-library/blocks/media-text/editor.css 176 B 0 B
build/block-library/blocks/media-text/style-rtl.css 492 B 0 B
build/block-library/blocks/media-text/style.css 489 B 0 B
build/block-library/blocks/more/editor-rtl.css 434 B 0 B
build/block-library/blocks/more/editor.css 434 B 0 B
build/block-library/blocks/navigation-link/editor-rtl.css 617 B 0 B
build/block-library/blocks/navigation-link/editor.css 619 B 0 B
build/block-library/blocks/navigation-link/style-rtl.css 94 B 0 B
build/block-library/blocks/navigation-link/style.css 94 B 0 B
build/block-library/blocks/navigation/editor-rtl.css 1.32 kB 0 B
build/block-library/blocks/navigation/editor.css 1.31 kB 0 B
build/block-library/blocks/navigation/style-rtl.css 1.27 kB 0 B
build/block-library/blocks/navigation/style.css 1.27 kB 0 B
build/block-library/blocks/nextpage/editor-rtl.css 395 B 0 B
build/block-library/blocks/nextpage/editor.css 395 B 0 B
build/block-library/blocks/page-list/editor-rtl.css 239 B 0 B
build/block-library/blocks/page-list/editor.css 240 B 0 B
build/block-library/blocks/page-list/style-rtl.css 167 B 0 B
build/block-library/blocks/page-list/style.css 167 B 0 B
build/block-library/blocks/paragraph/editor-rtl.css 157 B 0 B
build/block-library/blocks/paragraph/editor.css 157 B 0 B
build/block-library/blocks/paragraph/style-rtl.css 247 B 0 B
build/block-library/blocks/paragraph/style.css 248 B 0 B
build/block-library/blocks/post-author/editor-rtl.css 209 B 0 B
build/block-library/blocks/post-author/editor.css 209 B 0 B
build/block-library/blocks/post-author/style-rtl.css 183 B 0 B
build/block-library/blocks/post-author/style.css 184 B 0 B
build/block-library/blocks/post-comments-form/style-rtl.css 140 B 0 B
build/block-library/blocks/post-comments-form/style.css 140 B 0 B
build/block-library/blocks/post-comments/style-rtl.css 362 B 0 B
build/block-library/blocks/post-comments/style.css 362 B 0 B
build/block-library/blocks/post-content/editor-rtl.css 139 B 0 B
build/block-library/blocks/post-content/editor.css 139 B 0 B
build/block-library/blocks/post-excerpt/editor-rtl.css 73 B 0 B
build/block-library/blocks/post-excerpt/editor.css 73 B 0 B
build/block-library/blocks/post-excerpt/style-rtl.css 69 B 0 B
build/block-library/blocks/post-excerpt/style.css 69 B 0 B
build/block-library/blocks/post-featured-image/editor-rtl.css 338 B 0 B
build/block-library/blocks/post-featured-image/editor.css 338 B 0 B
build/block-library/blocks/post-featured-image/style-rtl.css 119 B 0 B
build/block-library/blocks/post-featured-image/style.css 119 B 0 B
build/block-library/blocks/post-title/style-rtl.css 60 B 0 B
build/block-library/blocks/post-title/style.css 60 B 0 B
build/block-library/blocks/preformatted/style-rtl.css 103 B 0 B
build/block-library/blocks/preformatted/style.css 103 B 0 B
build/block-library/blocks/pullquote/editor-rtl.css 183 B 0 B
build/block-library/blocks/pullquote/editor.css 183 B 0 B
build/block-library/blocks/pullquote/style-rtl.css 318 B 0 B
build/block-library/blocks/pullquote/style.css 318 B 0 B
build/block-library/blocks/query-loop/editor-rtl.css 83 B 0 B
build/block-library/blocks/query-loop/editor.css 82 B 0 B
build/block-library/blocks/query-loop/style-rtl.css 315 B 0 B
build/block-library/blocks/query-loop/style.css 317 B 0 B
build/block-library/blocks/query-pagination-numbers/editor-rtl.css 122 B 0 B
build/block-library/blocks/query-pagination-numbers/editor.css 121 B 0 B
build/block-library/blocks/query-pagination/editor-rtl.css 270 B 0 B
build/block-library/blocks/query-pagination/editor.css 262 B 0 B
build/block-library/blocks/query-pagination/style-rtl.css 168 B 0 B
build/block-library/blocks/query-pagination/style.css 168 B 0 B
build/block-library/blocks/query-title/editor-rtl.css 86 B 0 B
build/block-library/blocks/query-title/editor.css 86 B 0 B
build/block-library/blocks/query/editor-rtl.css 131 B 0 B
build/block-library/blocks/query/editor.css 132 B 0 B
build/block-library/blocks/quote/style-rtl.css 169 B 0 B
build/block-library/blocks/quote/style.css 169 B 0 B
build/block-library/blocks/rss/editor-rtl.css 201 B 0 B
build/block-library/blocks/rss/editor.css 202 B 0 B
build/block-library/blocks/rss/style-rtl.css 290 B 0 B
build/block-library/blocks/rss/style.css 290 B 0 B
build/block-library/blocks/search/editor-rtl.css 189 B 0 B
build/block-library/blocks/search/editor.css 189 B 0 B
build/block-library/blocks/search/style-rtl.css 359 B 0 B
build/block-library/blocks/search/style.css 362 B 0 B
build/block-library/blocks/separator/editor-rtl.css 99 B 0 B
build/block-library/blocks/separator/editor.css 99 B 0 B
build/block-library/blocks/separator/style-rtl.css 251 B 0 B
build/block-library/blocks/separator/style.css 251 B 0 B
build/block-library/blocks/shortcode/editor-rtl.css 512 B 0 B
build/block-library/blocks/shortcode/editor.css 512 B 0 B
build/block-library/blocks/site-logo/editor-rtl.css 440 B 0 B
build/block-library/blocks/site-logo/editor.css 441 B 0 B
build/block-library/blocks/site-logo/style-rtl.css 154 B 0 B
build/block-library/blocks/site-logo/style.css 154 B 0 B
build/block-library/blocks/social-link/editor-rtl.css 164 B 0 B
build/block-library/blocks/social-link/editor.css 165 B 0 B
build/block-library/blocks/social-links/editor-rtl.css 796 B 0 B
build/block-library/blocks/social-links/editor.css 795 B 0 B
build/block-library/blocks/social-links/style-rtl.css 1.32 kB 0 B
build/block-library/blocks/social-links/style.css 1.33 kB 0 B
build/block-library/blocks/spacer/editor-rtl.css 308 B 0 B
build/block-library/blocks/spacer/editor.css 308 B 0 B
build/block-library/blocks/spacer/style-rtl.css 48 B 0 B
build/block-library/blocks/spacer/style.css 48 B 0 B
build/block-library/blocks/table/editor-rtl.css 478 B 0 B
build/block-library/blocks/table/editor.css 478 B 0 B
build/block-library/blocks/table/style-rtl.css 485 B 0 B
build/block-library/blocks/table/style.css 485 B 0 B
build/block-library/blocks/tag-cloud/editor-rtl.css 118 B 0 B
build/block-library/blocks/tag-cloud/editor.css 118 B 0 B
build/block-library/blocks/tag-cloud/style-rtl.css 94 B 0 B
build/block-library/blocks/tag-cloud/style.css 94 B 0 B
build/block-library/blocks/template-part/editor-rtl.css 551 B 0 B
build/block-library/blocks/template-part/editor.css 550 B 0 B
build/block-library/blocks/term-description/editor-rtl.css 90 B 0 B
build/block-library/blocks/term-description/editor.css 90 B 0 B
build/block-library/blocks/text-columns/editor-rtl.css 95 B 0 B
build/block-library/blocks/text-columns/editor.css 95 B 0 B
build/block-library/blocks/text-columns/style-rtl.css 166 B 0 B
build/block-library/blocks/text-columns/style.css 166 B 0 B
build/block-library/blocks/verse/style-rtl.css 87 B 0 B
build/block-library/blocks/verse/style.css 87 B 0 B
build/block-library/blocks/video/editor-rtl.css 569 B 0 B
build/block-library/blocks/video/editor.css 570 B 0 B
build/block-library/blocks/video/style-rtl.css 169 B 0 B
build/block-library/blocks/video/style.css 169 B 0 B
build/block-library/common-rtl.css 1.26 kB 0 B
build/block-library/common.css 1.26 kB 0 B
build/block-library/editor-rtl.css 9.67 kB 0 B
build/block-library/editor.css 9.66 kB 0 B
build/block-library/index.js 143 kB 0 B
build/block-library/reset-rtl.css 506 B 0 B
build/block-library/reset.css 507 B 0 B
build/block-library/style-rtl.css 9.69 kB 0 B
build/block-library/style.css 9.7 kB 0 B
build/block-library/theme-rtl.css 692 B 0 B
build/block-library/theme.css 693 B 0 B
build/block-serialization-default-parser/index.js 1.3 kB 0 B
build/block-serialization-spec-parser/index.js 3.06 kB 0 B
build/blocks/index.js 47.1 kB 0 B
build/components/index.js 188 kB 0 B
build/components/style-rtl.css 16.2 kB 0 B
build/components/style.css 16.2 kB 0 B
build/compose/index.js 9.93 kB 0 B
build/core-data/index.js 12.1 kB 0 B
build/customize-widgets/index.js 5.99 kB 0 B
build/customize-widgets/style-rtl.css 698 B 0 B
build/customize-widgets/style.css 699 B 0 B
build/data-controls/index.js 829 B 0 B
build/data/index.js 7.22 kB 0 B
build/date/index.js 31.8 kB 0 B
build/deprecated/index.js 737 B 0 B
build/dom-ready/index.js 576 B 0 B
build/dom/index.js 4.62 kB 0 B
build/edit-navigation/index.js 13.5 kB 0 B
build/edit-navigation/style-rtl.css 2.83 kB 0 B
build/edit-navigation/style.css 2.83 kB 0 B
build/edit-post/classic-rtl.css 454 B 0 B
build/edit-post/classic.css 454 B 0 B
build/edit-post/index.js 333 kB 0 B
build/edit-post/style-rtl.css 6.79 kB 0 B
build/edit-post/style.css 6.78 kB 0 B
build/edit-site/index.js 26.1 kB 0 B
build/edit-site/style-rtl.css 4.79 kB 0 B
build/edit-site/style.css 4.78 kB 0 B
build/edit-widgets/index.js 12.6 kB 0 B
build/edit-widgets/style-rtl.css 3.02 kB 0 B
build/edit-widgets/style.css 3.03 kB 0 B
build/editor/index.js 60.5 kB 0 B
build/editor/style-rtl.css 3.95 kB 0 B
build/editor/style.css 3.95 kB 0 B
build/element/index.js 3.44 kB 0 B
build/escape-html/index.js 739 B 0 B
build/format-library/index.js 5.67 kB 0 B
build/format-library/style-rtl.css 637 B 0 B
build/format-library/style.css 639 B 0 B
build/hooks/index.js 1.76 kB 0 B
build/html-entities/index.js 628 B 0 B
build/i18n/index.js 3.73 kB 0 B
build/is-shallow-equal/index.js 710 B 0 B
build/keyboard-shortcuts/index.js 1.65 kB 0 B
build/keycodes/index.js 1.43 kB 0 B
build/list-reusable-blocks/index.js 2.06 kB 0 B
build/list-reusable-blocks/style-rtl.css 629 B 0 B
build/list-reusable-blocks/style.css 628 B 0 B
build/media-utils/index.js 3.08 kB 0 B
build/notices/index.js 1.07 kB 0 B
build/nux/index.js 2.31 kB 0 B
build/nux/style-rtl.css 718 B 0 B
build/nux/style.css 716 B 0 B
build/plugins/index.js 2 kB 0 B
build/primitives/index.js 1.03 kB 0 B
build/priority-queue/index.js 791 B 0 B
build/react-i18n/index.js 924 B 0 B
build/redux-routine/index.js 2.82 kB 0 B
build/reusable-blocks/index.js 2.56 kB 0 B
build/reusable-blocks/style-rtl.css 225 B 0 B
build/reusable-blocks/style.css 225 B 0 B
build/rich-text/index.js 11.8 kB 0 B
build/server-side-render/index.js 1.64 kB 0 B
build/shortcode/index.js 1.68 kB 0 B
build/token-list/index.js 848 B 0 B
build/url/index.js 1.95 kB 0 B
build/viewport/index.js 1.28 kB 0 B
build/warning/index.js 1.13 kB 0 B
build/widgets/index.js 1.68 kB 0 B
build/wordcount/index.js 1.24 kB 0 B

compressed-size-action

@ntsekouras
Contributor

This could be a controversial one

😄 - I think this will make our life a bit easier but only in the short term as we'll be increasing technical debt.

@ellatrix
Member

What's the problem with a manual retry? It's good to have a sense of what's breaking sometimes and try to fix it?

@ellatrix ellatrix requested a review from youknowriad May 11, 2021 10:52
@youknowriad
Contributor

Not strongly against, but I believe we need a more scalable way to track unstable tests first before doing this. Right now we rely too much on pinging folks every time something happens, which may not scale forever.

@gwwar had some good ideas on this subject.

@kevin940726
Member Author

Manual retry has to re-run all the e2e tests, which could be very slow, as running them once is already slow enough.

I agree we should still try to alert if something failed so that we can try to fix it properly. But oftentimes such cases are extremely difficult to resolve, and require deep domain knowledge of that specific test.

I'm open to discussions/suggestions on how we can still alert on failing tests with retrying enabled (hence it's only a draft PR for now). I'm thinking maybe we can post a comment on the commit that has an intermittently failing test? We can go a step further and automatically tag the last contributor working on those tests to take a look.

@gwwar
Contributor

gwwar commented May 11, 2021

I do think we should get retries going eventually (to automatically test/mark flakiness), but I suspect we'll see some benefits from figuring out how to automate a way to see what tests are failing, and testing out some ownership options for fixing them. E.g. an easily digestible dashboard + some form of notifications (Slack/GH pings).

There's some pretty low hanging fruit already by sifting through recent e2e failures on trunk. Any of these are flaky since we can assume that most contributors should be verifying that checks are green on their branch before merge:

https://github.com/WordPress/gutenberg/actions/workflows/end2end-test.yml?query=is%3Afailure+branch%3Atrunk

[Image: Screen Shot 2021-05-11 at 1 57 02 PM]

There was a related blog post by GitHub which was a decent read https://github.blog/2020-12-16-reducing-flaky-builds-by-18x/.

@ellatrix
Member

ellatrix commented May 15, 2021

When an e2e test fails intermittently, it usually means the test is bad and we should fix it. There are lots of cases where we're not appropriately waiting for a selector. Often checking the screenshot artefact gives some good clues about what goes wrong and someone just needs to take the time to fix it.

@ellatrix
Member

Perfect example of a test failing when it runs too fast: Fix intermittent embeds failure.
Perfect example of a test failing when it runs too slow: Fix flaky change detection tests causing intermittent failures.

@kevin940726
Member Author

@ellatrix I agree with all of these, but I don't think they're mutually exclusive. We should fix the intermittently failing tests, but we can also add retrying. The current problem is that contributors often get confused when there are failed tests in their PRs, having no idea if they caused those tests to fail. This makes them lose confidence in the PR checks, and maybe even ignore the failing tests.

I suggest adding some retrying to the tests, so that we can get those tests to pass in PRs, but we should also add some kind of alert to notify the right people if any of those tests fail intermittently. The latter part is still TBD, hence the reason this is still a draft PR.

@gziolo gziolo requested a review from a team May 17, 2021 05:44
@ellatrix
Member

Sure, it seems fine when we have a log somewhere about which tests have failed how many times with artefacts, so the data is not lost. It’s sometimes also important to know when the test started failing. If we keep all this information, I’m ok with it.

@draganescu
Contributor

I think this idea is a good complementary help, which does not replace the need to fix flaky tests at all. It may obscure this need if we don't surface them anymore.

alert to notify the right people

I think it is better to have a central place of seeing these problems. In an ideal world once we detect a flaky test, which is flaky (which means it restarted and passed) more than X times we auto-create an issue and label it accordingly. I have no clue if this can be done, but it does not sound impossible.

Notifying people is a system that only creates more notifications.

All in all, the idea to auto-restart is solid and will remove a blocker for all contributors, increase the confidence in the failures (meaning that the computer already "tried again", so it's probably you), and be a solution to the problem at hand, which is flakiness costing time and creating frustration.

@kevin940726
Member Author

I have no clue if this can be done, but it does not sound impossible.

It should be very possible, and probably not very difficult to do. We can do that via GitHub actions, and automatically create an issue for each flaky test. Whenever it's detected, we can add a new comment about when, which commit, and the error message of the failed test.
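
A rough sketch of that idea, assuming one issue per flaky test keyed by its title: the function below only decides whether to create a new issue or comment on an existing one, and leaves the actual GitHub API calls to the surrounding workflow. The names `flakyIssueTitle`, `buildFlakyReport`, the `[Flaky Test]` title prefix, and the `flaky-test` label are all hypothetical.

```javascript
// Hypothetical convention: the issue title identifies one flaky test.
function flakyIssueTitle( testName ) {
	return `[Flaky Test] ${ testName }`;
}

// Decides what to do for one detected flaky failure.
// `failure` has the shape { testName, date, commit, errorMessage }.
// `existingIssues` is a list of { number, title } for open issues.
function buildFlakyReport( failure, existingIssues ) {
	const title = flakyIssueTitle( failure.testName );
	// Record when it happened, on which commit, and the error message.
	const entry = [
		`Date: ${ failure.date }`,
		`Commit: ${ failure.commit }`,
		'Error:',
		failure.errorMessage,
	].join( '\n' );

	const existing = existingIssues.find( ( issue ) => issue.title === title );
	if ( existing ) {
		// The test is already tracked: add a new comment to its issue.
		return { action: 'comment', issueNumber: existing.number, body: entry };
	}
	// First sighting: open a new issue with a (hypothetical) label.
	return { action: 'create', title, body: entry, labels: [ 'flaky-test' ] };
}
```

Keeping the decision separate from the network calls makes it easy to check the behaviour without touching a real repository.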

Notifying people is a system that only creates more notifications.

The idea is to make sure the flaky test is being handled or assigned to at least one person, much like an auto-triaging system. In the GitHub post mentioned above, they recommended tagging only the person who wrote the flaky test, which doesn't seem like a bad idea IMO.

A nice-to-have bonus would be to create a visualized dashboard of all the flaky tests over time, so that we can monitor whether confidence in our tests is increasing.

@mcsf
Contributor

mcsf commented May 28, 2021

I think this idea is a good complementary help, which does not replace the need to fix flaky tests at all. It may obscure this need if we don't surface them anymore.

alert to notify the right people

I think it is better to have a central place of seeing these problems. In an ideal world once we detect a flaky test, which is flaky (which means it restarted and passed) more than X times we auto-create an issue and label it accordingly. I have no clue if this can be done, but it does not sound impossible. […]

My worry here is that pushing flaky tests away from the spotlight of a PR's checks — whether by auto-posting a comment in some past commit, by aggregating a list somewhere else, or what have you — is going to: decrease awareness of test flakiness; decrease the perceived severity of it; and foster a bystander effect by which most contributors, novice and seasoned alike, will disregard the issue entirely, "abstracting away" the problem and leaving it up to those most involved or diligent in the core team.

I would prefer that no action be taken than to merge this PR in its current form. That said, what about the following hybrid approach? For every test that fails, we log that failure before letting Jest retry it (twice at most). If the test succeeds after retrying, it will show up as passing. However, at the end of the test suite we add a specific test whose purpose is to fail if any flakiness was logged.

That way, all parties involved in that PR need to confront the failure. But now they are in a better position to diagnose it. If it is a flaky test, they can make a conscious decision to force-merge a PR which has otherwise passing tests. As a consequence, this might put a brake on the proliferation of new flaky tests.

| | Currently | With just jest.retryTimes | With the hybrid approach |
| --- | --- | --- | --- |
| Effect on overall CI | Tests fail... or not. | Tests blindly pass. | Only the flakiness test fails. |
| Effect on debugging | Hard to spot if test is flaky or legitimate. | Blissful ignorance. | Flakiness test reports which test(s) is/are flaky. |
| Effect on deployment | Frustrating time waste. Admin may force merge. | Problem propagates. | Need to confront flakiness. Admin needed to force merge. |
| Effect on maintainers | May be involved for merging. | Left to solve problem on their own. | Involved for merging, but share burden with fellow contributors. |
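
To make the hybrid approach concrete, here is a simplified, synchronous sketch that does not use Jest itself: every failed attempt is logged before the retry, and a final check fails if the log is non-empty. The names `runWithRetries` and `assertNoFlakiness` are hypothetical, and a real implementation would need to hook into the test runner and handle asynchronous tests.

```javascript
// Runs `testFn` up to 1 + maxRetries times.
// Every failed attempt is appended to `flakyLog` before retrying.
// Returns true if any attempt passed, false otherwise.
function runWithRetries( name, testFn, maxRetries, flakyLog ) {
	for ( let attempt = 1; attempt <= maxRetries + 1; attempt++ ) {
		try {
			testFn();
			return true;
		} catch ( error ) {
			flakyLog.push( { name, attempt, message: error.message } );
		}
	}
	return false;
}

// The extra "flakiness test" run at the end of the suite:
// it throws if any test failed at least once, even if it later passed.
function assertNoFlakiness( flakyLog ) {
	if ( flakyLog.length > 0 ) {
		const names = [ ...new Set( flakyLog.map( ( entry ) => entry.name ) ) ];
		throw new Error( `Tests failed at least once: ${ names.join( ', ' ) }` );
	}
}
```

With this shape, a test that passes on its second attempt is reported as passing, while the final check still fails and names it, which matches the "Only the flakiness test fails" column above.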

Thoughts?

@draganescu
Contributor

draganescu commented Jun 15, 2021

I do agree with @mcsf's suggestion that, by both blindly retrying and creating specific "flaky test" issues, we indirectly create a new problem for the core maintainers. Fixing tests is not "fun", and it "only" solves a generic project-wide problem. So, I can foresee these issues aging there.

On the other hand, it may be that many of these flaky test issues are also good 1st issues. Also, for example, efforts by folks like @hellofromtonya, to create a more stable and consistent testing team and testing focus, may result in these issues being picked up and solved.

I like @mcsf's proposal because it gives the PR author a clear description of what they have to fix. Sometimes this fixing will be skipped by force merging, but this action needs a justification. I worry that the PR author will, many times, be very removed in expertise from the flaky test (imagine fixing a typo in a doc and being hit with a flaky e2e from widgets). I am also afraid that we underestimate the number of requests for "force" merges, if that is what we aim for as a best practice.

I don't think either of the solutions will put a brake on the proliferation of flaky tests. These appear because the system that we use to develop tests allows for their flakiness to be invisible to the developer. They "proliferate" because perhaps there is a tension between the complexity we're testing and the simplicity of the tooling.

In conclusion, either of the "don't let it slide" directions (the automated issue creation and/or the flaky tests test) works equally towards nudging people to improve the health of the codebase, but the problem this PR tries to address is that we are wasting probably considerable time manually, blindly, annoyingly, clicking a button: the restart all jobs button. For this problem, automating retries is a good idea and it is better than nothing.

@gziolo
Member

gziolo commented Jun 15, 2021

In my opinion, we should start by identifying the tests that are failing, measuring the ratio of failures to passes, and classifying the reasons for the failures. Once we have the full picture of the current state of the e2e tests, we can discuss further steps.
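
As an illustration of the failure-to-pass ratio mentioned above, the sketch below aggregates per-test results. The input shape `{ testName, passed }` and the function name `failureRatios` are assumptions; how the records would be collected from CI runs is left open.

```javascript
// Aggregates run records into per-test failure ratios.
// `runs` is a list of { testName, passed } records (hypothetical shape).
// Returns entries sorted by failure ratio, highest first.
function failureRatios( runs ) {
	const stats = new Map();
	for ( const { testName, passed } of runs ) {
		const entry = stats.get( testName ) || { total: 0, failures: 0 };
		entry.total += 1;
		if ( ! passed ) {
			entry.failures += 1;
		}
		stats.set( testName, entry );
	}
	return [ ...stats ]
		.map( ( [ testName, { total, failures } ] ) => ( {
			testName,
			total,
			failures,
			ratio: failures / total,
		} ) )
		.sort( ( a, b ) => b.ratio - a.ratio );
}
```

Sorting by ratio puts the least reliable tests at the top, which is one possible starting point for classifying the reasons for failure.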

Trying to pass the same tests 3 times improves the optics for the contributors because they will see all checks green more often but in practice, it won't increase the level of confidence that the changes added in PRs won't cause regressions.

@talldan
Contributor

talldan commented Jul 14, 2021

In my opinion, we should start by identifying the tests that are failing, measuring the ratio of failures to passes, and classifying the reasons for the failures. Once we have the full picture of the current state of the e2e tests, we can discuss further steps.

This sounds like a good plan 👍

Though I think it should only be based on the results of the tests that run on commits to trunk.

PR test outcomes are often skewed by the code being a work in progress.

@kevin940726
Member Author

For anyone subscribed to this issue: I opened a follow-up draft PR as a proposal in #34432. Feel free to leave your feedback there!

@draganescu
Contributor

Now that #34432 is merged this becomes more feasible. Right?

@gziolo
Member

gziolo commented Sep 13, 2021

Now that #34432 is merged this becomes more feasible. Right?

Isn't it an alternative approach and the PR can be closed now?

@kevin940726
Member Author

Yep, this can be closed now. This PR is included in #34432.

@kevin940726 kevin940726 deleted the update/retry-flaky-e2e-tests branch September 13, 2021 13:49
@draganescu
Contributor

🤦🏻 <- that's all I can say.
