Ideas for improving E2E test developer experience #33532

Closed · 5 of 10 tasks
noisysocks opened this issue Jul 19, 2021 · 9 comments

Labels
  • [Type] Automated Testing: Testing infrastructure changes impacting the execution of end-to-end (E2E) and/or unit tests.
  • [Type] Overview: Comprehensive, high-level view of an area of focus, often with multiple tracking issues.

Comments

@noisysocks (Member) commented Jul 19, 2021

A few things we could try to improve the experience of writing and running E2E tests:

noisysocks added the [Type] Automated Testing and [Type] Overview labels on Jul 19, 2021

@gziolo (Member) commented Aug 2, 2021

> Fix Core version in .wp-env.json to a known git commit which is updated automatically via a PR every week. This might make it less disruptive (i.e. doesn't block every single developer) when a Core change breaks Gutenberg CI.

Yes, it's super annoying, but it doesn't happen that often. It all depends on what we consider a priority here. The current setup enforces that the conflicts in the Gutenberg plugin are fixed as soon as possible. Some of those issues are obvious mistakes, like a duplicated function/class definition; others are related to changes in the default theme. There is definitely room for improvement in the process to find a better balance between developer experience and ensuring that the Gutenberg plugin is compatible with WP core's trunk.
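
For context, .wp-env.json already lets the core field point at a specific ref of the WordPress/WordPress repository, so the weekly-bump idea is mostly about automating updates to that value. A minimal sketch, where the pinned ref is a placeholder:

```json
{
	"core": "WordPress/WordPress#<pinned-branch-or-commit>",
	"plugins": [ "." ]
}
```

An automated weekly PR would then only need to bump that ref to whichever Core revision last passed the Gutenberg E2E suite.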

> Look at splitting the 4 npm run test-e2e actions into 6 actions. This might speed up E2E test runs on a PR.

Looking at a random PR:

  1. Env setup took ~4-5 minutes.
  2. Test execution took ~11-13 minutes.

Splitting into more jobs gives a major improvement in the second part of the CI job: a quick estimate would be the ~11-13 minutes reduced to hopefully ~7-8 minutes on a single node. The drawback is that we would not only use 2 more nodes but also add ~8-10 minutes of cumulative execution time for the extra env setup (on nodes 5 and 6). It would be essential to figure out if there is a way to share at least the build part between more nodes so we could scale to as many nodes as we need.

I also think @desrosj did some testing with splitting e2e tests into more nodes. It would be great to see what he learned.
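
To make the numbers concrete, a 6-way split would look roughly like the sketch below in the workflow file. The job name and the sharding command are illustrative only, not the actual Gutenberg workflow:

```yaml
e2e:
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      part: [0, 1, 2, 3, 4, 5]
  steps:
    - uses: actions/checkout@v2
    - name: Install dependencies and build
      run: |
        npm ci
        npm run build
    - name: Run one sixth of the E2E suite
      # List every spec, keep every sixth one for this node, then run only those.
      run: |
        npm run test-e2e -- --listTests \
          | awk 'NR % 6 == ${{ matrix.part }}' \
          | xargs npm run test-e2e --
```

Each extra node repeats the checkout/install/build steps above, which is exactly the ~4-5 minutes of env setup that gets multiplied.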

@kevin940726 (Member)

> It would be essential to figure out if there is a way to share at least the build part between more nodes so we could scale to as many nodes as we need.

I tried it once in my fork, and to my surprise, sometimes building the app from scratch is actually faster than downloading and unpacking the build artifacts from GitHub. I'm still confident that there must be some way to reduce the execution time, though; we just have to do some more experiments.

@mtias (Member) commented Aug 5, 2021

I think this needs to be scoped down a little bit.

  • Not clear what the value of a dashboard would be compared to its overhead: where would it run? What would we code it with? How would we maintain and update it? Who would monitor it?
  • Screencasts: also not sure what value they provide over the setup cost. It doesn't seem like something crucial at this point or worth investing too much time into.

@gziolo (Member) commented Aug 6, 2021

> Screencasts: also not sure what value they provide over the setup cost. It doesn't seem like something crucial at this point or worth investing too much time into.

The current implementation proposed in #33506 slows down test execution by a few minutes per CI node, so your comment is valid. It could be useful for debugging, but it should probably be disabled if there is a performance penalty involved.

@kevin940726 (Member) commented Aug 7, 2021

I don't think a couple of minutes of slowdown in tests is that much of a problem, though. E2E tests are already very slow; slowing each one down by a few seconds shouldn't outweigh the debugging benefits it brings. I've already encountered several tests which only fail in CI and are hard to debug/reproduce locally. Oftentimes the only option we have is to skip the tests and risk regressing the bug in future PRs. Furthermore, you can think of the slowdown as an emulation of CPU throttling, which helps us build more resilient tests. I've already found several flaky tests in that PR because of the slowdown; they were just hidden in plain sight, waiting to surface in some random PRs. In conclusion, I think the trade-offs are worth it. (In addition, it's currently implemented so that it's only enabled in CI by default, so there is no performance penalty during development.)
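
As a rough illustration of that last point, the toggle can live in the shared E2E setup and key off the CI environment variable; the recorder helper below is a placeholder, and #33506 has the actual wiring:

```js
// Illustrative sketch only: record screencasts in CI, keep local runs unaffected.
const SHOULD_RECORD_SCREENCAST = !! process.env.CI;

beforeEach( async () => {
	if ( SHOULD_RECORD_SCREENCAST ) {
		// Placeholder helper, not a real API; see #33506 for the real implementation.
		await startScreencastRecorder( page );
	}
} );
```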

A dashboard can also help here. IMO, flaky tests that fail only rarely aren't worth fixing ASAP; we need a way to identify and prioritize each flaky test by its failure rate. I'm aware of the complexity a dashboard could bring, hence I opened a separate issue, #33809, to track it.

@vcanales (Member)

> • Look at automatically retrying E2E tests. This might help with stability.

Regarding this, I'm opening #33979 in order to experiment with re-running failed jobs instead of the entire workflow. I might look into automatically retrying if this works out; otherwise, my thought is that retrying full workflows would add way too much time to be worth it.

@kevin940726 (Member)

@vcanales There's already #31682, which works, but the consensus seems to be that we want to have a dashboard first to record all the failing tests.
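
For reference, per-test retries in Jest come down to a one-liner in the E2E setup file. This is only the generic mechanism; #31682 may wire it up differently:

```js
// Requires the jest-circus test runner. Each failing test is retried up to
// 2 extra times before Jest reports it as failed.
jest.retryTimes( 2 );
```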

@noisysocks (Member, Author) commented Aug 11, 2021

> Yes, it's super annoying, but it doesn't happen that often. It all depends on what we consider a priority here. The current setup enforces that the conflicts in the Gutenberg plugin are fixed as soon as possible.

Agree that we need to fix conflicts as soon as possible and keep Gutenberg tested against the latest WordPress trunk. But I don't think conflicts should block all developers (there are a lot of us now! 😀) from working, and I really don't think we should have to deal with conflicts at very stressful times, e.g. on plugin release day. Being a Gutenberg developer should be fun and chill.

> It would be essential to figure out if there is a way to share at least the build part between more nodes so we could scale to as many nodes as we need.

100%. Ideally the parallelised jobs that run E2E tests would happen after a single non-parallelised setup job. Maybe this won't improve total performance all that much, but it would definitely improve re-run performance, which I think is a big deal: many developers spend a lot of time waiting for failed tests to re-run.
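
A rough sketch of that shape, where a single setup job builds once and the parallelised E2E jobs reuse the output via an artifact. All names and paths are placeholders:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: |
          npm ci
          npm run build
      - uses: actions/upload-artifact@v2
        with:
          name: e2e-build
          path: build/   # placeholder for whatever the E2E jobs actually need

  e2e:
    needs: build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        part: [0, 1, 2, 3, 4, 5]
    steps:
      - uses: actions/checkout@v2
      - uses: actions/download-artifact@v2
        with:
          name: e2e-build
          path: build/
      - run: npm run test-e2e   # plus whatever sharding the matrix node uses
```

A re-run of a single failed e2e job can then reuse the existing artifact and skip the build, which is where most of the re-run savings would come from.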

> Not clear what the value of a dashboard would be compared to its overhead: where would it run? What would we code it with? How would we maintain and update it? Who would monitor it?

I don't think we can systematically address flaky tests unless we measure what we want to improve. That's the value of a dashboard. I'm thinking it could be a GitHub Action that runs daily, scrapes the E2E test logs, and publishes to a static GitHub Pages site. If we have to set up separate hosting, a database, etc., then I agree that the overhead is probably too high and it would end up basically unmaintained. (I think gutenberg.run suffers from this.)
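
The shape I have in mind is roughly the following; every name and script here is a placeholder, and it only shows the daily-cron-plus-Pages mechanics:

```yaml
name: flaky-test-report
on:
  schedule:
    - cron: '0 0 * * *'   # once a day

jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Collect recent E2E results via the GitHub API
        run: node bin/collect-e2e-results.js   # placeholder script
      - name: Publish the static report to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3   # one commonly used Pages-deploy action
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./report   # placeholder output directory
```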

> Screencasts: also not sure what value they provide over the setup cost. It doesn't seem like something crucial at this point or worth investing too much time into.

No real opinion on this one. I trust @kevin940726 😛

@annezazu (Contributor)

Considering this hasn't been scoped down further and hasn't had much traction in a few years, I'm going to close this out, but I welcome folks to either reopen it or start a new issue with more relevant/recent info about improving the E2E test developer experience.
