Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the number of git clones in the test framework #1686

Merged
merged 5 commits into from
May 22, 2019

Conversation

jtgeibel
Copy link
Member

The test framework spends a lot of I/O and allocations initializing and
cloning git repositories, even though most tests do not enqueue a
background job to modify the index. A TestApp::full() method is
added which initializes the TestApp with an index, background job
runner, and playback proxy.

It is also now possible to enqueue multiple crate publishes before
running a batch of background jobs. A Drop impl on TestAppInner
ensures that all pending migrations are run and that no jobs have been
queued unless TestApp::full() was used.

Unlike the enqueue_publish helper the yank and unyank methods
automatically run migrations out of convenience since otherwise the
database is not updated. This can be changed in the future if a test
cares about controlling exactly when the jobs are run.

Finally, this PR includes several other improvements:

  • The initial commit for each remaining upstream index is now made
    directly against the bare repo, rather than doing a clone and push.
  • The proxy only initializes a Client when recording.
  • A playback proxy is not created when calling app()

Overall, when running the tests in all, these changes reduce the
cumulative allocation size from 3.45 GB to 689 MB. Allocation counts
are reduced from 4,625,804 to 1,611,351.

@sgrif
Copy link
Contributor

sgrif commented Mar 28, 2019

To me this feels like it leaks too much of the app's internals into the integration tests. They shouldn't need to care about whether the request they just made enqueued some background jobs or not -- running them synchronously in tests is pretty common across all frameworks. If the problem we're trying to solve is not performing a git clone for tests that don't need them, would just adding if pending_job_count == 0 { return; } to the middleware fix this?

@sgrif
Copy link
Contributor

sgrif commented Mar 28, 2019

Also can you shed more light on why we're focusing on reducing allocations when running tests? Is this causing problems somewhere that I'm not aware of?

@bors
Copy link
Contributor

bors commented Mar 28, 2019

☔ The latest upstream changes (presumably #1699) made this pull request unmergeable. Please resolve the merge conflicts.

@jtgeibel
Copy link
Member Author

Also can you shed more light on why we're focusing on reducing allocations when running tests? Is this causing problems somewhere that I'm not aware of?

These changes originated from my attempts to profile memory usage. While the tests in all.rs don't exercise the HTTP server(s), they do cover a broad set of functionality and orchestrate the application in a convenient and reproducible way (such that, for instance, requests aren't actually sent to S3 or GitHub). After analyzing and reducing the allocations via heaptrack, it is now practical to run the test suite in valgrind as well. (Note that to run either of these tools, it is also necessary to temporarily remove the jemalloc related crates and initialization code.)

While trimming down the test harness, I was able to find and extract the improvements in #1692, #1682, and #1669.

To me this feels like it leaks too much of the app's internals into the integration tests.

I see that this seems to be coupling tests to the implementation details of individual endpoints, but I see it as each test orchestrating the set of high-level components that are needed to test a piece of the site's functionality. Of the 178 tests here, 144 of them don't require anything more than a "web" instance and the database. These could potentially live in the main library, as they just use the test harness to optionally initialize users/tokens and inject MockRequests.

There are only 34 tests which use the test harness to initialize the playback server. Here the test are already coupled to the implementation because there needs to be a proxy recording in http-data. Of these, only 26 use full() to initialize the index and configure a background worker for that test.

They shouldn't need to care about whether the request they just made enqueued some background jobs or not

Beyond remembering to initialize the background worker (and this mistake is detected and provides an error message explaining what to do), the individual tests don't necessarily need to be aware of this (as with yank/unyank here). The test harness could automatically drain the queue after all requests, but I'd eventually like to add tests that inject some failures (such as an independent upstream push that forces the background worker to rebase). When the worker is always run in the application middleware it isn't possible for the test harness to provide finer control over the sequence of events.

If the problem we're trying to solve is not performing a git clone for tests that don't need them, would just adding if pending_job_count == 0 { return; } to the middleware fix this?

That might help a bit, but the test harness would still be initializing an upstream repository for every test, assuming that the worker might need to be run. Plus I think there are other advantages to moving this into the test harness as mentioned above.

@bors
Copy link
Contributor

bors commented Apr 1, 2019

☔ The latest upstream changes (presumably #1677) made this pull request unmergeable. Please resolve the merge conflicts.

The test framework spends a lot of I/O and allocations initializing and
cloning git repositories, even though most tests do not enqueue a
background job to modify the index.  A `TestApp::full()` method is
added which initializes the `TestApp` with an index, background job
runner, and playback proxy.

It is also now possible to enqueue multiple crate publishes before
running a batch of background jobs.  A `Drop` impl on `TestAppInner`
ensures that all pending migrations are run and that no jobs have been
queued unless `TestApp::full()` was used.

Unlike the `enqueue_publish` helper the `yank` and `unyank` methods
automatically run migrations out of convenience since otherwise the
database is not updated.  This can be changed in the future if a test
cares about controlling exactly when the jobs are run.

Finally, the initial commit for each remaining upstream index is now
made directly against the bare repo, rather than doing a clone and
push.
There is no reason to initialize this client when in playback mode.
The remaining 12 tests that call `app()` directly do not need a proxy.
This list was obtained by temporarily causing playback to panic if
there was no `http-data` file when the proxy is started.

The only remaining test that starts a proxy server which goes unused is
`krate::yank_not_owner`, however this test does need an index.  If more
tests fall into this category over time we can switch to a full builder
API for orchestrating a `TestApp` but that doesn't seem necessary yet.
@carols10cents
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented May 22, 2019

📌 Commit 85ee13e has been approved by carols10cents

@bors
Copy link
Contributor

bors commented May 22, 2019

⌛ Testing commit 85ee13e with merge 960ced1...

bors added a commit that referenced this pull request May 22, 2019
 Reduce the number of git clones in the test framework

The test framework spends a lot of I/O and allocations initializing and
cloning git repositories, even though most tests do not enqueue a
background job to modify the index.  A `TestApp::full()` method is
added which initializes the `TestApp` with an index, background job
runner, and playback proxy.

It is also now possible to enqueue multiple crate publishes before
running a batch of background jobs.  A `Drop` impl on `TestAppInner`
ensures that all pending migrations are run and that no jobs have been
queued unless `TestApp::full()` was used.

Unlike the `enqueue_publish` helper the `yank` and `unyank` methods
automatically run migrations out of convenience since otherwise the
database is not updated.  This can be changed in the future if a test
cares about controlling exactly when the jobs are run.

Finally, this PR includes several other improvements:

* The initial commit for each remaining upstream index is now made
  directly against the bare repo, rather than doing a clone and push.
* The proxy only initializes a `Client` when recording.
* A playback proxy is not created when calling `app()`

Overall, when running the tests in `all`, these changes reduce the
cumulative allocation size from 3.45 GB to 689 MB.  Allocation counts
are reduced from 4,625,804 to 1,611,351.
@bors
Copy link
Contributor

bors commented May 22, 2019

☀️ Test successful - checks-travis
Approved by: carols10cents
Pushing 960ced1 to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants