Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rush] Implementation of Cobuild feature #3949

Merged
merged 117 commits into from
Sep 1, 2023

Conversation

chengcyber
Copy link
Contributor

@chengcyber chengcyber commented Feb 3, 2023

Summary

Closes #3485

Details

  1. Build Cache restores build log file, supports restoring failing build [rush] Restore log files when cache hit #3886
  2. Instead of running operation tasks after topological sorting, OperationManager should involve a new scheduler to pick the next-best operation to perform and store the operation to a list of pending operations being built by a cobuild.
  3. OperationExecutionRecord needs a new mechanism to periodically (every 10s) update status to remote server, such renewing cobuild lock)
  4. CobuildLockConfiguration reads from a config file, gets whether Cobuild feature is enabled or not, the context id pattern, get the cobuildLockProvider registered by rush plugin.
  5. A new Rush.js plugin which officially provides Redis implemented CobuildLock Provider.
  6. A integration test combining cobuild feature and the redis cobuild plugin
  7. [ ] Add a new lifecycle to PhasedCommandHooks, which receives current results from operationManager. And periodically called for reporting current progress from a global view. The use case would be like sending operation status to a data server, and repo maintainer can build a cobuild dashboard with the realtime data. (will be another MR.

How it was tested

  1. Unit tests added
  2. Added a new build test project called rush-redis-cobuild-plugin-integration-test
    a. test redisLockProvider
    b. test cobuild for sandbox repo via running VS Code tasks.

Preparation

Integration test folder: build-tests/rush-redis-cobuild-plugin-integration-test

cd build-tests/rush-redis-cobuild-plugin-integration-test
rush update
rush build -t .
docker compose up -d

Sandbox repo folder: build-tests/rush-redis-cobuild-plugin-integration-test/sandbox/repo

cd sandbox/repo
rush update

Case 1: Normal build, Cobuild is disabled because of missing RUSH_COBUILD_CONTEXT_ID

  1. Write to build cache
rm -rf common/temp/build-cache && node ../../lib/runRush.js --debug cobuild
  1. Read from build cache
node ../../lib/runRush.js --debug cobuild

Expected behavior: Cobuild feature is disabled. Build cache is saved/restored as normal.

Case 2: Cobuild enabled by specifying RUSH_COBUILD_CONTEXT_ID and Redis authentication

  1. Clear redis server
(cd ../.. && docker compose down && docker compose up -d)
  1. Run cobuilds
rm -rf common/temp/build-cache && RUSH_COBUILD_CONTEXT_ID=foo REDIS_PASS=redis123 RUSH_COBUILD_RUNNER_ID=runner1 node ../../lib/runRush.js --debug cobuild

Expected behavior: Cobuild feature is enabled. Run command successfully.
You can also see cobuild related logs in the terminal.

Running cobuild (runner foo/runner1)
Analyzing repo state... DONE (0.11 seconds)

Executing a maximum of 10 simultaneous processes...

==[ b (build) ]====================================================[ 1 of 9 ]==
Get completed_state(cobuild:completed:foo:2e477baf39a85b28fc40e63b417692fe8afcc023)_package(b)_phase(_phase:build): SUCCESS;2e477baf39a85b28fc40e63b417692fe8afcc023
Get completed_state(cobuild:completed:foo:cfc620db4e74a6f0db41b1a86d0b5402966b97f3)_package(a)_phase(_phase:build): SUCCESS;cfc620db4e74a6f0db41b1a86d0b5402966b97f3
Successfully acquired lock(cobuild:lock:foo:4c36160884a7a502f9894e8f0adae05c45c8cc4b)_package(b)_phase(_phase:build) to runner(runner1) and it expires in 30s

Case 3: Cobuild enabled, run two cobuild commands in parallel

Note: This test requires Visual Studio Code to be installed.

  1. Open predefined .vscode/redis-cobuild.code-workspace in Visual Studio Code.

  2. Clear redis server

# Under rushstack/build-tests/rush-redis-cobuild-plugin-integration-test
docker compose down && docker compose up -d
  1. Clear build cache
# Under rushstack/build-tests/rush-redis-cobuild-plugin-integration-test/sandbox/repo
rm -rf common/temp/build-cache
  1. Open command palette (Ctrl+Shift+P or Command+Shift+P) and select Tasks: Run Task and select cobuild.

In this step, two dedicated terminal windows will open. Running rush cobuild command under sandbox repo respectively.

Expected behavior: Cobuild feature is enabled, cobuild related logs out in both terminals.

Case 4: Cobuild enabled, run two cobuild commands in parallel, one of them failed

Note: This test requires Visual Studio Code to be installed.

  1. Open predefined .vscode/redis-cobuild.code-workspace in Visual Studio Code.

  2. Making the cobuild command of project "A" fails

sandbox/repo/projects/a/package.json

  "scripts": {
-   "_phase:build": "node ../build.js a",
+   "_phase:build": "exit 1",
  }
  1. Clear redis server
# Under rushstack/build-tests/rush-redis-cobuild-plugin-integration-test
docker compose down && docker compose up -d
  1. Clear build cache
# Under rushstack/build-tests/rush-redis-cobuild-plugin-integration-test/sandbox/repo
rm -rf common/temp/build-cache
  1. Open command palette (Ctrl+Shift+P or Command+Shift+P) and select Tasks: Run Task and select cobuild.

Expected behavior: Cobuild feature is enabled, cobuild related logs out in both terminals. These two cobuild commands fail because of the failing build of project "A". And, one of them restored the failing build cache created by the other one.

Impacted documentation

A brand new experimental feature!

@chengcyber chengcyber force-pushed the feat-cobuild branch 2 times, most recently from 94cbbbd to dfc0456 Compare February 16, 2023 08:20
Copy link
Contributor

@dmichon-msft dmichon-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way too much is happening in ShellOperationRunner that has nothing to do with its task of "execute a shell command and log its stdout/stderr". Cobuilds should be a wrapper, not part of ShellOperationRunner (so, for that matter, should the build cache, so that we get cache support for other runners).

Copy link
Contributor

@dmichon-msft dmichon-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more design feedback.

@elliot-nelson
Copy link
Collaborator

This is looking really nice.

I have a comment about configuration here -- I wonder if there's too much (potentially unneeded) functionality.

Here's some (admittedly opinionated) suggestions:

  1. First, need a config file to turn on the "experimental feature" of cobuilds, and specify what Cobuild Provider to use (all this makes sense).
  2. However, the ability to create a "cobuild context id pattern" in config file is more confusing than helpful. A cobuild only makes sense in the context of a large piece of orchestration (like a CI pipeline), which is going to decide how many "copies of rush" to start, and what the "context id" should be for each. I think we want that orchestrator to be in control and not a config file.
  3. I think the orchestrator (or local user) should always have to opt into running with cobuilds by providing a context id (as a command line arg or an environment variable). Even if cobuilds are "enabled" as an experimental feature, they don't actually turn on for that particular build unless the user is providing a context id.
  4. Is user/pass in a config file for the Redis Provider reasonable? I guess we can offer it as an option, but I think it would never be used in a real CI setup (the URL and pass will be CI secrets, or maybe even unknown -- e.g. you might log into AWS role and do some RBAC setup before running rush build to get the user/pass for Elasticache). So somehow this should be provided as another env var so CI can pass it in (perhaps using the full redis:// URL syntax).

Let me know what you think!

@chengcyber
Copy link
Contributor Author

chengcyber commented Mar 10, 2023

First, thanks you for reviewing this! @elliot-nelson

For 2 and 3, Do you mean that "context id" should be not configured in the "cobuid.json", instead, "context id" must be specified by environment variable ("RUSH_COBUILD_CONTEXT_ID") or CLI parameter(not designed yet)? If so, i am OK to remove the "context id pattern" part.

A further question from my side: Should RUSH_COBUILD_CONTEXT_ID='' be a valid value? Or it's the same meaning semantically as not specifying "context id"

For 4, Do you mean that the configuration for CobuildRedisPlugin should get the functionality to specify a env var? such as

rush-redis-cobuild-plugin.json

{
  "url": "redis://a.b.c.d:6379",
  "password": "${REDIS_PASS}"
} 

where REDIS_PASS is a env var from predefined variable in CI.

@iclanton
Copy link
Member

@chengcyber - I've come up with a repro for the issue we're seeing. Steps:

  1. Clone and build the main branch of the rushstack repo on Windows
  2. Update the "rushVersion" field in rush.json to "5.102.0-pr3949.4"
  3. Run rush build again

You should see output that looks like this:

Found configuration in E:\code\rushstack13\rush.json


Rush Multi-Project Build Tool 5.102.0-pr3949.4 - https://rushjs.io
Node.js version is 16.20.1 (LTS)

Found configuration in E:\code\rushstack13\rush.json

Starting "rush build"

Analyzing repo state... DONE (0.33 seconds)

Executing a maximum of 8 simultaneous processes...

==[ @rushstack/tree-pattern (build) ]============================[ 1 of 124 ]==
Error: EPERM: operation not permitted, copyfile 'E:\code\rushstack13\eslint\eslint-patch\rush-logs\eslint-patch._phase_build.log' -> 'E:\code\rushstack13\eslint\eslint-patch\.rush\temp\operation\_phase_build\all.log'
rush build - Errors! (0.58 seconds)

Copy link
Contributor

@dmichon-msft dmichon-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For safety, please also update OperationMetadataManager#tryRestoreAsync to ensure that control cannot exit the function without cleaning up logReadStream, e.g. by instantiating it outside of the try and cleaning it up in a finally block.

@chengcyber
Copy link
Contributor Author

Updated the error handling. The saved error is always printed in the terminal.

For instance, the previous error:

fHU3JIPzGS

@octogonz
Copy link
Collaborator

🚀 A prerelease of this branch is published as version 5.102.0-pr3949.5 of the following packages:

  • @microsoft/rush
  • @microsoft/rush-lib
  • @rushstack/rush-sdk
  • @rushstack/rush-amazon-s3-build-cache-plugin
  • @rushstack/rush-azure-storage-build-cache-plugin
  • @rushstack/rush-redis-cobuild-plugin

@chengcyber @elliot-nelson @iclanton

@iclanton iclanton merged commit e75d61a into microsoft:main Sep 1, 2023
5 checks passed
@chengcyber chengcyber deleted the feat-cobuild branch September 1, 2023 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[rush] Design: Rush "Cobuilds" - A cheap way to get distributed builds
6 participants