Increase RAM of the Ubuntu.2204.Amd64.Open queue definition #1163

Closed
7 tasks
riarenas opened this issue Oct 25, 2023 · 8 comments
Labels
Ops - Facilely Operations issues that are easily accomplished or attained

Comments

@riarenas
Member

riarenas commented Oct 25, 2023

Per request from the Runtime team, some of their tests require at least 16 GB of RAM.

Details are in this internal Teams conversation

  • Create a new queue entry specifically for the Native AOT tests, based on the ubuntu.2204.amd64(.open) queue definitions
  • These queues should use D4a_v4 machines, by specifying a mincores: 4 requirement.
  • We only really need a .rt version of this queue that runs in the runtime Azure subscription. This should be specified in the definition as seen here
  • Once the queue is in staging, contact @kotlarmilos with instructions on how they can point their jobs to the new queue in staging.

Release Note Category

  • Feature changes/additions
  • Bug fixes
  • Internal Infrastructure Improvements

Release Note Description

Added Ubuntu.2204.Amd64-Extra-Large(.Open).rt queue with 4 cores instead of 2

@kotlarmilos
Member

kotlarmilos commented Oct 26, 2023

Thank you. Providing more context here:

  • The job consistently hits an OOM error after 2 hours while AOT-compiling System.Private.CoreLib.dll inside a docker container based on the cbl-mariner-2.0-cross-amd64 image
  • The current VMs limit how much RAM docker containers can consume
  • The same job finishes the compilation successfully within 10 minutes on a machine with 32 GB of RAM
  • The job belongs to extra-platforms and will not run on every PR
  • We don't want to migrate other jobs to the upgraded VMs, in order to keep costs under control
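
To confirm the limit a job actually sees, the container can read it from the cgroup filesystem. The following is a minimal sketch, assuming the docker RAM limit is enforced through cgroups and that the standard cgroup v2 or v1 paths are mounted inside the container; the function names are illustrative and not part of any Helix tooling.

```python
from pathlib import Path
from typing import Optional

# Usual cgroup locations inside a Linux container (v2 first, then v1).
# These paths are an assumption about how the container is set up.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when cgroup v2 reports 'max' (no limit)."""
    value = raw.strip()
    if value == "max":
        return None
    return int(value)


def container_memory_limit(paths=CGROUP_LIMIT_FILES) -> Optional[int]:
    """Return the first readable cgroup memory limit, or None if none is found."""
    for path in paths:
        try:
            return parse_limit(path.read_text())
        except OSError:
            # File missing or unreadable: try the next candidate.
            continue
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("no cgroup memory limit found")
    else:
        print(f"container memory limit: {limit / 2**30:.1f} GiB")
```

Note that cgroup v1 reports a very large number rather than `max` when no limit is set, so a huge value there also means the container is effectively unlimited.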

@riarenas riarenas added the Needs Triage A new issue that needs to be associated with an epic. label Oct 30, 2023
@missymessa
Member

Let's create a new queue for this work.

@missymessa missymessa added Ops - Facilely Operations issues that are easily accomplished or attained and removed Needs Triage A new issue that needs to be associated with an epic. labels Nov 2, 2023
@kotlarmilos
Member

Thank you! It would be good to test it before setting up the entire queue. Can we test it on a single machine in the queue before adding more machines?

@riarenas
Member Author

riarenas commented Nov 3, 2023

Yep, that should be possible! Once we have our queue in the staging environment, you could try targeting that queue so that we can validate before attempting to deploy to production. We can give you the details once the queue is available for testing.

Note to implementer: I updated the description with hopefully concise details on what we need to do.

@kotlarmilos
Member

Thanks!

/cc: @steveisok @SamMonoRT

@AlitzelMendez AlitzelMendez self-assigned this Nov 21, 2023
@AlitzelMendez
Member

PR: https://dev.azure.com/dnceng/internal/_git/dotnet-helix-machines/pullrequest/35422

@AlitzelMendez
Member

This has rolled out :) Feel free to reopen if you have any questions.

@kotlarmilos
Member

kotlarmilos commented Dec 8, 2023

Thank you! Just to confirm, does it also come with increased RAM? The release note only mentions a queue with 4 cores instead of 2. Our bottleneck has been high RAM consumption. The AOT compiler consumes 16 GB of RAM at its peak, but because it runs within a docker container, we are limited to 50%, which is 8 GB of RAM. We have two potential solutions: either we remove the docker RAM limit or we use a queue with 32 GB of RAM.
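
The arithmetic behind the 32 GB figure can be sketched as follows. This is a rough illustration, assuming the 50% docker limit is a fraction of the host's total RAM; the helper names are hypothetical.

```python
def min_host_ram_gb(peak_gb: float, container_fraction: float) -> float:
    """Smallest host RAM (GB) whose container share covers the peak usage."""
    if not 0 < container_fraction <= 1:
        raise ValueError("container_fraction must be in (0, 1]")
    return peak_gb / container_fraction


def fits(host_ram_gb: float, peak_gb: float, container_fraction: float) -> bool:
    """True when the container's share of host RAM is at least the peak usage."""
    return host_ram_gb * container_fraction >= peak_gb


# Numbers from this thread: 16 GB peak, containers capped at 50%.
print(min_host_ram_gb(16, 0.5))  # 32.0
print(fits(16, 16, 0.5))         # False: only 8 GB available to the container
print(fits(32, 16, 0.5))         # True
print(fits(16, 16, 1.0))         # True: same host with the docker limit removed
```

Both options in the comment above satisfy the check: a 32 GB host at the 50% cap, or a 16 GB host with the cap removed (ignoring whatever the host OS itself needs).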
