Legacy Equinix Metal data facility closures on November 30th, 2022 #3028

Closed

richardlau opened this issue Sep 8, 2022 · 14 comments

@richardlau
Member

richardlau commented Sep 8, 2022

We've been informed that Equinix are closing their legacy data facilities on November 30th, 2022. This will affect all of the instances we currently have hosted there and we'll need to move to "go-forward Equinix-based environments".

Currently we're spread over two projects -- Node.js and Works on ARM. The Linux ARM64 release machines are older ARM64 hardware (c2.large.arm) that are only in the legacy data facilities and are most at risk.

Node.js:

| host | type | description | status |
| --- | --- | --- | --- |
| release-equinix-ubuntu2004-docker-arm64-1 | c2.large.arm | Linux ARM64 release machine | Migrated to OSUOSL |
| release-equinix-ubuntu2004-docker-arm64-2 | c2.large.arm | Linux ARM64 release machine | Migrated to OSUOSL |

Works on ARM:

| host | type | description | status |
| --- | --- | --- | --- |
| test-equinix-ubuntu2004-arm64-1 | x.large.arm | Linux ARM64 test machine | Migrated |
| test-equinix-ubuntu2004-arm64-3 | x.large.arm | Linux ARM64 test machine | Migrated |
| test-packetnet-ubuntu1804-x64-1 | c1.small.x86 | jenkins-workspace-4 | Migrated to Node.js project at Equinix |
| test-packetnet-ubuntu1804-x64-2 | c1.small.x86 | jenkins-workspace-5 | Migrated to Node.js project at Equinix |


@vielmetti

Thanks @richardlau. Here's what I know so far.

The Jenkins machines should be the easiest of this bunch to move. We have a "c3.small" machine type of comparable size that should do the trick.

For the x.large.arm systems, the team is working on a migration process that will likely result in us doing a physical machine migration on your behalf, moving the systems from DFW2 to one of the DA data centers. Some of those details are TBD, so as of right now I am awaiting instructions. In an ideal situation you would see a little bit of downtime for the physical move, but no need to reconfigure.

For the two release machines (c2.large.arm), my recommendation at the moment is to migrate those services to another provider. We are decommissioning the Ampere eMAG hardware these live on, and those machines are not being moved to new facilities. Our newer c3.large.arm machines are very similar to the x.large.arm systems you are currently using through Works on Arm - large bare metal systems with high core counts - which makes them a poor fit to be deployed as release-only machines.

Happy to discuss any of this further (here, or in email, or by phone or conference call).

@mhdawson
Member

mhdawson commented Sep 8, 2022

@vielmetti I think our challenge with the suggestion for the release machines is that we don't have other providers with arm64 machines. One alternative is to use one larger machine and Docker for the two release machines. Would it be possible to get one to do that?
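For illustration, a minimal sketch of what that consolidation could look like on a single larger arm64 host (container names, core ranges, and volume paths here are hypothetical, not our actual release setup):

```sh
# Hypothetical sketch: run the two release "machines" as containers on one
# larger arm64 host, each pinned to a disjoint set of cores with its own volume.
docker run -d --name release-arm64-1 --cpuset-cpus "0-39" \
  -v /srv/release-1:/home/iojs ubuntu:20.04 sleep infinity
docker run -d --name release-arm64-2 --cpuset-cpus "40-79" \
  -v /srv/release-2:/home/iojs ubuntu:20.04 sleep infinity
```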

@vielmetti

@mhdawson In the time since we started this effort (at Packet), a number of vendors have announced and released arm64 support, many of them using VMs that allow the provider to offer up a smaller slice of a machine. AWS Graviton has been in the market for a while, and Google and Microsoft have made announcements.

Arm does have a program to enable developer access to these smaller systems, an extension of the Works on Arm program described here:

https://www.arm.com/solutions/infrastructure/works-on-arm

Please take a look around for some alternatives - as I say, there are a lot more choices than there were just a few years ago! - one of those may be economical or free and suit your needs.

@mhdawson
Member

@vielmetti OK, we'll try to explore other options.

richardlau added a commit that referenced this issue Sep 27, 2022
Add two new `jenkins-workspace` machines:
- test-equinix-ubuntu2204-x64-1
- test-equinix-ubuntu2204-x64-2

Refs: #3028
@richardlau
Member Author

We have replacement arm64 release machines at OSUOSL now (#3051).

The remaining outstanding item here is the Altras (x.large.arm) in the test CI, for which I guess we're waiting to hear further information from Equinix.

@vielmetti

Thanks @richardlau. When the replacement arm64 systems are 100% a-ok, can you go into the Equinix portal and delete/destroy the old systems that are no longer in service?

richardlau added a commit that referenced this issue Oct 13, 2022
Add two new OSUOSL hosted arm64 release machines to replace the
Equinix ones that are going away at the end of November 2022.

Updates the playbook for centos7 to drop devtoolset-6 which is no
longer required since Node.js 12 went End-of-Life (Node.js 14 and
16 use devtoolset-8 and Node.js 18 and later are not built on
centos7).

Refs: #3028
@richardlau
Member Author

> Thanks @richardlau. When the replacement arm64 systems are 100% a-ok, can you go into the Equinix portal and delete/destroy the old systems that are no longer in service?

@vielmetti I've deleted the old systems that have already been migrated (the two c2.large.arm machines and the c1.small.x86 machines).
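For anyone doing similar cleanup from a terminal rather than the portal, a sketch using the Equinix Metal CLI (the project and device UUIDs are placeholders; check `metal --help` for the exact flags on your version):

```sh
# List the devices in the project, then delete a decommissioned one by UUID.
metal device get --project-id <project-uuid>
metal device delete --id <device-uuid>   # prompts to confirm; --force skips the prompt
```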

The remaining servers affected by the data facility closures are the two Altras (x.large.arm). Do we need to initiate something to get these machines moved to a data facility that is not being closed, or will we be contacted with further details?

@richardlau
Member Author

We got this email today:

> Hello Team,
>
> We would like to inform you that Equinix will be physically moving all the servers to a new data center, as the existing facility is being closed.
>
> This is planned to be done by Equinix in the week starting 14th Nov 2022 (in US business hours) and the expected downtime is 4 days. We expect all the servers to be up and ready for use by the end of day four unless the team encounters some technical issues.
>
> Important points to be noted:
>
> - This physical movement of servers will take up to 4 calendar days and we will inform you when your machines are back online within the 4-day window.
> - Some or all of the IPs on a server may change as part of this move. If so, we will need you to log on to your instance, either via direct SSH or through the Equinix SOS portal, and change your network config to match the new values.
> - The Equinix Ops team should not have to remove the instances from the servers, and they should be as-is when back online. But we advise you to plan a backup of project data before the planned date, just in case something goes wrong and data is lost.
> - Please shut down the instance by EOD 13th Nov, as the Equinix team will need to power off the systems.
>
> We do understand that this may cause some inconvenience to you; we truly appreciate your cooperation. Please let us know in case you have any questions.
>
> Please acknowledge receipt of this email.
>
> Regards
>
> WoA Program Team
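For anyone who ends up doing the network reconfiguration described in that email, a minimal sketch for an Ubuntu 20.04 instance that uses netplan (the file name is an assumption; check /etc/netplan/ on the machine for the real one):

```sh
# Edit the static addresses/gateway to match the new values from the portal,
# then apply the change without a reboot and verify.
sudo vi /etc/netplan/50-cloud-init.yaml
sudo netplan apply
ip addr show
```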

@mhdawson
Member

mhdawson commented Nov 2, 2022

@richardlau I see you also acknowledged. I assume the main thing we should do is plan to avoid any releases over that 4-day window if possible.

richardlau added a commit that referenced this issue Nov 3, 2022
Remove from the inventory machines that were hosted at Packet/Equinix
that have subsequently been migrated to other machines at Equinix and
OSUOSL.

Refs: #3028
Refs: #2729
richardlau added a commit to richardlau/build that referenced this issue Nov 13, 2022
@richardlau
Member Author

I've powered off the two Altras at Equinix ahead of the migration as requested.

One test host has been created at OSUOSL, so we should still be able to run CI, but it is a much smaller machine than the Altras so is likely to be slower.

@targos
Member

targos commented Nov 15, 2022

I think we now have a bottleneck in CI (there's a large build queue with only ARM jobs).

@vielmetti

@targos and all

Please continue to review https://www.arm.com/solutions/infrastructure/works-on-arm for information about free Arm credits and build resources on a variety of clouds. I am certain that additional builder cycles are free for the asking through this program from multiple sources.

@richardlau
Member Author

> I think we now have a bottleneck in CI (there's a large build queue with only ARM jobs).

Builds are going to be slower because the OSUOSL docker host has far fewer resources than the two Ampere Altras, which are currently offline while they are being moved to another data center.

I've disabled https://ci.nodejs.org/job/node-test-commit-arm-debug/ for now as this build in particular is now taking between 2.5 and 11 hours and tying up the two ubuntu1804-arm64 executors we have, which are also used in https://ci.nodejs.org/job/node-test-commit-arm/ builds.

@richardlau
Member Author

The two Altras have been migrated.
test-equinix-ubuntu2004-arm64-1 failed to restart automatically and was stuck at the UEFI boot prompt (#2894).
test-equinix-ubuntu2004-arm64-3 appears to have restarted without requiring intervention.

I've edited the labels for https://ci.nodejs.org/job/node-test-commit-arm-debug/ so that it doesn't run on the slower OSUOSL host. I've also edited the job so it runs in the Ubuntu 20.04 containers instead of the 18.04 ones and re-enabled the job.
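For context, that restriction is just the job's "Restrict where this project can be run" label expression; a hypothetical sketch, with made-up label names rather than the CI's actual ones:

```
ubuntu2004-arm64 && !osuosl-docker-host
```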
