Regular & visible ansible refreshes of machines #695
Comments
This is something we are looking into at the OpenJ9 CI as well so there's possibility for collaboration or reusing solutions here. eclipse-openj9/openj9#4221 |
Thanks Adam, I was going to post that as a related effort. I was also going to ask on that issue about the difference between using the Jenkins ansible plugins with a schedule and the Tower approach. |
AWX should be rolling out updates regularly - I suspect there is a bug. |
While refreshing regularly is a goal, the playbooks are not yet stable enough to run on a regular basis, and that has to be a prerequisite. We're working on it, but bear in mind that we're currently getting very regular requests for new types of systems, so having an infrastructure capable of testing changes before they're deployed in production is critical to ensuring that visible ansible refreshes of production machines don't break anything. We're a lot closer than we were a couple of months ago, but I would not advocate putting this in place right now as I believe the risk would be too great. Current issue list: https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues?q=is%3Aissue+is%3Aopen+label%3Aansible |
For reference, my plan is to start running the playbooks manually on subsets of machines and ensure that they are running "green". Many haven't been, and I've done a pile of work under the infra repo to get them closer (we're almost there on xlinux I think, but other things like builds keep getting in the way!). This is the best way to understand any stability problems. After that, I'll start running the schedules for them automatically in AWX. Bear in mind that at present it's still really just me (Windows excepted) working on playbook stabilisation. |
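The manual "subset first" step described above could be sketched roughly as follows. This is only an illustration, not the infra team's actual tooling: the playbook path and inventory file are hypothetical placeholders, and the only `ansible-playbook` options assumed are `-i`, `--limit` and `--check`.

```python
import subprocess
from typing import Callable, List, Sequence


def build_command(playbook: str, inventory: str, hosts: Sequence[str],
                  check: bool = False) -> List[str]:
    """Build an ansible-playbook command restricted to a subset of hosts."""
    # --limit takes a comma-separated host pattern, so only the named
    # machines are touched even if the inventory lists many more.
    cmd = ["ansible-playbook", "-i", inventory, playbook,
           "--limit", ",".join(hosts)]
    if check:
        # --check asks Ansible for a dry run that reports changes
        # without applying them.
        cmd.append("--check")
    return cmd


def run_on_subset(playbook: str, inventory: str, hosts: Sequence[str],
                  check: bool = False,
                  runner: Callable = subprocess.run) -> bool:
    """Run the playbook on the given hosts; True means the run was "green"."""
    result = runner(build_command(playbook, inventory, hosts, check))
    # ansible-playbook exits non-zero when any host fails or is unreachable.
    return result.returncode == 0
```

For example, `run_on_subset("playbooks/main.yml", "inventory.yml", ["test-packet-ubuntu1604-x64-3"], check=True)` would do a dry run against a single test machine (both file names are assumed here). Only once a subset is consistently green would it make sense to hand the same playbook to a scheduler such as AWX.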
@sxa555 - apologies, I understood you and Husain were getting very close to playbook goodness, so thought it was timely to propose. I know you are holding the fort on this work (which I greatly appreciate). Please let me know if there are any small tasks we can help with... (tricky I know due to permissions, etc). |
No need to apologise - I want to get there as much as you do :-) |
Shelley has confirmed that my re-run on the first machine has gone cleanly and resolved the issue, so I will be continuing to redeploy on other systems - I'll update this comment as and when each one is done.
test-softlayer-rhel69-x64-1 not yet done as it's subject to #698 |
I will also mention that I verified I could still run openjdk regression tests on test-packet-ubuntu1604-x64-3 after its refresh. |
Aha - this is the issue I was missing. Right - I'm happy to pair with @sxa555 and get through this as well. Got stuck on s390 and ppcle with docker. |
s390x and ppc64le now resolved as per #714 ... Now running on all UNIX |
Several machines failed due to the issue addressed by #729
There were a few issues with machines being unreachable (armv7 offline, others likely temporary)
And a few special-case failures: Now running on a subset of the test machines ( |
Thanks for the efforts in getting us to green @sxa555 ! |
A quick breakdown, by platform, of the issues that need to be addressed before the playbooks can run regularly without error:
Mac x64
No remaining issues with ppc64le and s390x.
Windows, AIX, Solaris and aarch64 still need addressing. |
Update: Scheduled playbook deployment is up and running for AIX, Solaris and aarch64. Some outstanding issues remain with AIX (#3086). Windows is nearly finished, just waiting on final actions regarding credentials for the Windows machines. |
Yes, all of the platforms are now running on a schedule |
Now that the playbooks are more stable, it would be good to have regular and visible machine refreshes, to ensure that new updates to the playbooks are picked up and deployed on a regular basis (related: #624, submitted 2 weeks ago, still needs deployment to test machines). In addition to 'full set of machines' refreshes, one-off updates to a particular machine should also be made known/visible, as part of some easy-to-find communication.
Benefits include faster test triage, easier on-boarding of new helpers to the infra team, and visibility to all interested parties.
What tools are used for deployment at present? Ansible Tower (not visible to non-infra folks)? I ask because this request for scheduled/visible machine refreshes could possibly be addressed by using the ansible plugins for Jenkins and scheduling a set of infra jobs to run regularly. These jobs would then be visible to more than the infra team, and the infra tasks would be dealt with similarly to the build and test tasks. But maybe Ansible Tower gives other benefits, which would be good to understand (as they come at the cost of visibility/transparency).
I know it's already been discussed by infra and was possibly already planned, so if this is already being done, please point me to it; I would like to help.