Single Node deployment with bootstrap-in-place #4482

Merged
merged 3 commits into from
Feb 12, 2021

@eranco74 eranco74 commented Dec 15, 2020

single-node deployment with bootstrap-in-place

Add a new create single-node-ignition-config command to the installer that creates
the bootstrap-in-place-for-live-iso.ign Ignition config.
This new target does not output master.ign and worker.ign.

This Ignition config will have a different bootkube.sh from the
default bootstrap Ignition. In addition to the standard rendering
logic, the modified script will:

  1. Start cluster-bootstrap without required pods by setting --required-pods=''
  2. Run cluster-bootstrap with the --bootstrap-in-place option.
  3. Fetch the master Ignition and combine it with the original Ignition
    config, the control plane static pod manifests, the required
    kubernetes resources, and the bootstrap etcd database snapshot to
    create a new Ignition config for the host.
  4. At this point bootkube finishes and a new service named write-to-disk starts.
    This service executes coreos-installer install to write the RHCOS image
    and the rendered master Ignition to disk, then reboots the node.
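
As a rough illustration of steps 1, 2 and 4, the sketch below only prints the flags and the install command rather than running anything. The function names, the mode argument, and the Ignition path and target device are hypothetical; only the --required-pods and --bootstrap-in-place flags and the use of coreos-installer install come from the description above, and this is not the code from the PR.

```shell
#!/usr/bin/env bash
# Dry-run sketch: nothing here touches the node, it only prints text.
# Function names and arguments are illustrative, not taken from bootkube.sh.

# Print the extra cluster-bootstrap flags for the given mode.
# "true" selects bootstrap-in-place; anything else prints nothing.
bip_cluster_bootstrap_flags() {
  local in_place="${1:-false}"
  if [ "$in_place" = "true" ]; then
    # Step 1: no required pods. Step 2: enable bootstrap-in-place.
    printf '%s\n' "--required-pods=''" "--bootstrap-in-place"
  fi
}

# Step 4: print the coreos-installer invocation that writes the image and
# the rendered Ignition config to the target device (a reboot would follow).
bip_install_command() {
  local ignition_file="$1" device="$2"
  printf 'coreos-installer install --ignition-file %s %s\n' "$ignition_file" "$device"
}
```

Calling bip_cluster_bootstrap_flags true prints the two flags on separate lines, while any other argument prints nothing, which mirrors keeping the multi-node path unchanged.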

Log gathering on bootstrap-in-place.

Support collecting logs from a bootstrap-in-place cluster with, for example, openshift-install gather bootstrap --bootstrap 192.168.126.10 --master 192.168.126.10 --key id_rsa

Pre-pivot - gather bootstrap only

The command works as usual before the pivot occurs because the /usr/local/bin/installer-gather.sh script is present and works as expected without any changes.

Post-pivot - gather bootstrap & master logs

Gathering bootstrap logs

Before the reboot, the bootstrap node gathers logs from itself using /usr/local/bin/installer-gather.sh. This script also tries to gather from the masters in the cluster, but since there are none, it only gathers its own logs.

After the gathering is complete, the bundle is added to the master Ignition, which makes the bootstrap logs available from the master after the reboot.

Gathering master logs

Typically, in non-BiP scenarios, master logs are gathered using the bootstrap node as a proxy by remotely running, via ssh, the /usr/local/bin/installer-gather.sh script present on the bootstrap node. That script in turn detects all masters (or receives a list of masters via the --master command-line arguments), then for each master it copies a second script, /usr/local/bin/installer-masters-gather.sh, to that master with scp. It then runs that script remotely, on the master, using ssh. When the script has finished running, it copies the resulting files back to the bootstrap node with scp and adds them to the bootstrap log bundle, with each master's logs appearing in its own named folder inside the control-plane directory of the bootstrap node log bundle.
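
The per-master loop described above can be pictured with the following dry-run sketch. It prints the scp and ssh steps instead of executing them. The function name, the core ssh user, and the `<artifacts>` placeholder are assumptions made for illustration; the real installer-gather.sh is not reproduced here.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the proxy flow: print one copy/run/copy-back triple per master.
# "core" as the ssh user and "<artifacts>" as the remote result path are placeholders.
gather_from_masters_plan() {
  local script=/usr/local/bin/installer-masters-gather.sh
  local master
  for master in "$@"; do
    # Copy the masters gather script to the master.
    echo "scp ${script} core@${master}:"
    # Run it remotely on that master.
    echo "ssh core@${master} ./installer-masters-gather.sh"
    # Copy the results back into a per-master folder under control-plane.
    echo "scp -r core@${master}:<artifacts> control-plane/${master}/"
  done
}
```

Running gather_from_masters_plan with two addresses prints six lines, three per master, in the order copy, run, copy back.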

In the BiP scenario, we created a new script called installer-master-bootstrap-in-place-gather.sh. This script is copied to the master (using Ignition) to the same location where the bootstrap node usually has installer-gather.sh; that is, it masquerades as installer-gather.sh. We also copy /usr/local/bin/installer-masters-gather.sh to the master.

The installer-master-bootstrap-in-place-gather.sh script, masquerading as installer-gather.sh, gets called remotely by the openshift-install gather command, which believes it is collecting logs from a bootstrap node. The script, however, behaves slightly differently. Instead of collecting bootstrap logs and then remotely running /usr/local/bin/installer-masters-gather.sh on all master nodes, it takes the bootstrap logs from the bundle that was copied to the node via the master Ignition, and it collects master logs by running /usr/local/bin/installer-masters-gather.sh directly on itself instead of remotely on other masters (which don't exist). All master IP addresses passed to it are ignored. The final archiving of all the logs into the home directory is done in exactly the same manner as in /usr/local/bin/installer-gather.sh.
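
A minimal sketch of that local flow, under stated assumptions, might look like the following. The function name, its arguments, the directory layout, and the idea that the masters gather script accepts an output directory are all hypothetical and chosen only to make the example self-contained; the actual script in the PR may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bootstrap-in-place gather; all names are placeholders.
bip_gather() {
  local bootstrap_bundle="$1"   # bundle delivered through the master Ignition
  local masters_gather="$2"     # stand-in for the local masters gather script
  local out_dir="$3"            # working directory for the combined bundle
  shift 3                       # any master IPs that follow are deliberately ignored

  local node="${HOSTNAME:-localhost}"
  mkdir -p "${out_dir}/bootstrap" "${out_dir}/control-plane/${node}" || return 1

  # Bootstrap logs come from the pre-reboot bundle, not from a live bootstrap node.
  tar -xzf "$bootstrap_bundle" -C "${out_dir}/bootstrap" || return 1

  # Master logs are gathered locally instead of over ssh.
  # Passing an output directory is an assumption of this sketch.
  bash "$masters_gather" "${out_dir}/control-plane/${node}" || return 1

  # Archive everything next to the working directory.
  tar -czf "${out_dir}.tar.gz" -C "$(dirname "$out_dir")" "$(basename "$out_dir")"
}
```

The extra arguments are dropped with shift rather than rejected, so a caller that still passes --master style addresses gets the same bundle either way.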

The end result is a log bundle that looks pretty much the same as non-bootstrap-in-place scenarios.

Small caveat

As you can see - openshift-install gather bootstrap --bootstrap 192.168.126.10 --master 192.168.126.10 --key id_rsa - the command requires you to specify at least one master and one bootstrap node. In our case, you just pass the single node IP as both master and bootstrap. This causes a small "bug" when the node is pre-pivot: it collects bootstrap logs from itself and also "remotely" collects master logs from itself via SSH. As a result, log files are duplicated when the gather command is run pre-pivot.
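
For convenience, the single-node invocation can be wrapped so the same address is used for both flags. The helper below is a hypothetical sketch that only prints the command line; the function name is made up, while the flags are the ones shown in the example command above.

```shell
#!/usr/bin/env bash
# Print the gather command for a single-node cluster, reusing one IP for
# both --bootstrap and --master. This helper does not run openshift-install.
sno_gather_command() {
  local node_ip="$1" ssh_key="$2"
  printf 'openshift-install gather bootstrap --bootstrap %s --master %s --key %s\n' \
    "$node_ip" "$node_ip" "$ssh_key"
}
```

For example, sno_gather_command 192.168.126.10 id_rsa prints the command used in this description.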

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 15, 2020
@eranco74 eranco74 force-pushed the bootstrap-in-place branch 2 times, most recently from 4d679af to 3669ed0 Compare December 15, 2020 11:36
@eranco74 eranco74 mentioned this pull request Dec 15, 2020
@romfreiman

Relevant cluster-bootstrap pr is: openshift/cluster-bootstrap#46

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 15, 2020
@eranco74 eranco74 force-pushed the bootstrap-in-place branch 2 times, most recently from 4dc4c9d to d2db289 Compare December 16, 2020 13:34
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 17, 2020
@eranco74 eranco74 force-pushed the bootstrap-in-place branch 2 times, most recently from 5d1a8a1 to b3eda08 Compare December 17, 2020 16:41
@openshift-merge-robot

openshift-merge-robot commented Dec 17, 2020

@eranco74: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/shellcheck | b3eda08 | link | /test shellcheck |
| ci/prow/yaml-lint | b3eda08 | link | /test yaml-lint |
| ci/prow/e2e-crc | b3eda08 | link | /test e2e-crc |
| ci/prow/e2e-aws | b3eda08 | link | /test e2e-aws |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@staebler

/assign

@staebler staebler left a comment

I don't have any major qualms with this approach.

Review threads (outdated, resolved):
sno_manifest.yaml
pkg/asset/ignition/bootstrap/bootstrap.go
hack/after_reboot.sh
@eranco74 eranco74 force-pushed the bootstrap-in-place branch 4 times, most recently from 4080605 to f21bc87 Compare December 20, 2020 16:19
@eranco74 eranco74 force-pushed the bootstrap-in-place branch 3 times, most recently from 74d33e0 to 3088a01 Compare February 11, 2021 21:41
@staebler staebler left a comment

/approve

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhellmann, staebler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 11, 2021
@romfreiman

romfreiman commented Feb 11, 2021 via email

@romfreiman

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 11, 2021
@romfreiman

/retest

@eranco74

/retest

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@staebler

/lgtm cancel

Feb 11 22:04:34 ip-10-0-38-130 bootkube.sh[2178]: /usr/local/bin/bootkube.sh: line 348: syntax error near unexpected token `fi'

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 12, 2021
Added new asset for SingleNodeBootstrapInPlace

Updated bootkube to set the BOOTSTRAP_INPLACE variable and evaluate it at run time.
This change allows someone looking at the script to follow the shell logic for
single-node vs. multi-node deployment while debugging.

When creating single-node-ignition-config we now validate
that the install-config contains configuration for bootstrapInPlace.
Added an install-to-disk service that completes the installation
by writing the OS to the desired installation disk and rebooting the node.

Signed-off-by: Eran Cohen <eranco@redhat.com>
@romfreiman

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 12, 2021
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@staebler

 2021/02/12 05:00:32 Pod e2e-aws-ipi-install-install succeeded after 33m0s

🕺

@eranco74

/retest

@romfreiman

/test e2e-aws

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci

openshift-ci bot commented Feb 12, 2021

@eranco74: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-crc | 2b9080c | link | /test e2e-crc |
| ci/prow/e2e-aws-single-node | bbaf0ac | link | /test e2e-aws-single-node |
| ci/prow/e2e-aws-workers-rhel7 | 8e4a408 | link | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-openstack | 8e4a408 | link | /test e2e-openstack |

@openshift-merge-robot openshift-merge-robot merged commit 1ad0158 into openshift:master Feb 12, 2021