DLPX-90398 netplan apply run by cloud-init fails on boot in OCI #475

palash-gandhi
Contributor

@palash-gandhi palash-gandhi commented Mar 25, 2024

Problem

Recent upstream cloud-init changes altered how a "datasource" is detected. A "datasource" is the cloud-init term for the mechanism it uses to detect the platform and run cloud-specific code. Because cloud-init no longer detects the correct datasource, DCoA breaks: the SSH keys it relies on are no longer configured on the engine.
I filed an issue upstream (canonical/cloud-init#5091) and was advised that we should consider using the newer Oracle datasource. In this repo, we have hardcoded cloud-init configuration that causes the OpenStack datasource to be used.

Solution

Remove the hardcoded OpenStack configuration and replace it with one that selects the Oracle datasource, based on feedback from the upstream maintainers, so that cloud-init detects Oracle Cloud correctly.

Testing Done

In progress

ab-pre-push -p oci: http://selfservice.jenkins.delphix.com/job/appliance-build-orchestrator-pre-push/8177/
oci-snapshots with these bits: http://selfservice.jenkins.delphix.com/job/delphix-build-and-snapshots/job/oci-snapshots/8/console

Verified datasource detection in the cloud-init logs:

2024-03-26 21:49:43,674 - util.py[DEBUG]: Cloud-init v. 23.4.4-0ubuntu0~20.04.1+delphix.2024.03.08.20.10 finished at Tue, 26 Mar 2024 21:49:43 +0000. Datasource DataSourceOracle.  Up 35.76 seconds
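
The log check above could be scripted. The following is a minimal sketch, assuming the final status line always has the shape of the one quoted above; the function name `detected_datasource` and the regular expression are illustrative and not part of cloud-init.

```python
import re

# Pattern modeled on the single "finished at" line quoted above (an assumption,
# not a format guaranteed by cloud-init).
_FINISHED = re.compile(r"Cloud-init v\. \S+ finished at .*?Datasource (\w+)\.")


def detected_datasource(log_text):
    """Return the datasource named on the last 'finished' line, or None."""
    found = None
    for line in log_text.splitlines():
        match = _FINISHED.search(line)
        if match:
            # Keep the most recent match so the latest boot wins.
            found = match.group(1)
    return found
```

Passing the contents of /var/log/cloud-init.log to this function should yield `DataSourceOracle` on an engine booted with these bits.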

On internal variants, verified cloud-init configuration and that authorized_keys now contains my SSH key:

delphix@pg-oci-cloud-init-1-nic-1:~$ cat /run/cloud-init/cloud.cfg
datasource_list: [ Oracle, None ]

delphix@pg-oci-cloud-init-1-nic-1:~$ cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3
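
The configuration check could be automated in a similar way. This is a rough sketch that only understands the single-line form `datasource_list: [ ... ]` shown above; `parse_datasource_list` is a hypothetical helper, and a real check would more likely use a YAML parser.

```python
def parse_datasource_list(cfg_text):
    """Return the entries of a one-line 'datasource_list: [ A, B ]' setting, or None."""
    for line in cfg_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "datasource_list":
            inner = value.strip()
            # Only the bracketed one-line form is handled in this sketch.
            if inner.startswith("[") and inner.endswith("]"):
                return [item.strip() for item in inner[1:-1].split(",") if item.strip()]
    return None
```

The entries are returned as plain strings, so the output shown above would parse to `["Oracle", "None"]`.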

On an external variant, verified cloud-init configuration and that authorized_keys is empty:

delphix@pg-oci-cloud-init-ext-nic-1:~$ get-appliance-variant
external-standard

delphix@pg-oci-cloud-init-ext-nic-1:~$ cat /run/cloud-init/cloud.cfg
datasource_list: [ Oracle, None ]

$ sudo grep allow_public_ssh_keys /var/log/cloud-init.log
2024-03-26 22:10:59,502 - cc_ssh.py[DEBUG]: Skipping import of publish SSH keys per config setting: allow_public_ssh_keys=False

delphix@pg-oci-cloud-init-ext-nic-1:~$ cat ~/.ssh/authorized_keys
delphix@pg-oci-cloud-init-ext-nic-1:~$
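
The external-variant check above could be turned into a repeatable negative test. The following is a minimal sketch, assuming the test harness can read the cloud-init log and the authorized_keys file as text; `external_variant_keys_blocked` is a hypothetical name, and the marker string is taken from the log line above.

```python
SKIP_MARKER = "allow_public_ssh_keys=False"


def external_variant_keys_blocked(cloud_init_log, authorized_keys):
    """True if the log shows the key import was skipped and no key is installed."""
    skipped = any(
        "cc_ssh.py" in line and SKIP_MARKER in line
        for line in cloud_init_log.splitlines()
    )
    # Blank lines and comments do not count as installed keys.
    has_keys = any(
        line.strip() and not line.lstrip().startswith("#")
        for line in authorized_keys.splitlines()
    )
    return skipped and not has_keys
```

Requiring both conditions guards against a false pass where authorized_keys is empty only because no key was supplied at deploy time.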

@palash-gandhi palash-gandhi force-pushed the dlpx/pr/palash-delphix/e26471a2-3aa3-4808-bddd-51701595995f branch from 5b4b9c1 to a353dcc Compare March 25, 2024 17:06
@palash-gandhi palash-gandhi marked this pull request as ready for review March 25, 2024 20:43
@sebroy
Contributor

sebroy commented Mar 25, 2024

This looks fine to me. One thing nags at me, though:
Do we have a test that verifies that when deploying an external variant, one cannot set a default user with a pre-set ssh key? If not, is there a way to manually test that? Whenever we change cloud-init configurations, this is the thing that I worry we might inadvertently break.

@blackboxsw

Thanks again for filing the bug upstream and referencing that issue in this PR. From the looks of your repo, it seems you have the ability to lay down files specifically for OracleCloud in your files/oci subdir, and this PR removes that file. If you have the capacity to set an opinionated cloud-init datasource_list per platform, providing a specific datasource_list: [ Oracle ] in an /etc/cloud/cloud.cfg.d/99*.cfg file does shave a tiny bit off boot time: ds-identify, which runs in the systemd generator timeframe, will short-circuit some checks and mandate that Oracle, OpenStack, or whatever you choose is selected without doing any other platform checks against DMI data via sysfs. Just an FYI either way.

@palash-gandhi palash-gandhi force-pushed the dlpx/pr/palash-delphix/e26471a2-3aa3-4808-bddd-51701595995f branch from a353dcc to a18dd5b Compare March 26, 2024 19:00
Contributor

@prakashsurya prakashsurya left a comment

Seems reasonable. I can't comment on using OpenStack vs. Oracle, but if Oracle is the right value to use, the change looks good.

I don't know if we have a negative test like the one Seb is asking about.

@palash-gandhi
Contributor Author

This looks fine to me. One thing nags at me, though: Do we have a test that verifies that when deploying an external variant, one cannot set a default user with a pre-set ssh key? If not, is there a way to manually test that? Whenever we change cloud-init configurations, this is the thing that I worry we might inadvertently break.

@sebroy I've run this test via DCoA by cloning an external variant with my bits and verifying that authorized_keys is empty.

@palash-gandhi
Contributor Author

Thanks again for filing the bug upstream and referencing that issue in this PR. From the looks of your repo, it seems you have the ability to lay down files specifically for OracleCloud in your files/oci subdir, and this PR removes that file. If you have the capacity to set an opinionated cloud-init datasource_list per platform, providing a specific datasource_list: [ Oracle ] in an /etc/cloud/cloud.cfg.d/99*.cfg file does shave a tiny bit off boot time: ds-identify, which runs in the systemd generator timeframe, will short-circuit some checks and mandate that Oracle, OpenStack, or whatever you choose is selected without doing any other platform checks against DMI data via sysfs. Just an FYI either way.

Thanks for the pointer, @blackboxsw. I've implemented your suggestion.

@palash-gandhi palash-gandhi merged commit 0b5db55 into develop Mar 28, 2024
21 checks passed
@palash-gandhi palash-gandhi deleted the dlpx/pr/palash-delphix/e26471a2-3aa3-4808-bddd-51701595995f branch March 28, 2024 22:12