Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dind: Ensure unique node config and fix perms #11326

Merged
merged 2 commits into from
Oct 14, 2016

Conversation

marun
Copy link
Contributor

@marun marun commented Oct 11, 2016

A dind deployment flake on #11315 revealed that config permissions are not being set correctly to allow cluster artifacts to be saved. This change ensures the cluster configuration is readable so it can be saved.

cc: @openshift/networking @stevekuznetsov

Should also fix #11274

@marun
Copy link
Contributor Author

marun commented Oct 11, 2016

[test]

@marun
Copy link
Contributor Author

marun commented Oct 11, 2016

This fixes more than just permissions. Previously node configuration was always generated into the same path on the shared volume - /data/openshift.local.config - instead of a node-specific path - /data/openshift.local.config/node-*. This was likely resulting in race conditions where one or more nodes could end up using the same configuration.

@marun marun changed the title dind: fix config permissions dind: Ensure unique node config and fix perms Oct 11, 2016
find "${config_path}" -exec chmod ga+rw {} \;
find "${config_path}" -type d -exec chmod ga+x {} \;
# ensure the configuration can be used outside of the container
chmod ga+rx "${config_path}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we collapse this:

chmod -R ga+rX "${config_path}"
chmod ga+w "${master_path}/admin.kubeconfig"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@stevekuznetsov
Copy link
Contributor

chmods look good. Do we want to merge both this one and #11298 and see which goes in first? Or close that other and one focus on this one?

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@marun
Copy link
Contributor Author

marun commented Oct 11, 2016

@stevekuznetsov Ok, I'll close the other one. I tend to try to split things up but these fixes are very closely related.

@@ -51,6 +58,9 @@ function ensure-node-config() {
--signer-cert="${master_config_path}/ca.crt" \
--signer-key="${master_config_path}/ca.key" \
--signer-serial="${master_config_path}/ca.serial.txt"

# Release the lock
flock -o
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flock(1) is so weird... pretty sure this is wrong though. At the least it would need a "200" there so it knows what to unlock? And I think you want -u not -o too.

I think the recommended way of using it is to scope it with a subshell:

(flock 200;
 /usr/local/bin/openshift ...
) 200>/path/to/lockfile

so then the fd is closed automatically when leaving the subshell.

@danwinship
Copy link
Contributor

lgtm i think. let's see what the tests say...

@marun
Copy link
Contributor Author

marun commented Oct 11, 2016

flake #8571

re-[test]

@marun
Copy link
Contributor Author

marun commented Oct 11, 2016

@danwinship well, the check job passed. shame about the others that have nothing to do with this PR.

@marun
Copy link
Contributor Author

marun commented Oct 12, 2016

flake #8571

re-[test]

@marun
Copy link
Contributor Author

marun commented Oct 12, 2016

@openshift/networking The flakes are unrelated, please merge.

@marun
Copy link
Contributor Author

marun commented Oct 12, 2016

flake #8571 re-[test]

@knobunc
Copy link
Contributor

knobunc commented Oct 12, 2016

[merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to c9fb01c

@marun
Copy link
Contributor Author

marun commented Oct 13, 2016

holy merge queue, batman

@knobunc
Copy link
Contributor

knobunc commented Oct 13, 2016

Flake #10663 [merge]

@stevekuznetsov
Copy link
Contributor

@knobunc the test job is running and failing, the merge has not yet occurred. I don't think re-triggering a merge will shove you to the end of the line, but it may be ... no need to re-tag

@knobunc
Copy link
Contributor

knobunc commented Oct 13, 2016

@stevekuznetsov Thanks, I wasn't sure if the test fail was going to prevent the merge.

@stevekuznetsov
Copy link
Contributor

the test fail was going to prevent the merge.

today the relationship is one-way in that successful tests (against a master that hasn't changed) can help you, but failed tests can never hurt you

@marun
Copy link
Contributor Author

marun commented Oct 14, 2016

[test] curious to see the networking job runtime

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to c9fb01c

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10059/) (Base Commit: 098e0ee)

@openshift-bot
Copy link
Contributor

openshift-bot commented Oct 14, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10072/) (Base Commit: 0a022ab) (Image: devenv-rhel7_5181)

@dcbw
Copy link
Contributor

dcbw commented Oct 14, 2016

lgtm

@openshift-bot openshift-bot merged commit 2142bdb into openshift:master Oct 14, 2016
@marun marun deleted the dind-fix-config-perms branch November 29, 2016 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

networking test flake "certificate is valid for nettest-node-1, 172.17.0.3, not nettest-node-2"
6 participants