
emit warning on no liveness probe defined for pods #10363

Conversation

@juanvallejo (Contributor) commented Aug 11, 2016:

fixes #10271 (oc status should emit warning if no liveness probe is defined for a pod)

This patch iterates through pod specs, listing specs with no liveness
probes set. Any replication controllers whose deployment is fulfilled by
a DeploymentConfig are ignored.

Example

$ oc create -f pkg/api/graph/test/simple-deployment
deploymentconfig "simple-deployment" created

$ oc get all
NAME                   REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/simple-deployment   1          1         0         config

NAME                     DESIRED   CURRENT   READY     AGE
rc/simple-deployment-1   0         0         0         1m

NAME                            READY     STATUS              RESTARTS   AGE
po/simple-deployment-1-deploy   0/1       ContainerCreating   0          1m

$ oc status -v
In project test on server https://10.111.123.56:8443

dc/simple-deployment deploys docker.io/openshift/deployment-example:v1
  deployment #1 pending 26 seconds ago

Warnings:
  * pod/simple-deployment-1-deploy has no liveness probe to verify pods are still running.
    try: oc set probe pod/simple-deployment-1-deploy --liveness ...
  * dc/simple-deployment has no liveness probe to verify pods are still running.
    try: oc set probe dc/simple-deployment --liveness ...

View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.
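For anyone following along, the suggested command would then be completed with a probe action. A hypothetical invocation (the URL, port, and delay values are illustrative, not part of this PR):

$ oc set probe dc/simple-deployment --liveness --get-url=http://:8080/healthz --initial-delay-seconds=30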

cc @fabianofranz @stevekuznetsov

@juanvallejo (Contributor, Author):

[test]

func FindDeploymentConfigLivenessWarnings(g osgraph.Graph, f osgraph.Namer, setProbeCommand string) []osgraph.Marker {
    markers := []osgraph.Marker{}

Node:
Review comment (Contributor):

Named points almost always mean the code should be factored differently. Factor out your one large method into smaller methods to reduce cyclomatic complexity, and as a result the need for this will go away.

Review comment (Contributor):

> Named points almost always mean the code should be factored differently. Factor out your one large method into smaller methods to reduce cyclomatic complexity, and as a result the need for this will go away.

@stevekuznetsov is right.
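A minimal, self-contained sketch of the suggested refactor (toy types, not this PR's code): once the inner container loop moves into a small predicate, the labeled continue disappears.

// Toy model of the shape being reviewed.
type Probe struct{}

type Container struct {
    LivenessProbe *Probe
}

type PodSpec struct {
    Containers []Container
}

// Before (what the label enables):
//
// Spec:
//     for _, spec := range specs {
//         for _, c := range spec.Containers {
//             if c.LivenessProbe != nil {
//                 continue Spec // skip specs that already have a probe
//             }
//         }
//         flagged = append(flagged, spec)
//     }

// After: the inner loop becomes a predicate...
func hasLivenessProbe(spec PodSpec) bool {
    for _, c := range spec.Containers {
        if c.LivenessProbe != nil {
            return true
        }
    }
    return false
}

// ...and the outer loop needs no label.
func specsWithoutLivenessProbes(specs []PodSpec) []PodSpec {
    flagged := []PodSpec{}
    for _, spec := range specs {
        if !hasLivenessProbe(spec) {
            flagged = append(flagged, spec)
        }
    }
    return flagged
}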

@fabianofranz (Member):

cc @deads2k

// All of the containers in the deployment config lack a liveness probe
markers = append(markers, osgraph.Marker{
    Node:     uncastDcNode,
    Severity: osgraph.WarningSeverity,
Review comment (Contributor):

@fabianofranz I'll defer to you about info vs warning.

Review comment (Member):

I vote for info.

@deads2k (Contributor) commented Aug 12, 2016:

Needs a test.

juanvallejo force-pushed the jvallejo_emit-warning-no-liveness-probe branch from 2ecad3d to f5d49b6 on August 15, 2016.
@juanvallejo (Contributor, Author) commented Aug 15, 2016:

> Needs a test.

Will add a unit test

This patch currently lists all pods without a --liveness probe, including replicationcontroller pods created by deploymentconfigs. It would be ideal to only list deploymentconfig pods in those cases.

Current output:

  * rc/mysql-master-1 has no liveness probe to verify pods are still running.
    try: oc set probe rc/mysql-master-1 --liveness ...
  * rc/mysql-slave-1 has no liveness probe to verify pods are still running.
    try: oc set probe rc/mysql-slave-1 --liveness ...
  * rc/node-1 has no liveness probe to verify pods are still running.
    try: oc set probe rc/node-1 --liveness ...
  * rc/ruby-1 has no liveness probe to verify pods are still running.
    try: oc set probe rc/ruby-1 --liveness ...
  * pod/mysql-slave-1-deploy has no liveness probe to verify pods are still running.
    try: oc set probe pod/mysql-slave-1-deploy --liveness ...
  * pod/ruby-1-nec4y has no liveness probe to verify pods are still running.
    try: oc set probe pod/ruby-1-nec4y --liveness ...
  * dc/mysql-master has no liveness probe to verify pods are still running.
    try: oc set probe dc/mysql-master --liveness ...
  * dc/mysql-slave has no liveness probe to verify pods are still running.
    try: oc set probe dc/mysql-slave --liveness ...
  * dc/node has no liveness probe to verify pods are still running.
    try: oc set probe dc/node --liveness ...
  * dc/ruby has no liveness probe to verify pods are still running.
    try: oc set probe dc/ruby --liveness ...

Any feedback / help with this is welcome

@@ -48,6 +50,39 @@ func FindUnmountableSecrets(g osgraph.Graph, f osgraph.Namer) []osgraph.Marker {
    return markers
}

// FindMissingLivenessProbes inspects all PodSpecs for missing liveness probes
Review comment (Contributor):

// and generates a list of deduped markers.... ?

@deads2k (Contributor) commented Aug 16, 2016:

> This patch currently lists all pods without a --liveness probe, including replicationcontroller pods created by deploymentconfigs. It would be ideal to only list deploymentconfig pods in those cases.

Not just non-ideal. Any advice you give for fixing up an RC that's managed by a DC is bad advice. Users shouldn't modify those resources.

juanvallejo force-pushed the jvallejo_emit-warning-no-liveness-probe branch 3 times, most recently from d5417ff to d24e4a0 on October 12, 2016.
@juanvallejo (Contributor, Author):

@fabianofranz @deads2k Went ahead and updated this PR to ignore RCs owned by deployment configs. Also added tests. PTAL

juanvallejo force-pushed the jvallejo_emit-warning-no-liveness-probe branch from d24e4a0 to 612b301 on October 12, 2016.
@fabianofranz (Member):

LGTM, @deads2k anything else?

func FindMissingLivenessProbes(g osgraph.Graph, f osgraph.Namer, setProbeCommand string) []osgraph.Marker {
    markers := []osgraph.Marker{}
    appendedNodes := sets.NewString()
    ignoredNodes := findDeploymentEdgeKinds(g)
Review comment (Contributor):

seems fragile.

@deads2k (Contributor) commented Oct 14, 2016:

> LGTM, @deads2k anything else?

The graph bits look a little fragile, but if you're committed to fixing them up later I'll let it go.

@juanvallejo (Contributor, Author):

@deads2k

> The graph bits look a little fragile, but if you're committed to fixing them up later I'll let it go.

Definitely. Do you have any suggestions for improving it in this PR?

juanvallejo force-pushed the jvallejo_emit-warning-no-liveness-probe branch from 612b301 to d75004e on October 17, 2016.
@juanvallejo (Contributor, Author):

@fabianofranz @deads2k Added a ControllerRefEdgeKind and, for now, attached it alongside the existing edge between deploymentconfigs and replicationcontrollers. Any thoughts on this implementation? d75004e

@@ -73,6 +75,7 @@ func AddDeploymentEdges(g osgraph.MutableUniqueGraph, node *deploygraph.Deployme
    }
    if BelongsToDeploymentConfig(node.DeploymentConfig, rcNode.ReplicationController) {
        g.AddEdge(node, rcNode, DeploymentEdgeKind)
        g.AddEdge(node, rcNode, ControllerRefEdgeKind)
Review comment (Contributor):

The comments I had for you in person remain -- enforcing this weakly by co-locating the two edge creations instead of programmatically by subtyping ControllerRefEdgeKind with DeploymentEdgeKind is not ideal.

Review comment (Contributor, Author):

@stevekuznetsov I will go ahead and open those changes in a new PR, unless it's better to add it in a new commit here? Thanks for the feedback on this btw

Review comment (Contributor):

I am not sure that closing out that issue in this fast/incomplete way is the correct first step. I would implement the structure necessary to do it first, then come back to this PR. Will defer my opinion to @deads / @fabianofranz

Review comment (Contributor, Author):

Slightly different deads summoned :]
cc @deads2k

Review comment (Contributor, Author):

@stevekuznetsov

> I am not sure that closing out that issue in this fast/incomplete way is the correct first step. I would implement the structure necessary to do it first, then come back to this PR.

Sounds good, I'll go ahead and propose this change and begin working on a separate PR for it

@@ -22,6 +22,8 @@ const (
    DeploymentEdgeKind = "Deployment"
    // VolumeClaimEdgeKind goes from DeploymentConfigs to PersistentVolumeClaims indicating a request for persistent storage.
    VolumeClaimEdgeKind = "VolumeClaim"
    // ControllerRefEdgeKind goes from a controller node to its controlled child-node
    ControllerRefEdgeKind = "ControllerRef"
Review comment (Contributor):

isn't this the same as the "ManagedByController" edge from AddManagedByControllerPodEdges? See if you can find the PR, but I'm pretty sure we discussed this as generic there.

Review comment (Contributor, Author):

@deads2k Found this PR #9972. Since this edge kind exists, would I just be filtering the graph by ManagedByControllerEdgeKind here?

@juanvallejo (Contributor, Author) commented Oct 18, 2016:

@deads2k or @csrwng PTAL 6807a7a

juanvallejo force-pushed the jvallejo_emit-warning-no-liveness-probe branch from 298b191 to 6807a7a on October 18, 2016.

// CheckForLivenessProbes iterates through all of the containers in a podSpecNode until it finds one
// with a liveness probe set. The list of nodes whose containers have no liveness probe set is returned.
func CheckForLivenessProbes(podSpecNode *kubegraph.PodSpecNode) []*kubegraph.PodSpecNode {
Review comment (Contributor):

make this a bool check.
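A sketch of the bool check being asked for (it assumes the pod spec is reachable from kubegraph.PodSpecNode via a PodSpec field; the name hasLivenessProbe matches what later revisions in this thread use):

func hasLivenessProbe(podSpecNode *kubegraph.PodSpecNode) bool {
    // The field access is an assumption about how PodSpecNode wraps the spec;
    // adjust to however the node actually exposes it.
    for _, container := range podSpecNode.PodSpec.Containers {
        if container.LivenessProbe != nil {
            return true
        }
    }
    return false
}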


for _, uncastPodSpecNode := range g.NodesByKind(kubegraph.PodSpecNodeKind) {
    podSpecNode := uncastPodSpecNode.(*kubegraph.PodSpecNode)
    podsWithoutLivenessProbes := CheckForLivenessProbes(podSpecNode)
Review comment (Contributor):

bad name, this is a bool. Skip from here if there's no issue.

topLevelString := f.ResourceName(topLevelNode)

// prevent duplicate markers
if appendedNodes.Has(topLevelString) {
Review comment (Contributor):

no


topLevelString := f.ResourceName(topLevelNode)

// prevent duplicate markers
if appendedNodes.Has(topLevelString) {
Review comment (Contributor):

check here to see if you have a controllerref edge (whatever it's called). continue if you do.

func FindMissingLivenessProbes(g osgraph.Graph, f osgraph.Namer, setProbeCommand string) []osgraph.Marker {
    markers := []osgraph.Marker{}
    appendedNodes := sets.NewString()
    ignoredNodes := findNodesManagedByController(g)
Review comment (Contributor):

this isn't needed.

// FindMissingLivenessProbes inspects all PodSpecs for missing liveness probes and generates a list of non-duplicate markers
func FindMissingLivenessProbes(g osgraph.Graph, f osgraph.Namer, setProbeCommand string) []osgraph.Marker {
    markers := []osgraph.Marker{}
    appendedNodes := sets.NewString()
Review comment (Contributor):

unnecessary.

}

appendedNodes.Insert(topLevelString)
for _, node := range podsWithoutLivenessProbes {
Review comment (Contributor):

the name node sucks so much.


appendedNodes.Insert(topLevelString)
for _, node := range podsWithoutLivenessProbes {
markers = append(markers, osgraph.Marker{
Review comment (Contributor):

At this point, you either attach the marker to the toplevelcontainingnode or to the podspec node (see which renders right). If they both render correctly, attach to the podspecnode.

appendedNodes := sets.NewString()
ignoredNodes := findNodesManagedByController(g)

for _, uncastPodSpecNode := range g.NodesByKind(kubegraph.PodSpecNodeKind) {
Review comment (Contributor):

In pseudocode:

if hasliveness(podspec) {
    continue
}

topContainingNode = ....
if topContainingNode has controllerref edge {
    continue
}

addmarker
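A Go reading of that pseudocode (a sketch, not the PR's final diff: the Key/Message/Suggestion fields on osgraph.Marker and the choice of InfoSeverity are assumptions; hasLivenessProbe and hasControllerRefEdge are the helpers this thread converges on):

func FindMissingLivenessProbes(g osgraph.Graph, f osgraph.Namer, setProbeCommand string) []osgraph.Marker {
    markers := []osgraph.Marker{}

    for _, uncastPodSpecNode := range g.NodesByKind(kubegraph.PodSpecNodeKind) {
        podSpecNode := uncastPodSpecNode.(*kubegraph.PodSpecNode)
        // Cheap check first: skip any spec that already defines a probe.
        if hasLivenessProbe(podSpecNode) {
            continue
        }
        // If the top-level owner is itself managed by a controller (e.g. an
        // RC owned by a DC), skip it: users shouldn't edit managed resources,
        // and the controller gets its own marker.
        topLevelNode := osgraph.GetTopLevelContainerNode(g, podSpecNode)
        if hasControllerRefEdge(g, topLevelNode) {
            continue
        }

        topLevelString := f.ResourceName(topLevelNode)
        markers = append(markers, osgraph.Marker{
            Node:     podSpecNode,
            Severity: osgraph.InfoSeverity,   // info vs. warning was debated above
            Key:      "MissingLivenessProbe", // assumed key name
            Message: fmt.Sprintf("%s has no liveness probe to verify pods are still running.",
                topLevelString),
            Suggestion: osgraph.Suggestion(fmt.Sprintf("%s %s --liveness ...",
                setProbeCommand, topLevelString)),
        })
    }

    return markers
}

Because each pod spec produces at most one marker and controller-managed owners are skipped outright, the appendedNodes dedup set from earlier revisions becomes unnecessary, which is what the "this isn't needed" / "unnecessary" comments above are pointing at.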

@juanvallejo (Contributor, Author):

@deads2k Thanks for the feedback! Review comments addressed 9ed2553

cc @fabianofranz

@juanvallejo (Contributor, Author):

networking check flaked on #11452 re[test]

juanvallejo force-pushed the jvallejo_emit-warning-no-liveness-probe branch from 9ed2553 to 92c5fb5 on October 25, 2016.
podSpecNode := uncastPodSpecNode.(*kubegraph.PodSpecNode)
topLevelNode := osgraph.GetTopLevelContainerNode(g, podSpecNode)

if hasLivenessProbe(podSpecNode) {
Review comment (Contributor):

this check is cheap, run it first.


for _, uncastPodSpecNode := range g.NodesByKind(kubegraph.PodSpecNodeKind) {
    podSpecNode := uncastPodSpecNode.(*kubegraph.PodSpecNode)
    topLevelNode := osgraph.GetTopLevelContainerNode(g, podSpecNode)
Review comment (Contributor):

move after the "is this node ok" check

if hasLivenessProbe(podSpecNode) {
    continue
}
if hasControllerRefEdge(g, topLevelNode) {
Review comment (Contributor):

add a comment here explaining the reasoning.


topLevelString := f.ResourceName(topLevelNode)
markers = append(markers, osgraph.Marker{
    Node: podSpecNode,
Review comment (Contributor):

add the toplevel node as related.
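Applied to the excerpt above, the suggestion might look like this (RelatedNodes is the assumed name of the Marker field for associated nodes; the other fields echo the oc status output at the top of the thread):

topLevelString := f.ResourceName(topLevelNode)
markers = append(markers, osgraph.Marker{
    Node: podSpecNode,
    // The top-level owner (dc, rc, or bare pod) is the resource a user would
    // actually run the probe command against, so surface it as related.
    RelatedNodes: []graph.Node{topLevelNode},
    Severity:     osgraph.InfoSeverity,
    Key:          "MissingLivenessProbe", // assumed key name
    Message:      fmt.Sprintf("%s has no liveness probe to verify pods are still running.", topLevelString),
    Suggestion:   osgraph.Suggestion(fmt.Sprintf("%s %s --liveness ...", setProbeCommand, topLevelString)),
})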

}

// hasControllerRefEdge returns true if a given node contains a "ManagedByController" edge between itself and any of its "To" nodes.
func hasControllerRefEdge(g osgraph.Graph, node graph.Node) bool {
Review comment (Contributor):

Don't we have a cleaner way to do this? List outbound edges and check them directly?
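A sketch of the cleaner form being suggested, assuming the graph wrapper exposes a way to list a node's outgoing edges and to query each edge's kinds (OutboundEdges and EdgeKinds are assumed helper names here, as is the kubeedges package alias):

func hasControllerRefEdge(g osgraph.Graph, node graph.Node) bool {
    // Walk the edges leaving this node and check their kinds directly,
    // instead of pre-computing a set of controller-managed nodes.
    for _, edge := range g.OutboundEdges(node) {
        if g.EdgeKinds(edge).Has(kubeedges.ManagedByControllerEdgeKind) {
            return true
        }
    }
    return false
}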

@deads2k (Contributor) commented Oct 26, 2016:

minor comments. lgtm otherwise.

@juanvallejo (Contributor, Author):

@deads2k Thanks for the feedback, review comments addressed

@deads2k (Contributor) commented Oct 26, 2016:

lgtm. needs a squash.

Commit message:

This patch iterates through pod specs, listing specs with no liveness probes set. Ignores podspec nodes with a controlling edge kind.

juanvallejo force-pushed the jvallejo_emit-warning-no-liveness-probe branch from d43f32f to 914b5c9 on October 26, 2016.
@juanvallejo (Contributor, Author):

conformance test flaked on #11114 re[test]

@openshift-bot (Contributor):

Evaluated for origin test up to 914b5c9

@openshift-bot (Contributor):

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10701/) (Base Commit: 8bb33be)

@juanvallejo (Contributor, Author):

@deads2k

> lgtm. needs a squash.

Done!

@fabianofranz (Member):

[merge]

@openshift-bot (Contributor):

Evaluated for origin merge up to 914b5c9

@openshift-bot (Contributor) commented Oct 27, 2016:

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10767/) (Base Commit: 0403599) (Image: devenv-rhel7_5261)

openshift-bot merged commit 5494a7e into openshift:master on Oct 28, 2016.
juanvallejo deleted the jvallejo_emit-warning-no-liveness-probe branch on October 28, 2016.