-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync NFD Helm Charts and config with the GPU-Operator #557
Conversation
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ArangoGutierrez !!
Regarding the values,yaml, it is generated by make release-build.
So the changes should be done in the template here:
https://github.com/Mellanox/network-operator/blob/master/hack/templates/values/values.template#L19
/retest-nic_operator_kind |
1 similar comment
/retest-nic_operator_kind |
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
/retest-nic_operator_helm |
/retest-nic_operator_kind |
@adrianchiris @rollandf PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
can be merged once CI is green |
/retest-nic_operator_helm |
4 similar comments
/retest-nic_operator_helm |
/retest-nic_operator_helm |
/retest-nic_operator_helm |
/retest-nic_operator_helm |
looking at the CI logs (ATM they are not available for query from github due to internal issue) i see that when deploying network-operator via helm nicclusterpolicy CR fails to become ready. it deploys it with a mock container of MOFED to avoid log compile time as well as with rdma-shared-device-plugin. the latter may happen if NFD did not label nodes with: feature.node.kubernetes.io/pci-15b3.present: "true"
|
comparing a successfull run of another pr i can see that when getting node information |
@@ -18,6 +18,7 @@ | |||
|
|||
nfd: | |||
enabled: true | |||
nodeFeatureRules: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: where is this being used ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GFD (from the GPU feature discovery) will start using it soon, for this release we have it optional, but for the next one we plan to change it to enabled by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this section is not propagated to node-feature-discovery chart. and currently we dont deploy anything related to GFD via helm.
should this be under node-feature-discovery ? (although i didnt find this parameter in node-feature-discovery chart valuses)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have enableNodeFeatureApi
under node-feature-discovery section so i guess the parameter here is not needed ?
i have taken a look at the CI issue. if we enable node feature rule feature in NFD need to enable node feature controller or crd-controller as its the entity which labels the node. currently both are defaulted to null AND since we use the instance namespace it will not enable it by default. also i saw there is a typo in the chart WRT crd-controller. see: https://github.com/kubernetes-sigs/node-feature-discovery/blob/10bbc8f253a3ebf4cb77bb5e7c2c910040065cf3/deployment/helm/node-feature-discovery/templates/master.yaml#L105 ill submit a PR to fix it. |
CI is green now! thanks for your help understanding the logs @adrianchiris |
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
/retest-nic_operator_kind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ! lets wait for CI to pass and can be merged.
All green!!! |
This Patch:
Values.yaml
with those in the GPU-Operator