Rename 'devicePlugin' to 'rdmaSharedDevicePlugin'
Since we will allow users to configure and install the SR-IOV Network Device
Plugin, we need a more user-friendly name so that the generic 'devicePlugin'
option does not confuse them.

depends-on: kubernetes-ci#93

Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
e0ne committed Mar 11, 2021
1 parent b2defee commit f370a2b
Showing 12 changed files with 72 additions and 68 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -69,7 +69,7 @@ CRD that defines a Cluster state for Mellanox Network devices.
#### NICClusterPolicy spec:
NICClusterPolicy CRD Spec includes the following sub-states/stages:
- `ofedDriver`: [OFED driver container](https://github.com/Mellanox/ofed-docker) to be deployed on Mellanox supporting nodes.
- `devicePlugin`: [RDMA shared device plugin](https://github.com/Mellanox/k8s-rdma-shared-dev-plugin)
- `rdmaSharedDevicePlugin`: [RDMA shared device plugin](https://github.com/Mellanox/k8s-rdma-shared-dev-plugin)
and related configurations.
- `nvPeerDriver`: [Nvidia Peer Memory client driver container](https://github.com/Mellanox/ofed-docker)
to be deployed on RDMA & GPU supporting nodes (required for GPUDirect workloads).
@@ -95,7 +95,7 @@ spec:
image: mofed
repository: mellanox
version: 5.2-1.0.4.0
devicePlugin:
rdmaSharedDevicePlugin:
image: k8s-rdma-shared-dev-plugin
repository: mellanox
version: v1.1.0
8 changes: 4 additions & 4 deletions api/v1alpha1/nicclusterpolicy_types.go
@@ -92,10 +92,10 @@ type NicClusterPolicySpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "make" to regenerate code after modifying this file

OFEDDriver *OFEDDriverSpec `json:"ofedDriver,omitempty"`
NVPeerDriver *NVPeerDriverSpec `json:"nvPeerDriver,omitempty"`
DevicePlugin *DevicePluginSpec `json:"devicePlugin,omitempty"`
SecondaryNetwork *SecondaryNetworkSpec `json:"secondaryNetwork,omitempty"`
OFEDDriver *OFEDDriverSpec `json:"ofedDriver,omitempty"`
NVPeerDriver *NVPeerDriverSpec `json:"nvPeerDriver,omitempty"`
RdmaSharedDevicePlugin *DevicePluginSpec `json:"rdmaSharedDevicePlugin,omitempty"`
SecondaryNetwork *SecondaryNetworkSpec `json:"secondaryNetwork,omitempty"`
}

// AppliedState defines a finer-grained view of the observed state of NicClusterPolicy
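Note on the `NicClusterPolicySpec` change: the JSON tag is renamed together with the Go field, so a document that still uses the old `devicePlugin` key no longer populates the spec. The sketch below illustrates that effect with plain `encoding/json`; it is not the operator's actual decoding path. `DevicePluginSpec` is reduced here to the four fields listed in the CRD schema, and `decodeSpec` is a hypothetical helper introduced only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DevicePluginSpec is a trimmed-down stand-in (assumption) for the real type;
// it keeps only the fields named in the CRD schema.
type DevicePluginSpec struct {
	Image      string `json:"image"`
	Repository string `json:"repository"`
	Version    string `json:"version"`
	Config     string `json:"config"`
}

// NicClusterPolicySpec carries only the renamed field from this commit.
type NicClusterPolicySpec struct {
	RdmaSharedDevicePlugin *DevicePluginSpec `json:"rdmaSharedDevicePlugin,omitempty"`
}

// decodeSpec (hypothetical helper) unmarshals a JSON document into the spec.
// encoding/json ignores unknown keys by default, so an old key is dropped
// without an error.
func decodeSpec(doc string) (NicClusterPolicySpec, error) {
	var spec NicClusterPolicySpec
	err := json.Unmarshal([]byte(doc), &spec)
	return spec, err
}

func main() {
	body := `{"image": "k8s-rdma-shared-dev-plugin", "repository": "mellanox", "version": "v1.1.0"}`

	oldSpec, err := decodeSpec(`{"devicePlugin": ` + body + `}`)
	fmt.Println("old key populated:", oldSpec.RdmaSharedDevicePlugin != nil, "error:", err)

	newSpec, err := decodeSpec(`{"rdmaSharedDevicePlugin": ` + body + `}`)
	fmt.Println("new key populated:", newSpec.RdmaSharedDevicePlugin != nil, "error:", err)
}
```

Existing custom resources and Helm values files therefore need the key renamed along with this commit; how a cluster treats the stale key is not covered by this sketch.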
4 changes: 2 additions & 2 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

30 changes: 15 additions & 15 deletions config/crd/bases/mellanox.com_nicclusterpolicies.yaml
@@ -40,12 +40,12 @@ spec:
spec:
description: NicClusterPolicySpec defines the desired state of NicClusterPolicy
properties:
devicePlugin:
description: DevicePluginSpec describes configuration options for
device plugin
nvPeerDriver:
description: NVPeerDriverSpec describes configuration options for
NV Peer Memory driver
properties:
config:
description: Device plugin configuration
gpuDriverSourcePath:
description: GPU driver sources path - Optional
type: string
image:
pattern: '[a-zA-Z0-9\-]+'
@@ -57,18 +57,14 @@ spec:
pattern: '[a-zA-Z0-9\.-]+'
type: string
required:
- config
- image
- repository
- version
type: object
nvPeerDriver:
description: NVPeerDriverSpec describes configuration options for
NV Peer Memory driver
ofedDriver:
description: OFEDDriverSpec describes configuration options for OFED
driver
properties:
gpuDriverSourcePath:
description: GPU driver sources path - Optional
type: string
image:
pattern: '[a-zA-Z0-9\-]+'
type: string
@@ -83,10 +79,13 @@ spec:
- repository
- version
type: object
ofedDriver:
description: OFEDDriverSpec describes configuration options for OFED
driver
rdmaSharedDevicePlugin:
description: DevicePluginSpec describes configuration options for
device plugin
properties:
config:
description: Device plugin configuration
type: string
image:
pattern: '[a-zA-Z0-9\-]+'
type: string
@@ -97,6 +96,7 @@ spec:
pattern: '[a-zA-Z0-9\.-]+'
type: string
required:
- config
- image
- repository
- version
22 changes: 11 additions & 11 deletions deployment/network-operator/README.md
@@ -102,9 +102,9 @@ chart parameters to deploy it together with the operator.

## Helm Tests

Network Operator has Helm tests to verify deployment. To run tests it is required to set the following chart parameters on helm install/upgrade: `deployCR`, `devicePlugin`, `secondaryNetwork` as the test depends on `NicClusterPolicy` instance being deployed by Helm.
Network Operator has Helm tests to verify deployment. To run tests it is required to set the following chart parameters on helm install/upgrade: `deployCR`, `rdmaSharedDevicePlugin`, `secondaryNetwork` as the test depends on `NicClusterPolicy` instance being deployed by Helm.
Supported Tests:
- Device Plugin Resource: This test creates a pod that requests the first resource in `devicePlugin.resources`
- Device Plugin Resource: This test creates a pod that requests the first resource in `rdmaSharedDevicePlugin.resources`
- RDMA Traffic: This test creates a pod that tests loopback RDMA traffic with `rping`

Run the helm test with following command after deploying network operator with helm
@@ -168,11 +168,11 @@ Production cluster environment can deny direct access to the Internet and instea

| Name | Type | Default | description |
| ---- | ---- | ------- | ----------- |
| `devicePlugin.deploy` | bool | `true` | Deploy device plugin |
| `devicePlugin.repository` | string | `mellanox` | Device plugin image repository |
| `devicePlugin.image` | string | `k8s-rdma-shared-dev-plugin` | Device plugin image name |
| `devicePlugin.version` | string | `v1.1.0` | Device plugin version |
| `devicePlugin.resources` | list | See below | Device plugin resources |
| `rdmaSharedDevicePlugin.deploy` | bool | `true` | Deploy device plugin |
| `rdmaSharedDevicePlugin.repository` | string | `mellanox` | Device plugin image repository |
| `rdmaSharedDevicePlugin.image` | string | `k8s-rdma-shared-dev-plugin` | Device plugin image name |
| `rdmaSharedDevicePlugin.version` | string | `v1.1.0` | Device plugin version |
| `rdmaSharedDevicePlugin.resources` | list | See below | Device plugin resources |

##### RDMA Device Plugin Resource configurations

@@ -254,7 +254,7 @@ deployCR: true
ofedDriver:
deploy: true
version: 5.2-1.0.4.0
devicePlugin:
rdmaSharedDevicePlugin:
deploy: true
resources:
- name: rdma_shared_device_a
@@ -272,7 +272,7 @@ ofedDriver:
deploy: true
nvPeerDriver:
deploy: true
devicePlugin:
rdmaSharedDevicePlugin:
deploy: true
resources:
- name: rdma_shared_device_a
@@ -292,7 +292,7 @@ Network Operator deployment with:
__values.yaml:__
```:yaml
deployCR: true
devicePlugin:
rdmaSharedDevicePlugin:
deploy: true
resources:
- name: rdma_shared_device_a
@@ -314,7 +314,7 @@ mapped to Mellanox ConnectX-5.
__values.yaml:__
```:yaml
deployCR: true
devicePlugin:
rdmaSharedDevicePlugin:
deploy: true
resources:
- name: rdma_shared_device_a
Original file line number Diff line number Diff line change
@@ -16,7 +16,11 @@ spec:
singular: nicclusterpolicy
scope: Cluster
versions:
- name: v1alpha1
- additionalPrinterColumns:
- jsonPath: .status.state
name: Status
type: string
name: v1alpha1
schema:
openAPIV3Schema:
description: NicClusterPolicy is the Schema for the nicclusterpolicies API
@@ -36,12 +40,12 @@ spec:
spec:
description: NicClusterPolicySpec defines the desired state of NicClusterPolicy
properties:
devicePlugin:
description: DevicePluginSpec describes configuration options for
device plugin
nvPeerDriver:
description: NVPeerDriverSpec describes configuration options for
NV Peer Memory driver
properties:
config:
description: Device plugin configuration
gpuDriverSourcePath:
description: GPU driver sources path - Optional
type: string
image:
pattern: '[a-zA-Z0-9\-]+'
@@ -53,18 +57,14 @@ spec:
pattern: '[a-zA-Z0-9\.-]+'
type: string
required:
- config
- image
- repository
- version
type: object
nvPeerDriver:
description: NVPeerDriverSpec describes configuration options for
NV Peer Memory driver
ofedDriver:
description: OFEDDriverSpec describes configuration options for OFED
driver
properties:
gpuDriverSourcePath:
description: GPU driver sources path - Optional
type: string
image:
pattern: '[a-zA-Z0-9\-]+'
type: string
@@ -79,10 +79,13 @@ spec:
- repository
- version
type: object
ofedDriver:
description: OFEDDriverSpec describes configuration options for OFED
driver
rdmaSharedDevicePlugin:
description: DevicePluginSpec describes configuration options for
device plugin
properties:
config:
description: Device plugin configuration
type: string
image:
pattern: '[a-zA-Z0-9\-]+'
type: string
@@ -93,6 +96,7 @@ spec:
pattern: '[a-zA-Z0-9\.-]+'
type: string
required:
- config
- image
- repository
- version
Original file line number Diff line number Diff line change
@@ -32,19 +32,19 @@ spec:
version: {{ .Values.nvPeerDriver.version }}
gpuDriverSourcePath: {{ .Values.nvPeerDriver.gpuDriverSourcePath }}
{{- end }}
{{- if .Values.devicePlugin.deploy }}
devicePlugin:
# {{ required "A valid value for .Values.devicePlugin.resources is required" .Values.devicePlugin.resources }}
image: {{ .Values.devicePlugin.image }}
repository: {{ .Values.devicePlugin.repository }}
version: {{ .Values.devicePlugin.version }}
{{- if .Values.rdmaSharedDevicePlugin.deploy }}
rdmaSharedDevicePlugin:
# {{ required "A valid value for .Values.rdmaSharedDevicePlugin.resources is required" .Values.rdmaSharedDevicePlugin.resources }}
image: {{ .Values.rdmaSharedDevicePlugin.image }}
repository: {{ .Values.rdmaSharedDevicePlugin.repository }}
version: {{ .Values.rdmaSharedDevicePlugin.version }}
# The config below directly propagates to k8s-rdma-shared-device-plugin configuration.
# Replace 'devices' with your (RDMA capable) netdevice name.
config: |
{
"configList": [
{{- $length := len .Values.devicePlugin.resources }}
{{- range $index, $element := .Values.devicePlugin.resources }}
{{- $length := len .Values.rdmaSharedDevicePlugin.resources }}
{{- range $index, $element := .Values.rdmaSharedDevicePlugin.resources }}
{
"resourceName": {{ $element.name | quote }},
"rdmaHcaMax": 1000,
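Note on the `config` block in the template above: it loops over `rdmaSharedDevicePlugin.resources` and emits one JSON object per resource, and the `$length` variable is presumably used to place the separators (that part of the template lies outside the hunk). The following is a rough sketch of the same looping pattern in plain Go `text/template`, under several assumptions: `printf "%q"` stands in for Helm's `quote`, a leading-comma check on `$index` replaces whatever the chart does with `$length`, and only the two keys visible in the hunk are rendered. `renderConfig`, `parseConfig`, and `pluginConfig` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// configTmpl is a simplified stand-in (assumption) for the chart template.
// A comma is written before every element except the first, so the list
// never ends with a trailing comma.
const configTmpl = `{
  "configList": [
{{- range $index, $element := .resources }}
{{- if $index }},{{ end }}
    {"resourceName": {{ printf "%q" $element.name }}, "rdmaHcaMax": 1000}
{{- end }}
  ]
}`

// pluginConfig models only the keys rendered by configTmpl.
type pluginConfig struct {
	ConfigList []struct {
		ResourceName string `json:"resourceName"`
		RdmaHcaMax   int    `json:"rdmaHcaMax"`
	} `json:"configList"`
}

// renderConfig executes the template for the given resource names.
func renderConfig(names []string) (string, error) {
	resources := make([]map[string]interface{}, 0, len(names))
	for _, n := range names {
		resources = append(resources, map[string]interface{}{"name": n})
	}
	tmpl, err := template.New("config").Parse(configTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = tmpl.Execute(&buf, map[string]interface{}{"resources": resources})
	return buf.String(), err
}

// parseConfig checks that the rendered text is well-formed JSON.
func parseConfig(doc string) (pluginConfig, error) {
	var cfg pluginConfig
	err := json.Unmarshal([]byte(doc), &cfg)
	return cfg, err
}

func main() {
	out, err := renderConfig([]string{"rdma_shared_device_a", "rdma_shared_device_b"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	cfg, err := parseConfig(out)
	fmt.Println("resources parsed:", len(cfg.ConfigList), "error:", err)
}
```

Parsing the rendered text back is a cheap way to catch a misplaced separator before the string reaches the device plugin.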
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{{- if and .Values.deployCR .Values.devicePlugin.deploy }}
{{- if and .Values.deployCR .Values.rdmaSharedDevicePlugin.deploy }}
apiVersion: v1
kind: Pod
metadata:
@@ -16,9 +16,9 @@ spec:
add: [ "IPC_LOCK" ]
resources:
requests:
rdma/{{ (index .Values.devicePlugin.resources 0).name }}: '1'
rdma/{{ (index .Values.rdmaSharedDevicePlugin.resources 0).name }}: '1'
limits:
rdma/{{ (index .Values.devicePlugin.resources 0).name }}: '1'
rdma/{{ (index .Values.rdmaSharedDevicePlugin.resources 0).name }}: '1'
command:
- sh
- -c
6 changes: 3 additions & 3 deletions deployment/network-operator/templates/tests/test-rping.yaml
@@ -1,4 +1,4 @@
{{- if and .Values.deployCR .Values.devicePlugin.deploy .Values.secondaryNetwork.deploy .Values.secondaryNetwork.multus.deploy .Values.secondaryNetwork.cniPlugins.deploy }}
{{- if and .Values.deployCR .Values.rdmaSharedDevicePlugin.deploy .Values.secondaryNetwork.deploy .Values.secondaryNetwork.multus.deploy .Values.secondaryNetwork.cniPlugins.deploy }}
apiVersion: mellanox.com/v1alpha1
kind: MacvlanNetwork
metadata:
@@ -35,9 +35,9 @@ spec:
add: [ "IPC_LOCK" ]
resources:
requests:
rdma/{{ (index .Values.devicePlugin.resources 0).name }}: '1'
rdma/{{ (index .Values.rdmaSharedDevicePlugin.resources 0).name }}: '1'
limits:
rdma/{{ (index .Values.devicePlugin.resources 0).name }}: '1'
rdma/{{ (index .Values.rdmaSharedDevicePlugin.resources 0).name }}: '1'
command:
- sh
- -c
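Note on the `if and ...` guard in this test template: a field chain in a Go template has to be written without spaces. Written as `.Values.rdmaSharedDevicePlugin .deploy`, the expression is parsed as two separate arguments to `and`, and the second one looks up `deploy` on the root context instead of on the plugin values. The sketch below shows the difference with plain `text/template` rather than Helm itself; `evalCondition` is a hypothetical helper and the values map is a reduced stand-in for the chart values.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// evalCondition (hypothetical helper) wraps cond in an if/else and reports
// which branch the template engine takes for the given values.
func evalCondition(cond string, values map[string]interface{}) (string, error) {
	src := "{{ if " + cond + " }}rendered{{ else }}skipped{{ end }}"
	tmpl, err := template.New("cond").Parse(src)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = tmpl.Execute(&buf, map[string]interface{}{"Values": values})
	return buf.String(), err
}

// sampleValues is a reduced stand-in (assumption) for the chart values.
func sampleValues() map[string]interface{} {
	return map[string]interface{}{
		"deployCR": true,
		"rdmaSharedDevicePlugin": map[string]interface{}{
			"deploy": true,
		},
	}
}

func main() {
	// One unbroken chain: the deploy flag of the plugin is evaluated.
	joined, err := evalCondition(
		"and .Values.deployCR .Values.rdmaSharedDevicePlugin.deploy", sampleValues())
	fmt.Println("joined chain:", joined, err)

	// A space splits the chain: ".deploy" is looked up on the root context,
	// where no such key exists, so the whole condition is false.
	split, err := evalCondition(
		"and .Values.deployCR .Values.rdmaSharedDevicePlugin .deploy", sampleValues())
	fmt.Println("split chain:", split, err)
}
```

With the split form the guarded manifest is silently skipped even when every flag is set, which makes this kind of typo easy to miss.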
2 changes: 1 addition & 1 deletion deployment/network-operator/values.yaml
@@ -94,7 +94,7 @@ nvPeerDriver:
version: 1.0-9
gpuDriverSourcePath: /run/nvidia/driver

devicePlugin:
rdmaSharedDevicePlugin:
deploy: true
image: k8s-rdma-shared-dev-plugin
repository: mellanox
2 changes: 1 addition & 1 deletion example/crs/mellanox.com_v1alpha1_nicclusterpolicy_cr.yaml
@@ -20,7 +20,7 @@ spec:
image: mofed
repository: mellanox
version: 5.2-1.0.4.0
devicePlugin:
rdmaSharedDevicePlugin:
image: k8s-rdma-shared-dev-plugin
repository: mellanox
version: v1.1.0
4 changes: 2 additions & 2 deletions pkg/state/state_shared_dp.go
@@ -65,7 +65,7 @@ func (s *stateSharedDp) Sync(customResource interface{}, infoCatalog InfoCatalog
log.V(consts.LogLevelInfo).Info(
"Sync Custom resource", "State:", s.name, "Name:", cr.Name, "Namespace:", cr.Namespace)

if cr.Spec.DevicePlugin == nil {
if cr.Spec.RdmaSharedDevicePlugin == nil {
// Either this state was not required to run or an update occurred and we need to remove
// the resources that were created.
// TODO: Support the latter case
@@ -109,7 +109,7 @@ func (s *stateSharedDp) GetWatchSources() map[string]*source.Kind {
func (s *stateSharedDp) getManifestObjects(
cr *mellanoxv1alpha1.NicClusterPolicy) ([]*unstructured.Unstructured, error) {
renderData := &sharedDpManifestRenderData{
CrSpec: cr.Spec.DevicePlugin,
CrSpec: cr.Spec.RdmaSharedDevicePlugin,
RuntimeSpec: &runtimeSpec{
Namespace: consts.NetworkOperatorResourceNamespace,
},