
STOR-1167: Rebase to v1.18.0 for OCP 4.14 #222

Merged: 89 commits merged into openshift:master on Jun 19, 2023



@jsafrane jsafrane commented Apr 26, 2023

ConnorJC3 and others added 30 commits January 13, 2023 17:48
Signed-off-by: Connor Catlett <conncatl@amazon.com>
Signed-off-by: Eddie Torres <torredil@amazon.com>
Migrate Trivy workflow to grab images from values.yaml
Signed-off-by: Connor Catlett <conncatl@amazon.com>
Signed-off-by: Connor Catlett <conncatl@amazon.com>
Signed-off-by: Connor Catlett <conncatl@amazon.com>
Signed-off-by: Eddie Torres <torredil@amazon.com>
Signed-off-by: Eddie Torres <torredil@amazon.com>
Use test driver image when testing upgrades with CT
Introduce logging-format driver option for the controller and node pods to set the log format. Permitted formats: text (default), json.

Migrate to Structured Logging.

Deprecate logtostderr flag.

Signed-off-by: Eddie Torres <torredil@amazon.com>
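The logging-format option described above permits only text (the default) and json. A minimal sketch of how such a flag could be parsed and validated with Go's standard flag package; parseLoggingFormat and its error message are hypothetical, and the driver's real implementation (which wires the value into its structured logger) may differ.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseLoggingFormat is a hypothetical helper: it maps the value of a
// --logging-format flag to one of the two permitted formats.
// An empty value falls back to the default, "text".
func parseLoggingFormat(value string) (string, error) {
	switch value {
	case "", "text":
		return "text", nil
	case "json":
		return "json", nil
	default:
		return "", fmt.Errorf("unsupported logging format %q: permitted formats are text, json", value)
	}
}

func main() {
	// The flag name mirrors the commit message; the default is "text".
	format := flag.String("logging-format", "text", "log format: text or json")
	flag.Parse()

	f, err := parseLoggingFormat(*format)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("logging format:", f)
}
```

Rejecting unknown values at startup, rather than silently falling back to text, makes a typo in the pod arguments visible immediately.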
Signed-off-by: Connor Catlett <conncatl@amazon.com>
Signed-off-by: Connor Catlett <conncatl@amazon.com>
Bump CI k8s version to 1.26.1 (and other CI tools upgrades)
Signed-off-by: Connor Catlett <conncatl@amazon.com>
Signed-off-by: Connor Catlett <conncatl@amazon.com>
Signed-off-by: Connor Catlett <conncatl@amazon.com>
…n type

Signed-off-by: Connor Catlett <conncatl@amazon.com>
…tive-execution

Update speculative execution of docker buildx to check buildkit daemon type
This is intended to fix the following error:

```
$ go list -mod readonly -m all
go: k8s.io/dynamic-resource-allocation@v0.0.0: invalid version: unknown revision v0.0.0
```
…esource-allocation-version

Pin k8s.io/dynamic-resource-allocation to v0.26.0
Signed-off-by: Eddie Torres <torredil@amazon.com>
* endpoints.Ec2ServiceID is deprecated: Use client package's EndpointsID value instead of ServiceIDs.
* rand.Seed is deprecated; use rand.New(NewSource(seed)).

Signed-off-by: Eddie Torres <torredil@amazon.com>
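The rand.Seed deprecation noted above was introduced in Go 1.20. A minimal sketch of the suggested replacement, a local generator created with rand.New(rand.NewSource(seed)); the helper name newSeededRand is hypothetical and this is not the driver's actual change.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newSeededRand returns a local generator with an explicit seed. Since
// Go 1.20, rand.Seed is deprecated; code that needs a reproducible
// sequence should own its *rand.Rand instead of reseeding the global source.
func newSeededRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	// Before (deprecated): rand.Seed(42); n := rand.Intn(100)
	// After: a local generator with an explicit seed.
	r := newSeededRand(42)
	fmt.Println(r.Intn(100))

	// Code that only needs randomness (no fixed sequence) can call the
	// top-level functions directly; as of Go 1.20 the global source is
	// seeded automatically.
	fmt.Println(rand.Intn(100))
}
```

Note that a *rand.Rand is not safe for concurrent use, unlike the top-level functions, so a shared local generator needs its own locking.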

openshift-ci-robot commented Apr 26, 2023

@jsafrane: This pull request references STOR-1167 which is a valid jira issue.

In response to this:

Diff to upstream v1.18.0:
kubernetes-sigs/aws-ebs-csi-driver@v1.18.0...jsafrane:rebase-v1.18.0

Notable changes since v1.15.0 (OCP 4.13):

Full changelog: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/CHANGELOG.md

@openshift/storage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jsafrane jsafrane changed the title STOR-1167: OCPBUGS-12297: Rebase to v1.18.0 for OCP 4.13 STOR-1167: Rebase to v1.18.0 for OCP 4.13 Apr 26, 2023
@jsafrane (Author)

This PR requires Go 1.20, see #223.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 26, 2023
jsafrane and others added 2 commits April 26, 2023 13:45
- Remove .github files: we don't want custom templates or dependabots in
  OpenShift forks.

- Compile with -mod=vendor

- Compile without KUBECONFIG: hack/verify-kustomize reads the current
  namespace and adds it into generated / verified manifests.
  With KUBECONFIG="", it won't be able to get the namespace and thus it will
  generate manifests with namespace: default.
@jsafrane (Author)

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 26, 2023
@jsafrane jsafrane changed the title STOR-1167: Rebase to v1.18.0 for OCP 4.13 STOR-1167: Rebase to v1.18.0 for OCP 4.14 May 3, 2023

Phaow commented May 8, 2023

/test e2e-aws-csi-extended

dobsonj (Member) commented May 11, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 11, 2023
@jsafrane (Author)

/label px-approved
/label docs-approved

@openshift-ci openshift-ci bot added px-approved Signifies that Product Support has signed off on this PR docs-approved Signifies that Docs has signed off on this PR labels May 15, 2023

Phaow commented May 18, 2023

QE started pre-merge testing using the pre-merge build 4.13.0-0.ci.test-2023-05-17-023715-ci-ln-10rnqtb-latest:

  1. Check that the rebased CSI driver build reports the expected version:
sh-4.4$ /usr/bin/aws-ebs-csi-driver --version
{
  "driverVersion": "v1.18.0",
  "gitCommit": "2a6426af3d059f17abacb0a11978431d65180985",
  "buildDate": "2023-05-17T02:35:42+00:00",
  "goVersion": "go1.20.3",
  "compiler": "gc",
  "platform": "linux/amd64"
}
  2. Pre-submit job ci/prow/e2e-aws-csi-extended ran all QE automated test cases; all passed.
  3. Notable changes tests:
  • Add support for Fast Snapshot Restore: failed
  • Support for interpolated tags in VolumeSnapshotClass: failed
  • Add support for XFS custom block sizes: passed

Fast Snapshot Restore issues

  • We missed the AWS policy configuration for this feature; it also needs:
"ec2:DescribeAvailabilityZones",
"ec2:EnableFastSnapshotRestores"

We need to add these 2 permissions to our credentials_request configuration.

  • This feature seems flaky: snapshot creation fails very frequently with "error creating snapshot of volume vol-0e2bec4afb042dd7e: SnapshotCreationPerVolumeRateExceeded: The maximum per volume CreateSnapshot request rate has been exceeded. Use an increasing or variable sleep interval between requests." It looks like we need an increasing or variable sleep interval between snapshot requests for the same volume, but I haven't found the root cause yet.
$ oc describe volumesnapshot mypvc-snapshot
Name:         mypvc-snapshot
Namespace:    my-storage-ccc
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshot
Metadata:
  Creation Timestamp:  2023-05-18T07:12:48Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
    snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
  Generation:  1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:source:
          .:
          f:persistentVolumeClaimName:
        f:volumeSnapshotClassName:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-05-18T07:12:48Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection":
          v:"snapshot.storage.kubernetes.io/volumesnapshot-bound-protection":
    Manager:      snapshot-controller
    Operation:    Update
    Time:         2023-05-18T07:12:57Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:boundVolumeSnapshotContentName:
        f:error:
          .:
          f:message:
          f:time:
        f:readyToUse:
    Manager:         snapshot-controller
    Operation:       Update
    Subresource:     status
    Time:            2023-05-18T09:07:30Z
  Resource Version:  231703
  UID:               d66a51e8-af1e-49d7-af8f-e279820933a0
Spec:
  Source:
    Persistent Volume Claim Name:  mypvc
  Volume Snapshot Class Name:      csi-aws-vsc-fast
Status:
  Bound Volume Snapshot Content Name:  snapcontent-d66a51e8-af1e-49d7-af8f-e279820933a0
  Error:
    Message:     Failed to check and update snapshot content: failed to take snapshot of the volume vol-0f48f717d5ebc2adb: "rpc error: code = Internal desc = Could not create snapshot \"snapshot-d66a51e8-af1e-49d7-af8f-e279820933a0\": error creating snapshot of volume vol-0f48f717d5ebc2adb: ConcurrentSnapshotLimitExceeded: Maximum allowed in-progress snapshots for a single volume exceeded.\n\tstatus code: 400, request id: ff751f21-299d-4f84-8009-1bcad48130ca"
    Time:        2023-05-18T09:07:30Z
  Ready To Use:  false
Events:
  Type     Reason                         Age   From                 Message
  ----     ------                         ----  ----                 -------
  Warning  SnapshotContentCreationFailed  114m  snapshot-controller  Failed to create snapshot content with error failed to get input parameters to create snapshot mypvc-snapshot: "the PVC mypvc is not yet bound to a PV, will not attempt to take a snapshot"
  Normal   CreatingSnapshot               114m  snapshot-controller  Waiting for a snapshot my-storage-ccc/mypvc-snapshot to be created by the CSI driver.
$ oc get volumesnapshotcontent snapcontent-d66a51e8-af1e-49d7-af8f-e279820933a0
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER            VOLUMESNAPSHOTCLASS   VOLUMESNAPSHOT   VOLUMESNAPSHOTNAMESPACE   AGE
snapcontent-d66a51e8-af1e-49d7-af8f-e279820933a0   false                      Delete           ebs.csi.aws.com   csi-aws-vsc-fast      mypvc-snapshot   my-storage-ccc            90m
$ oc logs aws-ebs-csi-driver-controller-7884cd4d48-55qxd -c csi-driver| grep -i 'Could not create snapshot "snapshot-d66a51e8-af1e-49d7-af8f-e279820933a0"'|wc -l
    5225

Interpolated tags in VolumeSnapshotClass issues

  • For this feature, we need to set --extra-create-metadata on our csi-snapshotter sidecar. It enables the external-snapshotter to pass the snapshot metadata parameters to the CSI driver.
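With --extra-create-metadata, the external-snapshotter adds the csi.storage.k8s.io/volumesnapshot/name, csi.storage.k8s.io/volumesnapshot/namespace and csi.storage.k8s.io/volumesnapshotcontent/name keys to the CreateSnapshot parameters, which is what gives the driver something to interpolate. A minimal sketch of tag interpolation with Go's text/template, assuming a hypothetical context struct whose field names mirror those keys; the driver's actual template fields and code may differ.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Parameter keys the external-snapshotter injects into CreateSnapshot
// parameters when started with --extra-create-metadata.
const (
	paramSnapshotName        = "csi.storage.k8s.io/volumesnapshot/name"
	paramSnapshotNamespace   = "csi.storage.k8s.io/volumesnapshot/namespace"
	paramSnapshotContentName = "csi.storage.k8s.io/volumesnapshotcontent/name"
)

// snapshotTagData is a hypothetical template context built from those keys.
type snapshotTagData struct {
	VolumeSnapshotName        string
	VolumeSnapshotNamespace   string
	VolumeSnapshotContentName string
}

// interpolateTag renders a tag value such as "{{ .VolumeSnapshotNamespace }}"
// using the metadata in params. Without --extra-create-metadata the keys are
// absent and the fields render as empty strings.
func interpolateTag(value string, params map[string]string) (string, error) {
	tmpl, err := template.New("tag").Parse(value)
	if err != nil {
		return "", err
	}
	data := snapshotTagData{
		VolumeSnapshotName:        params[paramSnapshotName],
		VolumeSnapshotNamespace:   params[paramSnapshotNamespace],
		VolumeSnapshotContentName: params[paramSnapshotContentName],
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	params := map[string]string{
		paramSnapshotName:      "mypvc-snapshot",
		paramSnapshotNamespace: "my-storage-ccc",
	}
	v, err := interpolateTag("owner-{{ .VolumeSnapshotNamespace }}/{{ .VolumeSnapshotName }}", params)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v)
}
```

The empty-string fallback illustrates why QE saw the feature fail: without the sidecar flag, the driver has no snapshot metadata to substitute.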

@jsafrane (Author)

Adding FastSnapshot permissions in openshift/cluster-storage-operator#369

@jsafrane (Author)

Enabling snapshotter's --extra-create-metadata: openshift/aws-ebs-csi-driver-operator#223


Phaow commented May 23, 2023

@jsafrane Thanks for the quick fix! For the second Fast Snapshot Restore issue, I opened kubernetes-sigs#1608 and confirmed that there is a default Fast Snapshot Restore quota (only 5 fast snapshots per region by default) and a bug in external-snapshotter, kubernetes-csi/external-snapshotter#778. A fix is in progress. For the rebase, everything looks good from the QE side.


Phaow commented May 23, 2023

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label May 23, 2023

openshift-ci bot commented Jun 19, 2023

@jsafrane: all tests passed!



@openshift-merge-robot openshift-merge-robot merged commit fdd8ff8 into openshift:master Jun 19, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR