Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reusing Access Points #1026

Merged
merged 1 commit into from
Sep 19, 2023
Merged

Conversation

mskanth972
Copy link
Contributor

@mskanth972 mskanth972 commented Jun 6, 2023

Is this a bug fix or adding new feature?

When performing dynamic provisioning to mount Amazon Elastic File System (EFS) on an Elastic Kubernetes Service (EKS) cluster, an access point (folder) is automatically created within EFS. This access point is assigned a unique pathname based on the base path name /dynamic_provisioning/pvc-id, where the "pvc-id" represents a randomly generated Persistent Volume Claim (PVC) ID given by Kubernetes. The access point serves as the location where the actual data is stored.

However, a challenge arises when the EKS cluster experiences downtime, deleted accidentally or user wants to connect same AP for newly created EKS cluster. In such cases, when creating a new EKS cluster and performed dynamic provisioning a new access point is created with a different base pathname i.e., PVC ID. This poses an issue because the previous access point, which contains valuable data, becomes inaccessible from the pods as paths are mis-matching.

To overcome this challenge, a solution needs to be implemented to enable seamless data access and preservation across EKS cluster. The objective is to establish a mechanism that allows pods to access and manipulate data from the previous access point when user wants to.

To address the challenge of accessing previous access points in the EFS CSI driver during EKS cluster chnages, we propose the following approach.
First, we will leverage the client token feature provided by EFS, which serves as an idempotent identifier for access points. Additionally, we will introduce a new parameter in the storage class of the EFS CSI driver to enable this functionality.

To make the Access point re-usable user can set reuseAccessPoint=true parameter in Storageclass.yaml. When a new access point is created, we will assign the given PVC name as the client token. This establishes a connection between the PVC name and the access point, ensuring easy identification and matching. So, next time if the user wants to reconnect to existing AP from a different cluster, the same PVC name has to be mentioned.

Now, let's consider the scenario where the user creates a new cluster. In this case, when a user wants to access the same old access point, they can simply provide the original PVC name during the mounting process. As the parameter is set to true in StorageClass, the EFS CSI driver will check for client tokens associated with the specific Elastic file system. If a matching client token is found for the provided PVC name, indicating the existence of an access point associated with that PVC name, the driver will recognize it as the same old access point. Instead of creating a new access point, it will reuse the existing one. This seamless transition ensures that users can access their old access points and the data within them.

Note: Client Token in Access point has a limit upto 64 char lengthIssue can occur when there are two PVCs with 64 + character length where first 64 characters are the same, So implemented hashing function to hash the names to < 64 char hash.

What is this PR about? / Why do we need it?

What testing is done?

[mskanth@MSD]$ cat pod.yaml 
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name:clienttoketest7
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: efs-app
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(date -u) >> /data/out; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: clienttoketest7

Sample storage class

[mskanth@MSD]$ cat storageclass.yaml 
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
mountOptions: 
  - iam
reclaimPolicy: Retain
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-12341234
  directoryPerms: "700"
  gidRangeStart: "1000" # optional
  gidRangeEnd: "2000" # optional
  basePath: "/dynamic_provisioning" # optional
  reuseAccessPoint: "true"

Logs

I0622 04:23:56.514543 1 controller.go:63] CreateVolume: called with args {Name:pvc-b3751fe7-6bb2-45de-a0fa-4b001c81d519 CapacityRange:required_bytes:5368709120 VolumeCapabilities:[mount:<mount_flags:"iam" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > ] Parameters:map[basePath:/dynamic_provisioning csi.storage.k8s.io/pv/name:pvc-b3751fe7-6bb2-45de-a0fa-4b001c81d519 csi.storage.k8s.io/pvc/name:clienttokentest7 csi.storage.k8s.io/pvc/namespace:default directoryPerms:700 fileSystemId:fs-09309933ee18d2ada gidRangeEnd:2000 gidRangeStart:1000 provisioningMode:efs-ap reuseAccessPoint:true] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0622 04:23:56.514644 1 cloud.go:292] Calling DescribeFileSystems with input: {
 FileSystemId: "fs-12341234"
}
I0622 04:23:56.659117 1 gid_allocator.go:52] Recieved getNextGid for fsId: fs-12341234, min: 1000, max: 2000
I0622 04:23:56.659141 1 cloud.go:258] AccessPointOptions to find AP : &{CapacityGiB:5368709120 FileSystemId:fs-12341234 Uid:1002 Gid:1002 DirectoryPerms:700 DirectoryPath:/dynamic_provisioning/pvc-b3751fe7-6bb2-45de-a0fa-4b001c81d519 Tags:map[efs.csi.aws.com/cluster:true]}
I0622 04:23:56.659156 1 cloud.go:259] ClientToken to find AP : clienttokentest7
I0622 04:23:56.746413 1 cloud.go:277] ClientToken found : efs-claim1
I0622 04:23:56.746418 1 cloud.go:277] ClientToken found : efs-claim
I0622 04:23:56.746617 1 cloud.go:277] ClientToken found : pvc-6655d232-ec2c-4573-9b44-d88d3676de16
I0622 04:23:56.746620 1 cloud.go:277] ClientToken found : pvc-72800a83-8944-4af8-bb2e-ec941722fa97
I0622 04:23:56.746622 1 cloud.go:277] ClientToken found : efs-apresue1
I0622 04:23:56.746625 1 cloud.go:277] ClientToken found : efs-apreuse2
I0622 04:23:56.746657 1 cloud.go:277] ClientToken found : clienttokentest6
I0622 04:23:56.746660 1 cloud.go:277] ClientToken found : clienttokentest7
I0622 04:23:56.746666 1 cloud.go:171] Existing AccessPoint found : &{AccessPointId:fsap-06209626d3d178877 FileSystemId:fs-09309933ee18d2ada AccessPointRootDir:/dynamic_provisioning/pvc-373df757-4a50-48ed-a584-c6fcdb8fbde1 CapacityGiB:0}

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 6, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mskanth972

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2023
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 14, 2023
@mskanth972 mskanth972 changed the title APreuse Reusing Access Points Jun 19, 2023
pkg/cloud/cloud.go Outdated Show resolved Hide resolved
pkg/cloud/cloud.go Outdated Show resolved Hide resolved
pkg/cloud/cloud.go Outdated Show resolved Hide resolved
pkg/cloud/cloud.go Outdated Show resolved Hide resolved
pkg/cloud/cloud.go Outdated Show resolved Hide resolved
pkg/cloud/cloud.go Outdated Show resolved Hide resolved
@mskanth972 mskanth972 force-pushed the APreuse branch 2 times, most recently from d053824 to effb068 Compare June 21, 2023 05:41
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 21, 2023
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 22, 2023
@mskanth972 mskanth972 force-pushed the APreuse branch 3 times, most recently from fdcc1d3 to f9993f6 Compare June 22, 2023 04:39
@mskanth972
Copy link
Contributor Author

/test pull-aws-efs-csi-driver-e2e

@mskanth972
Copy link
Contributor Author

/test pull-aws-efs-csi-driver-unit

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 16, 2023
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2023
docs/README.md Outdated
@@ -39,6 +39,8 @@ The following CSI interfaces are implemented:
| subPathPattern | | `/${.PV.name}` | true | The template used to construct the subPath under which each of the access points created under Dynamic Provisioning. Can be made up of fixed strings and limited variables, is akin to the 'subPathPattern' variable on the [nfs-subdir-external-provisioner](https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner) chart. Supports `.PVC.name`,`.PVC.namespace` and `.PV.name` |
| ensureUniqueDirectory | | true | true | **NOTE: Only set this to false if you're sure this is the behaviour you want**.<br/> Used when dynamic provisioning is enabled, if set to true, appends the a UID to the pattern specified in `subPathPattern` to ensure that access points will not accidentally point at the same directory. |
| az | | "" | true | Used for cross-account mount. `az` under storage class parameter is optional. If specified, mount target associated with the az will be used for cross-account mount. If not specified, a random mount target will be picked for cross account mount |
| ensureUniqueDirectory | | true | true |
Copy link
Contributor

@Ashley-wenyizha Ashley-wenyizha Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we remove the explanation part of this param?

| ensureUniqueDirectory | | true | true | **NOTE: Only set this to false if you're sure this is the behaviour you want**.<br/> Used when dynamic provisioning is enabled, if set to true, appends the a UID to the pattern specified in `subPathPattern` to ensure that access points will not accidentally point at the same directory. |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to rebase this PR. I will do that and will push the changes

@Ashley-wenyizha
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 19, 2023
@k8s-ci-robot k8s-ci-robot merged commit 44c9b85 into kubernetes-sigs:master Sep 19, 2023
8 checks passed
@andrewhharmon
Copy link

Im struggling to implement this. i want to access the same EFS data across 2 clusters. The description above talks about using usePvcName=true but the example storage class uses reuseAccessPoint: "true". Also not sure if i need to set ensureUniqueDirectory to false. Is there an example somewhere that shows how to do this?

@mskanth972
Copy link
Contributor Author

Im struggling to implement this. i want to access the same EFS data across 2 clusters. The description above talks about using usePvcName=true but the example storage class uses reuseAccessPoint: "true". Also not sure if i need to set ensureUniqueDirectory to false. Is there an example somewhere that shows how to do this?

Sorry for the above example. We implemented that with a different parameter while testing and we changed to a new name that is reuseAccessPoint. Its totally upto the user in setting EnsureUniquedirectory parameter as it works either way. And will change the example mentioned above

@andrewhharmon
Copy link

OK, so before i used this setting, I saw the Client token in efs with a value of like pvc-xxx-xxx--xxx now i see a random string. I was thinking it would be the name of the PVC. is that not correct? I also don't see logging like i see in yours that shows it trying to look up the token. I'm setting the pvc name like so

kind: PersistentVolumeClaim
metadata:
  name: efs-claim

@andrewhharmon
Copy link

andrewhharmon commented Oct 4, 2023

sorry, i am seeing some logging similar to above.

efs-plugin W1004 16:39:29.723010       1 gid_allocator.go:93] Requested GID range (50000:7000000) exceeds EFS Access Point limit (1000) per Filesystem. Driver will not allocate GIDs outside of this limit.
efs-plugin I1004 16:39:29.723067       1 controller.go:296] Using PV name for access point directory.
efs-plugin I1004 16:39:29.723640       1 controller.go:303] Using /pvc-ac121517-a2bf-4b94-a739-b4545c281f99 as the access point directory.
efs-plugin I1004 16:39:29.723666       1 cloud.go:266] ClientToken to find AP : d3580977b85988d6cd4a7d1a8b4745f839b633cd685f264e261e1f6953af77ae
efs-plugin I1004 16:39:29.764853       1 cloud.go:178] Existing AccessPoint found : &{AccessPointId:fsap-06d6d1615e941f8d3 FileSystemId:fs-057e96c51cbf0a652 AccessPointRootDir:/pvc-8699b6a6-f028-4926-b758-bdb7ab5f1e59 CapacityGiB:0 PosixUser:<nil>}
Stream closed EOF for kube-system/efs-csi-controller-655f7cf88c-vvjd7 (efs-plugin)

so maybe i do have it working as expected. Also, i see you comment on the hashing function for the pvc name, so that must explain why im seeing a random string for the clientToken instead of my PVC name.

@mskanth972
Copy link
Contributor Author

@andrewhharmon , yes you are correct. It was the hashed character.

@andrewhharmon
Copy link

andrewhharmon commented Oct 5, 2023

Follow up question for you. So if we set the reclaim policy to Retain, if we delete the PVC, the PV will stay as well as the access point. But we could potentially end up with lots of unbound PVs. If we delete one of the PVs (bound or unbound), it will also delete the access point. But we may still have other PVs using that access point still. So we'd have a PV pointing to an access point that doesn't exist? Is there any advise on the best way to accomplish this?

@bittracer
Copy link

bittracer commented Jun 10, 2024

Follow up question for you. So if we set the reclaim policy to Retain, if we delete the PVC, the PV will stay as well as the access point. But we could potentially end up with lots of unbound PVs. If we delete one of the PVs (bound or unbound), it will also delete the access point. But we may still have other PVs using that access point still. So we'd have a PV pointing to an access point that doesn't exist? Is there any advise on the best way to accomplish this?

@mskanth972 @Ashley-wenyizha I am running into the similar issue as @andrewhharmon has pointed here. Are there any suggestions on handling this situation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants