
Delete volume requests are sent twice with new sig-storage-lib-external-provisioner v10.0.1 because of the finalizer "external-provisioner.volume.kubernetes.io/finalizer" #176

Open
zjuliuf opened this issue Aug 14, 2024 · 9 comments

zjuliuf commented Aug 14, 2024

What happened:

When we try to delete a volume using the new sig-storage-lib-external-provisioner v10.0.1, we still notice 2 or more volume deletion requests in the logs. As a result, deleting the PV takes more time.

What you expected to happen:

Only 1 volume deletion request should be sent during PV deletion.

Simple analysis:

After the volume backing the PV is deleted successfully, sig-storage-lib-external-provisioner deletes the PV object. The PV is then updated with a deletionTimestamp, which adds it to volumeQueue again, and deleteVolumeOperation is executed once more because the finalizer "external-provisioner.volume.kubernetes.io/finalizer" still exists. As a result, 2 or more volume deletion requests are sent during the PV deletion.

Below is the code analysis:
(first deletion) function 'shouldDelete' returns true ---> ctrl.provisioner.Delete deletes the volume backing the PV ---> ctrl.client.CoreV1().PersistentVolumes().Delete deletes the PV ---> the PV is updated with a deletionTimestamp ---> the controller tries to remove the finalizer "external-provisioner.volume.kubernetes.io/finalizer"

(second deletion) During the first deletion, when the PV was updated with a deletionTimestamp (before the finalizer was removed), the PV was added to the volumeQueue again and the second deletion began. Because the finalizer "external-provisioner.volume.kubernetes.io/finalizer" still existed and the PV status was Released, 'shouldDelete' still returned true and ctrl.provisioner.Delete was executed again.

volumeHandler := cache.ResourceEventHandlerFuncs{
	AddFunc:    func(obj interface{}) { controller.enqueueVolume(obj) },
	UpdateFunc: func(oldObj, newObj interface{}) { controller.enqueueVolume(newObj) },
	DeleteFunc: func(obj interface{}) { controller.forgetVolume(obj) },
}
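
For illustration, here is a simplified paraphrase of the check described above. It is not the library's exact code; the condition is reconstructed from the behavior reported in this issue, and the function name is only for this sketch.

// Simplified sketch: with addFinalizer enabled, a Released PV that still
// carries "external-provisioner.volume.kubernetes.io/finalizer" keeps
// qualifying for deletion, so every update event received before the
// finalizer is removed can trigger provisioner.Delete again.
func shouldDeleteSimplified(pv *v1.PersistentVolume, addFinalizer bool) bool {
	if pv.Status.Phase != v1.VolumeReleased {
		return false
	}
	if addFinalizer {
		for _, f := range pv.Finalizers {
			if f == "external-provisioner.volume.kubernetes.io/finalizer" {
				return true
			}
		}
		return false
	}
	// Without the finalizer option, a PV that already has a deletionTimestamp
	// is assumed to be on its way out and is not deleted again.
	return pv.DeletionTimestamp == nil
}

Until the finalizer is actually removed from the API object, every re-enqueue of the Released PV passes this check again.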

How to reproduce it:

1. Call NewProvisionController with the option AddFinalizer(true)
2. Create a StorageClass and a PVC
3. Delete the PVC
4. Watch the status of the PV and the logs of the provisioner

Environment:

Driver version:
Kubernetes version (use kubectl version): v1.28
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others: sig-storage-lib-external-provisioner v10.0.1

@zjuliuf zjuliuf changed the title from "Delete volume requests are sent twice with new sig-storage-lib-external-provisioner v10.0.1" to "Delete volume requests are sent twice with new sig-storage-lib-external-provisioner v10.0.1 because of the finalizer 'external-provisioner.volume.kubernetes.io/finalizer'" Aug 14, 2024

Jainbrt commented Aug 14, 2024

@zjuliuf you would need the latest k8s release to verify the fix, such as 1.28.12.


zjuliuf commented Aug 14, 2024

@Jainbrt Thanks a lot, I will try. By the way, which PR of k8s fixed this issue?


Jainbrt commented Aug 14, 2024


zjuliuf commented Aug 14, 2024

@Jainbrt I watched the status of the PV; it changed from 'Bound' to 'Released' without ever becoming 'Failed'.


Jainbrt commented Aug 14, 2024

Is this the issue you are talking about: kubernetes-csi/external-provisioner#1184?


zjuliuf commented Aug 15, 2024

@Jainbrt Looks like the same problem.
I checked the code around 'syncVolume' in sig-storage-lib-external-provisioner v10.0.1 and watched the status of the PV. Because the PV object changes during deletion, sig-storage-lib-external-provisioner receives multiple update events, so 'syncVolume' executes multiple times. 'shouldDelete' then returns true because "external-provisioner.volume.kubernetes.io/finalizer" still exists on the PV (the finalizer is eventually removed successfully), so another volume deletion request is sent to the volume provisioner. The volume and the PV are deleted successfully in the end; only the deletion process is executed 2 or more times. (This doesn't happen every time; I guess the PV update events are sometimes merged.)
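
One hedged way to confirm how many Delete calls actually reach the provisioner is to wrap it, as in the sketch below. This is illustrative only: it assumes the v10 Provisioner interface with Delete(ctx context.Context, pv *v1.PersistentVolume) error, and it needs "context", "sync", the controller package, core/v1 and klog imported.

// countingProvisioner is a hypothetical wrapper that counts Delete calls per
// PV; it embeds the real Provisioner so every other method passes through.
type countingProvisioner struct {
	controller.Provisioner
	mu      sync.Mutex
	deletes map[string]int // initialize with map[string]int{} when constructing
}

func (c *countingProvisioner) Delete(ctx context.Context, pv *v1.PersistentVolume) error {
	c.mu.Lock()
	c.deletes[pv.Name]++
	n := c.deletes[pv.Name]
	c.mu.Unlock()
	klog.Infof("Delete called %d time(s) for PV %s", n, pv.Name)
	return c.Provisioner.Delete(ctx, pv)
}

When the issue reproduces, this counter should log 2 or more for the same PV.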


zjuliuf commented Aug 15, 2024

To reduce redundant deletion processes when 'addFinalizer' is true, can we modify the code as below? With this change, if only the DeletionTimestamp and Finalizers of the PV change, the PV will not be added to the queue.
@jsafrane @xing-yang @Jainbrt

volumeHandler := cache.ResourceEventHandlerFuncs{
	AddFunc:    func(obj interface{}) { controller.enqueueVolume(obj) },
	UpdateFunc: func(oldObj, newObj interface{}) {
		if controller.addFinalizer {
			oldPv, ok := oldObj.(*v1.PersistentVolume)
			if !ok {
				return
			}
			newPv, ok := newObj.(*v1.PersistentVolume)
			if !ok {
				return
			}
			// Compare copies so the shared informer cache objects are not mutated.
			oldCopy := oldPv.DeepCopy()
			newCopy := newPv.DeepCopy()
			// Mask the fields that change when only the deletion metadata is updated.
			newCopy.ResourceVersion = oldCopy.ResourceVersion
			newCopy.ManagedFields = oldCopy.ManagedFields
			newCopy.DeletionTimestamp = oldCopy.DeletionTimestamp
			newCopy.DeletionGracePeriodSeconds = oldCopy.DeletionGracePeriodSeconds
			if len(newCopy.Finalizers) <= len(oldCopy.Finalizers) {
				newCopy.Finalizers = oldCopy.Finalizers
			}
			if equality.Semantic.DeepEqual(oldCopy, newCopy) {
				klog.V(2).Infof("no changes except deletionTimestamp and finalizers, skipping enqueue")
				return
			}
		}
		controller.enqueueVolume(newObj)
	},
	DeleteFunc: func(obj interface{}) { controller.forgetVolume(obj) },
}

@Chaunceyctx

Same problem occurs in my production environment.


jsafrane commented Aug 16, 2024

> I checked the code around 'syncVolume' in sig-storage-lib-external-provisioner v10.0.1 and watched the status of the PV. Because the PV object changes during deletion, sig-storage-lib-external-provisioner receives multiple update events, so 'syncVolume' executes multiple times. 'shouldDelete' then returns true because "external-provisioner.volume.kubernetes.io/finalizer" still exists on the PV

What events did the provisioner receive and who / what caused them? You can either get a diff in UpdateFunc in the provisioner or get it from the API server audit logs. It looks like something is racy here.

I would prefer to see what causes these events before we start ignoring them.
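
A minimal sketch of the diff logging suggested above, assuming klog is available in the handler's package: log the fields that change during deletion inside UpdateFunc before enqueueing, so the source of the extra update events shows up in the provisioner logs.

UpdateFunc: func(oldObj, newObj interface{}) {
	oldPv, okOld := oldObj.(*v1.PersistentVolume)
	newPv, okNew := newObj.(*v1.PersistentVolume)
	if okOld && okNew {
		// Log only the fields relevant to the deletion race.
		klog.V(4).Infof("PV %s update: resourceVersion %s -> %s, phase %s -> %s, deletionTimestamp %v -> %v, finalizers %v -> %v",
			newPv.Name,
			oldPv.ResourceVersion, newPv.ResourceVersion,
			oldPv.Status.Phase, newPv.Status.Phase,
			oldPv.DeletionTimestamp, newPv.DeletionTimestamp,
			oldPv.Finalizers, newPv.Finalizers)
	}
	controller.enqueueVolume(newObj)
},

This keeps the handler's behavior unchanged and only adds visibility, so it can be dropped once the source of the extra events is identified.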
