Failed deletes should not prevent volume creation #126

Closed
acsulli opened this issue May 17, 2018 · 3 comments

acsulli commented May 17, 2018

When Trident attempts to delete a storage volume and fails, the failure should not cause it to ignore other operations until the delete succeeds. An example scenario:

  1. Create a PVC, resulting in a storage system volume created as expected.
  2. Create a replication relationship on that volume using some external method, e.g. CLI or the GUI for ONTAP/SolidFire.
  3. Delete the PVC, which will result in Trident failing to delete the storage volume until the replication relationship has been removed.

At this point, Trident will "hang" attempting to delete the volume until the relationship is removed. Having it do something similar to Kubernetes' "CrashLoopBackOff" and continue to perform other create/delete actions would be desirable.
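To make the request concrete, here is a minimal sketch in Go of the kind of behavior being asked for: the failed delete is retried in the background with exponential backoff while other provisioning work keeps flowing. The deleteVolume helper, retry loop, and volume names are hypothetical stand-ins, not Trident's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// deleteVolume is a hypothetical backend call; here it always fails, mimicking
// a volume that is the source of a SnapMirror relationship.
func deleteVolume(name string) error {
	return errors.New("volume is the source endpoint of a SnapMirror relationship")
}

// retryDelete runs in its own goroutine so a stuck delete never blocks other
// work. The wait doubles after each failure up to a cap, similar in spirit to
// CrashLoopBackOff.
func retryDelete(name string, maxBackoff time.Duration) {
	backoff := time.Second
	for {
		if err := deleteVolume(name); err == nil {
			fmt.Printf("deleted %s\n", name)
			return
		}
		fmt.Printf("delete of %s failed; retrying in %s\n", name, backoff)
		time.Sleep(backoff)
		if backoff *= 2; backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}

func main() {
	// The problematic delete keeps retrying in the background...
	go retryDelete("pvc-volume-with-snapmirror", time.Minute)

	// ...while other create/delete requests continue to be served.
	for i := 1; i <= 3; i++ {
		fmt.Printf("handling other provisioning request %d\n", i)
		time.Sleep(500 * time.Millisecond)
	}
}
```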

kangarlou (Contributor) commented

Trident doesn't hang if it fails to create or delete a volume. After a failed ZAPI call, Trident moves on to process the next request. Upon such failures, Trident reattempts the operation after one minute. If the cause of the failure is an out-of-band replication relationship, then subsequent reattempts are bound to fail, as Trident has no knowledge of the mirroring relationship. These failures shouldn't prevent provisioning of new volumes unless the failure results in a panic.

guillebianco commented May 22, 2018

If SnapMirror is configured on a Trident-managed volume and that volume is deleted, Trident initialization will fail, which in turn prevents provisioning of new volumes (since it can't delete the original one). For example:

```
time="2018-05-22T13:55:14Z" level=debug msg="Kubernetes frontend got notified of a PVC." PVC=guille-trident-test-5 PVC_accessModes="[ReadWriteOnce]" PVC_annotations="map[volume.beta.kubernetes.io/storage-provisioner:netapp.io/trident]" PVC_eventType=update PVC_phase=Pending PVC_size=1Gi PVC_storageClass=standard-nas PVC_uid=c0b34d03-5dc7-11e8-8865-0050569e3732 PVC_volume=
time="2018-05-22T13:55:14Z" level=warning msg="Kubernetes frontend couldn't provision a volume: Trident initialization failed; unable to clean up deleted volume bi-as-bidatalab-dev-jenkins-home-5e88f: error destroying volume ingsafascl03_cs01_bi_as_bidatalab_dev_jenkins_home_5e88f: API status: failed, Reason: Volume \"ingsafascl03_cs01_bi_as_bidatalab_dev_jenkins_home_5e88f\" in Vserver \"ingsafascl03-cs01\" is the source endpoint of one or more SnapMirror relationships. Before you delete the volume, you must release the source information of the SnapMirror relationships using \"snapmirror release\". To display the destinations to be used in the \"snapmirror release\" commands, use the \"snapmirror list-destinations -source-vserver ingsafascl03-cs01 -source-volume ingsafascl03_cs01_bi_as_bidatalab_dev_jenkins_home_5e88f\" command., Code: 18436 (will retry upon resync)" volume=trident-guille-trident-test-5-c0b34
```

kangarlou (Contributor) commented

Thanks, this makes sense. If Trident has bootstrapped successfully, a failure to delete a volume shouldn't have any impact on Trident. However, once Trident is restarted, the failed deletion prevents it from bootstrapping successfully. The source of the problem is that the VolumeTransaction object isn't deleted after a failed operation. We'll fix the problem.
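To illustrate the direction of the fix, here is a minimal sketch in Go of what "cleaning up the VolumeTransaction after a failed operation" could look like, so a later bootstrap doesn't trip over a stale transaction. Every type and function here is a hypothetical stand-in, not Trident's real implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// VolumeTransaction is a hypothetical record of an in-flight operation.
type VolumeTransaction struct {
	Op     string
	Volume string
}

// transactionStore stands in for a persistent store of pending transactions.
var transactionStore = map[string]VolumeTransaction{}

func addTransaction(t VolumeTransaction)    { transactionStore[t.Volume] = t }
func deleteTransaction(t VolumeTransaction) { delete(transactionStore, t.Volume) }

// destroyVolume stands in for the backend call that fails while the volume is
// the source of a SnapMirror relationship.
func destroyVolume(name string) error {
	return errors.New("volume is the source endpoint of a SnapMirror relationship")
}

// deleteVolume records the transaction, attempts the delete, and clears the
// transaction whether or not the delete succeeds, so startup is not blocked by
// a leftover transaction.
func deleteVolume(name string) error {
	txn := VolumeTransaction{Op: "delete", Volume: name}
	addTransaction(txn)
	defer deleteTransaction(txn) // cleanup happens on failure as well as success
	return destroyVolume(name)
}

func main() {
	if err := deleteVolume("trident-example-volume"); err != nil {
		fmt.Println("delete failed (will be retried later):", err)
	}
	fmt.Println("transactions left behind:", len(transactionStore))
}
```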
