Persistent name and Exponential backoff #66

Merged
merged 1 commit into from
Apr 6, 2018
Conversation

sbezverk
Contributor

@sbezverk sbezverk commented Mar 28, 2018

This PR addresses two issues:

  1. For a specific PVC, external provisioner always generates the same PV name
  2. Implements Exponential Backoff for long operations like CreateVolume

There are now two timers: a short per-call timeout enforced by the context, and an exponential backoff timer between retries.
This functionality requires support from the driver. It needs to detect when a CreateVolume request arrives for an already initiated volume, check periodically for completion, and return a CreateVolumeResponse struct once the operation finishes.

Fixes #63

@sbezverk
Contributor Author

@lpabon could you please give it a spin in e2e and review the code?


rep, err := p.csiClient.CreateVolume(ctx, &req)
opts := wait.Backoff{Factor: backoffFactor, Steps: backoffSteps}
Member

Dumb question, since I am not familiar with the provisioner code, does this path happen on a separate go routine? Or does it block the provisioner from processing other PVCs?

If it blocks other provisioners, we should change the model such that we have a separate goroutine per volume (maybe using https://github.com/kubernetes/kubernetes/blob/master/pkg/util/goroutinemap/goroutinemap.go).

Contributor Author

I checked the provisioner controller code and it seems each Provision action runs in a separate goroutine. So it should be safe to block one specific goroutine while it waits for a driver response.
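For reference, the per-volume goroutine model with a duplicate-operation guard (loosely modeled on the goroutinemap utility linked above) can be sketched with stdlib Go. The `opMap` type and its method names here are hypothetical, not code from either project:

```go
package main

import (
	"fmt"
	"sync"
)

// opMap allows at most one in-flight operation per volume name,
// in the spirit of kubernetes' goroutinemap (hypothetical sketch).
type opMap struct {
	mu      sync.Mutex
	running map[string]bool
	wg      sync.WaitGroup
}

func newOpMap() *opMap {
	return &opMap{running: map[string]bool{}}
}

// Run starts fn in its own goroutine unless an operation for the
// same volume is already in flight, in which case it returns an error.
func (m *opMap) Run(volume string, fn func()) error {
	m.mu.Lock()
	if m.running[volume] {
		m.mu.Unlock()
		return fmt.Errorf("operation for %q already in progress", volume)
	}
	m.running[volume] = true
	m.mu.Unlock()

	m.wg.Add(1)
	go func() {
		defer m.wg.Done()
		fn()
		// Mark the volume free again once the operation completes.
		m.mu.Lock()
		delete(m.running, volume)
		m.mu.Unlock()
	}()
	return nil
}

func main() {
	m := newOpMap()
	done := make(chan struct{})
	_ = m.Run("pv-1", func() { <-done }) // long-running provision
	err := m.Run("pv-1", func() {})      // duplicate is rejected
	fmt.Println(err != nil)
	close(done)
	m.wg.Wait()
}
```

With this model, one slow volume blocks only its own goroutine, and a retried PVC cannot start a second CreateVolume for the same name while the first is still running.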

@@ -247,12 +251,17 @@ func (p *csiProvisioner) Provision(options controller.VolumeOptions) (*v1.Persis
if err != nil {
return nil, err
}
// create random share name
share := fmt.Sprintf("%s-%s", p.volumeNamePrefix, strings.Replace(string(uuid.NewUUID()), "-", "", -1)[0:p.volumeNameUUIDLength])
// create persistent name based on a volumeNamePrefix and volumeNameUUIDLength
Member

nit: put the volume name generation code into a new method for clarity.

Contributor Author

Do you mean make this code into a separate func?

Contributor

yes, separate function please

Contributor Author

done

if len(pvcUID) == 0 {
return nil, fmt.Errorf("corrupted PVC object, it is missing UID")
}
share := fmt.Sprintf("%s-%s", p.volumeNamePrefix, strings.Replace(string(pvcUID), "-", "", -1)[0:p.volumeNameUUIDLength])
Member

Is pvcUID guaranteed to be unique across namespaces? If so, that is a pretty cool trick, and should be suggested to kubernetes/community#1790 as well

Contributor Author

Well, it is not 100% guaranteed since there is a very small chance of collision, but imho the odds are small enough that this approach should be acceptable.

Contributor

Kubernetes uses "pvc-" + string(claim.UID) as the unique PV name and nobody has complained so far (despite the ugly, long name).

https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/pv_controller.go#L1525
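A sketch of the resulting deterministic naming, combining the in-tree "pvc-" + UID convention with the provisioner's prefix and UUID-length options. The helper name and signature are illustrative only; the PR inlines slightly different code in Provision:

```go
package main

import (
	"fmt"
	"strings"
)

// pvNameFromPVCUID builds a deterministic PV name from the PVC UID,
// so repeated Provision calls for the same PVC always produce the same
// name (illustrative helper, not the provisioner's exact code).
func pvNameFromPVCUID(prefix, pvcUID string, uuidLength int) (string, error) {
	if len(pvcUID) == 0 {
		return "", fmt.Errorf("corrupted PVC object, it is missing UID")
	}
	// Strip dashes from the UID, then truncate to the configured length.
	stripped := strings.Replace(pvcUID, "-", "", -1)
	if uuidLength > 0 && uuidLength < len(stripped) {
		stripped = stripped[:uuidLength]
	}
	return fmt.Sprintf("%s-%s", prefix, stripped), nil
}

func main() {
	name, _ := pvNameFromPVCUID("pvc", "8b51a199-88ae-41f2-9501-f30ce28c1afc", 16)
	fmt.Println(name)
}
```

Because the name is a pure function of the PVC UID, a retried CreateVolume for the same claim hits the driver with the same name, which is what makes the operation idempotent from the driver's point of view.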

Member

We should align with Kubernetes

Contributor Author

done

@saad-ali
Member

saad-ali commented Apr 6, 2018

/lgtm
/approve

err = wait.ExponentialBackoff(opts, func() (bool, error) {
ctx, cancel := context.WithTimeout(context.Background(), p.timeout)
defer cancel()
rep, err = p.csiClient.CreateVolume(ctx, &req)
Member

This is ok, but it still depends on the csiClient being set up. That is #64, where this client should always be set up and never stop retrying. It looks like this change does not handle #64. If it does not handle that issue and the client is never set up correctly, then CreateVolume will never work and will eventually time out.

It looks like Provision() and maybe other calls need to create the connection to the client every time they are called.

Member

Good point, up to @sbezverk if he wants to address it in this PR or in a separate PR.

Contributor Author

@lpabon I am sorry, but I disagree with the csi client constantly trying to connect. Here is why: we do not carry any state about the relation between the provisioner and the driver. So every time the provisioner needs to communicate, it builds a connection to the driver, checks its actual state, and runs the operation. I have done lots of negative testing with this approach and it worked really well. If you can reproduce a failure scenario which is not covered here, please let me know, I would be really interested to dissect it ;)

Member

Ok, no problem. Let's go with this change and we can adjust the Connect() call if we need to.

@lpabon lpabon merged commit 3bbdee9 into kubernetes-csi:master Apr 6, 2018
jsafrane pushed a commit to jsafrane/external-provisioner that referenced this pull request Aug 14, 2023
…RS-and-OWNERS_ALIASES

OCPBUGS-14811: Chore: Update OWNERS and OWNERS_ALIASES