Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrieving top-k results for k >= 257 produces corrupted outputs #317

Open
NTT123 opened this issue Sep 9, 2024 · 4 comments
Open

Retrieving top-k results for k >= 257 produces corrupted outputs #317

NTT123 opened this issue Sep 9, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@NTT123
Copy link

NTT123 commented Sep 9, 2024

Describe the bug

Retrieving top-k results with ivf_flat for k >= 257 produces corrupted outputs. I also encountered the same issue when using ivf_pq.

Steps/Code to reproduce the bug

I followed the tutorial at this link.

When setting k = 257, the outputs are corrupted. I then plotted the distance values:

import matplotlib.pyplot as plt
plt.plot(distances[0, :])
plt.show()

Corrupted Output

Expected behavior

For reference, this is the result when k = 256:

Expected Output

Environment details

Installed packages:

pip install cuvs-cu12==24.8.0 cupy-cuda12x==12.2.0 --extra-index-url=https://pypi.nvidia.com

Additional context

Interestingly, performing a search with k > 256 followed by a refinement step with k <= 256 works correctly. However, if I attempt to refine with k > 256, the issue reoccurs.

@NTT123 NTT123 added the bug Something isn't working label Sep 9, 2024
@NTT123
Copy link
Author

NTT123 commented Sep 9, 2024

It looks like the outputs when k > 256 are indeed top-k results, but they are simply unsorted.

@achirkin
Copy link
Contributor

achirkin commented Sep 9, 2024

Hi there,
as far as I remember, cuVS doesn't guarantee the ordering of the results. If the ordering is the only issue, then this sounds like an expected behavior. Under the hood, both ivf-pq and ivf-flat use one of the several top-k-selection methods while scanning through the index; some of them produce the ordered output by design, others - don't (which saves a little bit of cycles). Typically, the unordered one is activated for larger k (that is a radix-based sorting).

@NTT123
Copy link
Author

NTT123 commented Sep 9, 2024

I guess technically it is not a bug, but from the user's point of view, the expected behavior is that it should be sorted (and indeed it is sorted for k <= 256). So, I think being a bit more transparent (e.g., in the documentation, tutorials, etc.) could be a great help.

@NTT123 NTT123 closed this as completed Sep 17, 2024
@cjnolet cjnolet reopened this Sep 17, 2024
@cjnolet
Copy link
Member

cjnolet commented Sep 17, 2024

Hi @NTT123 i was out of the office for a few days last week and lost track of this discussion.

What you are asking for is very reasonable, so at the very least I think we could add something to the docs for the algos as to when they should be assumed to be sorted. I also think we could prioritize a feature (likely for our December release) to allow finer user control over the sorting of the results.

Let’s keep this one open and I’m going to mark it for the appropriate release. As always, if you are interested in diving into the code, we can guide you along the change as well. Contributions always welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: In Progress
Development

No branches or pull requests

3 participants