[FEA] Changing COO Index_Type in UMAP to prevent overflow when running with large datasets #6010

Open
jinsolp opened this issue Aug 6, 2024 · 0 comments
Labels: ? - Needs Triage, feature request


Description

UMAP currently cannot run on large datasets because of an integer overflow.
raft::sparse::COO defaults to int for its Index_Type, so any element count larger than INT_MAX (2,147,483,647) cannot be represented.
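As a minimal illustration of the limit, the sketch below checks whether an element count can be represented by a given index type. The helper name fits_in_index_type is hypothetical and is not part of RAFT or cuML. The count is passed in 64-bit unsigned arithmetic so that the check itself cannot overflow.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of RAFT or cuML): returns true if an element
// count, held in 64-bit unsigned arithmetic, can be represented by Index_Type.
template <typename Index_Type>
constexpr bool fits_in_index_type(std::uint64_t count)
{
  return count <= static_cast<std::uint64_t>(std::numeric_limits<Index_Type>::max());
}
```

For example, fits_in_index_type<int>(88000000ULL * 16 * 2) is false, while the same count passes for std::int64_t.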

Once this is fixed, UMAPAlgo::FuzzySimplSet::ML::run() needs to be updated to accept a COO with an Index_Type other than int.
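A rough sketch of what "templated on Index_Type instead of hard-coding int" could look like, using a hypothetical host-side SimpleCoo struct and a hypothetical row_weight_sums stage. Neither is raft::sparse::COO nor the real run(); they only show the index type flowing through as a template parameter, with int kept as the default.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical host-side stand-in for a COO matrix; NOT raft::sparse::COO.
// The index type is a template parameter (defaulting to int) instead of fixed.
template <typename T, typename Index_Type = int>
struct SimpleCoo {
  static_assert(std::is_integral<Index_Type>::value, "Index_Type must be an integer type");
  std::vector<Index_Type> rows;
  std::vector<Index_Type> cols;
  std::vector<T> vals;
  Index_Type n_rows = 0;
  Index_Type n_cols = 0;

  Index_Type nnz() const { return static_cast<Index_Type>(vals.size()); }
};

// Hypothetical processing stage templated on Index_Type, the way an updated
// run() would be; here it just sums the edge weights of each row.
template <typename T, typename Index_Type>
std::vector<T> row_weight_sums(const SimpleCoo<T, Index_Type>& coo)
{
  std::vector<T> sums(static_cast<std::size_t>(coo.n_rows), T{0});
  for (Index_Type i = 0; i < coo.nnz(); ++i) {
    const auto e = static_cast<std::size_t>(i);
    sums[static_cast<std::size_t>(coo.rows[e])] += coo.vals[e];
  }
  return sums;
}
```

Callers that keep the default get int indices; large inputs can instantiate SimpleCoo<float, std::int64_t> without any other code changes.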

Details

Specifically, coo_symmetrize (a RAFT function called from UMAPAlgo::FuzzySimplSet::ML::run()) allocates space for nnz * 2 elements on the device. For a large dataset (e.g. 88M samples with a kNN graph degree of 16), nnz = 88M * 16 = 1.408 billion still fits in an int, but nnz * 2 = 2.816 billion is larger than INT_MAX (2,147,483,647).
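To show where the nnz * 2 size comes from, here is a hypothetical host-side sketch (mirror_edges is not a RAFT function) that writes each edge (i, j, v) as both (i, j, v) and (j, i, v). Unlike a real symmetrization, it does not merge duplicate pairs or run on the device. The output size is computed in 64-bit arithmetic and rejected if the chosen Index_Type cannot hold it.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of the 2 * nnz allocation pattern; NOT coo_symmetrize.
// Each input edge is emitted twice (original and transposed), no deduplication.
template <typename T, typename Index_Type>
void mirror_edges(const std::vector<Index_Type>& rows,
                  const std::vector<Index_Type>& cols,
                  const std::vector<T>& vals,
                  std::vector<Index_Type>& out_rows,
                  std::vector<Index_Type>& out_cols,
                  std::vector<T>& out_vals)
{
  const std::uint64_t nnz     = vals.size();
  const std::uint64_t out_nnz = nnz * 2;  // computed in 64 bits, not in Index_Type

  // Refuse output sizes that Index_Type could not index (the int overflow case).
  if (out_nnz > static_cast<std::uint64_t>(std::numeric_limits<Index_Type>::max())) {
    throw std::overflow_error("2 * nnz does not fit in Index_Type");
  }

  out_rows.resize(static_cast<std::size_t>(out_nnz));
  out_cols.resize(static_cast<std::size_t>(out_nnz));
  out_vals.resize(static_cast<std::size_t>(out_nnz));

  for (std::size_t e = 0; e < static_cast<std::size_t>(nnz); ++e) {
    // Original edge (i, j, v).
    out_rows[2 * e] = rows[e];
    out_cols[2 * e] = cols[e];
    out_vals[2 * e] = vals[e];
    // Transposed edge (j, i, v).
    out_rows[2 * e + 1] = cols[e];
    out_cols[2 * e + 1] = rows[e];
    out_vals[2 * e + 1] = vals[e];
  }
}
```

With Index_Type = int, the 88M-sample, degree-16 example would hit the overflow check. With std::int64_t, the same input would be accepted.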
