
DataLoader with num_workers > 0 increases memory consumption over time #73

Open
juancq opened this issue Oct 19, 2023 · 2 comments
Labels: bug (Something isn't working), resource constraints (For when the system is too resource hungry)

Comments

juancq (Contributor) commented on Oct 19, 2023

Using the dev branch, I have noticed that when using a DataLoader with num_workers greater than 0, memory consumption increases over time during a single epoch. This problem has been documented here:
https://docs.aws.amazon.com/codeguru/detector-library/python/pytorch-data-loader-with-multiple-workers/
pytorch/pytorch#13246 (comment)
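For context, the linked PyTorch issue attributes this kind of growth to copy-on-write behavior in forked DataLoader workers: when a Dataset stores per-item metadata as plain Python objects (lists of strings, dicts, etc.), merely reading those objects in a worker changes their reference counts, which dirties the shared memory pages and copies them into every worker process, so resident memory climbs over the epoch. The sketch below is a minimal, hypothetical illustration of that pattern, not this repository's dataset code; the class names and sizes are made up for the example.

```python
# Minimal, hypothetical sketch (not this repo's code) of the leak pattern:
# with num_workers > 0, workers forked from the parent share its memory
# copy-on-write, and reading Python objects bumps their refcounts, which
# dirties pages and gradually copies them into each worker over the epoch.
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ListBackedDataset(Dataset):
    """Stores per-item metadata as many small Python objects (leak-prone)."""

    def __init__(self, n_items: int):
        self.metadata = [str(i) for i in range(n_items)]

    def __len__(self) -> int:
        return len(self.metadata)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Touching self.metadata[idx] in a worker updates its refcount,
        # which triggers the copy-on-write page copy described above.
        return torch.tensor(int(self.metadata[idx]))


class ArrayBackedDataset(Dataset):
    """Stores metadata in one contiguous NumPy array (no per-item objects)."""

    def __init__(self, n_items: int):
        self.metadata = np.arange(n_items, dtype=np.int64)

    def __len__(self) -> int:
        return int(self.metadata.shape[0])

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.tensor(self.metadata[idx])


if __name__ == "__main__":
    for cls in (ListBackedDataset, ArrayBackedDataset):
        loader = DataLoader(cls(1_000_000), batch_size=256, num_workers=4)
        for _ in loader:  # per-worker RSS grows for the list-backed case
            pass
```

Keeping per-item data in contiguous containers such as NumPy arrays or tensors avoids the refcount-driven page copies, which is broadly the direction the fix discussed later in this thread takes with ragged tensors.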

mmcdermott (Owner) commented

@juancq Just a heads up: what was formerly dev has been merged into main. The dev branch now contains changes from #74 that will be pulled out or re-worked to address the speed issues you caught, but for now that code is in dev. Apologies for any confusion.

mmcdermott (Owner) commented

@juancq I believe #90 now contains a fix for this and all related issues. If you want to test it right now, you will need to run the code in that PR's branch and also manually clone and install (via pip install -e .) this package: https://github.com/mmcdermott/nested_ragged_tensors. It handles the manipulation of the ragged tensors used to speed things up and reduce memory costs here. The new code still caches files, so you may need to delete any files cached by the old version, but the new cached files should be dramatically smaller. The runtime itself should have no memory leaks and should be, at worst, competitive with the prior runtime at the iteration / batch level, and likely faster on some tasks/settings. No pressure to test for now; I'm going to push the other package to PyPI so it can be installed normally and make a few other cosmetic improvements, but that's the state of things for your information.
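For reference, the manual installation described above would look roughly like the following (a sketch only; the #90 PR branch of this repository still needs to be checked out separately, and any files cached by the old version should be deleted):

```sh
# Clone and install the helper package in editable mode, as described above.
git clone https://github.com/mmcdermott/nested_ragged_tensors
cd nested_ragged_tensors
pip install -e .
```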

mmcdermott added the bug (Something isn't working) and resource constraints (For when the system is too resource hungry) labels on Mar 11, 2024