Explicitly clear MatrixStore's cache when building train/test tasks #594

Closed
thcrock opened this issue Feb 7, 2019 · 0 comments

thcrock commented Feb 7, 2019

MatrixStore caches the matrix until it is serialized (e.g. to be sent over the wire for multiprocessing), at which point it clears the cached matrix. This matters because when we generate the model training and testing tasks, we read each matrix's labels to check that they contain more than one unique value, and that read leaves the matrix in the cache. Before #560 was resolved, we started a multiprocessing pool right after this check, so each MatrixStore was serialized and its cache cleared. Now we build all of the Experiment's train/test tasks in one go, so every train and test matrix stays in memory while the tasks are being built. This can easily cause out-of-memory errors.

There could be several ways to fix this, but the simplest is probably explicitly clearing the cache during the task-building phase.
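
A rough sketch of what that could look like. Everything here is a simplified, hypothetical stand-in rather than the real triage code: `CachedMatrixStore`, its `loader` argument, `clear_cache`, `is_cached`, and `build_tasks` are names made up for illustration, and the label check is assumed to apply to both the train and test matrix.

```python
class CachedMatrixStore:
    """Hypothetical stand-in for MatrixStore: loads labels lazily and caches them."""

    def __init__(self, loader):
        self._loader = loader  # callable that returns the labels (assumption)
        self._labels = None    # None means nothing is cached

    @property
    def labels(self):
        # First access loads the data; later accesses reuse the cached copy.
        if self._labels is None:
            self._labels = self._loader()
        return self._labels

    @property
    def is_cached(self):
        return self._labels is not None

    def clear_cache(self):
        # Drop the cached data so it can be garbage collected.
        self._labels = None

    def __getstate__(self):
        # Serialization leaves the cache out, as described above.
        state = self.__dict__.copy()
        state["_labels"] = None
        return state


def build_tasks(matrix_pairs):
    """Build train/test tasks, clearing each store's cache after validation."""
    tasks = []
    for train_store, test_store in matrix_pairs:
        # Reading the labels populates each store's cache.
        usable = (
            len(set(train_store.labels)) > 1
            and len(set(test_store.labels)) > 1
        )
        # Explicitly release the matrices before moving to the next pair,
        # so only one pair is held in memory at a time.
        train_store.clear_cache()
        test_store.clear_cache()
        if usable:
            tasks.append({"train_store": train_store, "test_store": test_store})
    return tasks
```

With this shape, peak memory during task building is bounded by one train/test pair instead of growing with the number of pairs in the Experiment.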

@thcrock thcrock self-assigned this Feb 7, 2019
nanounanue added a commit to dssg/dirtyduck that referenced this issue Feb 7, 2019
nanounanue added a commit that referenced this issue Feb 18, 2019
Fix MatrixStore memory leak [Resolves #594]