Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Find and un-nest withResource where appropriate #6758

Closed
abellina opened this issue Oct 11, 2022 · 1 comment
Closed

[FEA] Find and un-nest withResource where appropriate #6758

abellina opened this issue Oct 11, 2022 · 1 comment
Assignees
Labels
reliability Features to improve reliability or bugs that severly impact the reliability of the plugin

Comments

@abellina
Copy link
Collaborator

abellina commented Oct 11, 2022

We have a nice helper function withResource that has helped us emulate "try-with-resources" semantics from Java in Scala.

This can be abused, however. Here's an example:

withResource(iter.next()) { batch => 
  withResource(GpuColumnVector.from(batch)) { table => 
    withResource(table.callCudfFunction()) { table2 => 
      withResource(table2.callAnotherCudfFunction()) { table3 => 
      }
    }
  }
}

In this code we first transform batch into table, this doesn't incur new GPU memory, but table.callCudfFunction() returns a new table, table2, and so when we call the second cuDF function on table2 we now have in GPU memory: table, table2 and now table3.

Ideally the code above can be broken down into multiple chunks. Note that this isn't possible or clean in all cases, so this isn't an easy task.

val table = withResource(iter.next()) { batch => 
  GpuColumnVector.from(batch)
} 

val table2 = withResource(table) { _ =>
  table.callCudfFunction()
}

// we now only have `table2` in memory, instead of `table` and `table2` prior to making `table3`.

val table3 = withResource(table2) { _ =>
  table2.callAnotherCudfFunction()
}

First work on this issue should probably be to find big parts of the code where we want to focus, perhaps with some smarts to detect the issue. I could see several PRs being generated, per exec that suffers from the above, as we go through the code.

@abellina abellina added feature request New feature or request ? - Needs Triage Need team to review and classify reliability Features to improve reliability or bugs that severly impact the reliability of the plugin labels Oct 11, 2022
@sameerz sameerz removed feature request New feature or request ? - Needs Triage Need team to review and classify labels Oct 11, 2022
gerashegalov added a commit to gerashegalov/spark-rapids that referenced this issue Oct 18, 2022
Fixes NVIDIA#6758 after inspecting the result of a multiline regex search
```
(.*withResource.*[\n]){4,}
```
under modules src/main

Signed-off-by: Gera Shegalov <gera@apache.org>
gerashegalov added a commit that referenced this issue Oct 25, 2022
Contributes to #6758 after inspecting the result of a multiline regex search
```
(.*withResource.*[\n]){4,}
```
under modules src/main
 
Signed-off-by: Gera Shegalov <gera@apache.org>
@abellina
Copy link
Collaborator Author

abellina commented Dec 9, 2022

The original intent of this issue was to figure out how to instrument withResource to figure out memory growth, but we made inroads into memory waste by looking at the nested level, and by brute force running benchmarks with larger scales and less resources. That said, I think we should move towards measuring memory explosion using the Rmm apis for scoped maximum usage added in 22.12 via #6745, and enforce using #7257

@abellina abellina closed this as completed Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reliability Features to improve reliability or bugs that severly impact the reliability of the plugin
Projects
None yet
Development

No branches or pull requests

3 participants