[FEA] explore maximum memory usage for full window operations #9986

Open
revans2 opened this issue Dec 7, 2023 · 0 comments
Labels
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin
task: Work required that improves the product but is not user facing

Comments

@revans2
Collaborator

revans2 commented Dec 7, 2023

Is your feature request related to a problem? Please describe.
Recently, as a part of testing #9973, I found that I needed at least 6 GiB of memory for a single thread to sort the data for the following query:

```scala
spark.time(spark.range(1000000000)
  .selectExpr("id",
    "SUM(id) OVER (ORDER BY id ROWS BETWEEN 0 PRECEDING AND 2 FOLLOWING) as l2f",
    "SUM(id) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as l_run")
  .orderBy(desc("id"))
  .show())
```

Even with that much memory, only the running window was able to complete its work properly; the bounded window could not, and needed about 8 GiB of memory.

Ideally I would like us to be able to support this query with just 4 GiB of memory, because that is what our docs say we should be able to do: 4x the target batch size. This was run with the async allocator, which avoids fragmentation issues, so these numbers would not be valid for the arena allocator.
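For context, here is a minimal sketch of the kind of session configuration the description above implies (plugin enabled, async pool, 1 GiB target batch size). The config keys are the standard RAPIDS Accelerator ones; the specific values are assumptions for illustration, not the exact settings from the original test:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical setup approximating the environment described above.
// The pool mode and batch size values are illustrative assumptions only.
val spark = SparkSession.builder()
  .appName("window-memory-exploration")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")          // RAPIDS Accelerator plugin
  .config("spark.rapids.memory.gpu.pool", "ASYNC")                // async allocator, as noted above
  .config("spark.rapids.sql.batchSizeBytes", (1L << 30).toString) // 1 GiB target batch size
  .getOrCreate()
```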

range should produce input batches exactly 1 GiB in size. The shuffle is not going to reduce that size at all because all of the data is being sent to a single task. After that, the sort should also output batches very close to 1 GiB in size. The running window operation adds a new column to each 1 GiB batch, so it should produce batches about 2 GiB in size, and the bounded window adds another column, likely producing batches about 3 GiB in size (a rough sketch of this arithmetic is shown below). So 8 GiB for a 2 GiB input batch is about what we would expect to work, but I think we want a way to actually do a split and retry on some of these larger input batches; if there are other ideas I am open to them. What I really want to understand is what is happening with sort that causes it to need so much more memory.
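A back-of-the-envelope version of the sizing argument above (these numbers follow the reasoning in the text; they are estimates, not measurements):

```scala
// Approximate per-batch sizes implied by the paragraph above.
val gib            = 1L << 30
val sortOutput     = 1 * gib             // range -> shuffle -> sort: ~1 GiB batches holding the id column
val runningWinOut  = sortOutput + gib    // + one more LONG column (l_run): ~2 GiB batches
val boundedWinOut  = runningWinOut + gib // + another LONG column (l2f): ~3 GiB batches
val expectedBudget = 4 * runningWinOut   // 4x a 2 GiB input batch: ~8 GiB, matching what was observed
```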

Note that this issue is not about fixing the problems, just about understanding exactly what is happening and filing follow-on issues so we can fix them.

@revans2 revans2 added ? - Needs Triage Need team to review and classify task Work required that improves the product but is not user facing reliability Features to improve reliability or bugs that severely impact the reliability of the plugin labels Dec 7, 2023
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Dec 13, 2023