[FEA] explore maximum memory usage for full window operations #9986

Open
revans2 opened this issue Dec 7, 2023 · 0 comments
Labels
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin
task: Work required that improves the product but is not user facing

Comments

@revans2
Collaborator

revans2 commented Dec 7, 2023

Is your feature request related to a problem? Please describe.
Recently, as a part of testing #9973, I found that I needed at least 6 GiB of memory for a single thread to sort the data for the following query:

```scala
spark.time(spark.range(1000000000)
  .selectExpr("id",
    "SUM(id) OVER (ORDER BY id ROWS BETWEEN 0 PRECEDING AND 2 FOLLOWING) as l2f",
    "SUM(id) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as l_run")
  .orderBy(desc("id"))
  .show())
```

Even with that much memory, only the running window was able to complete its work properly; the bounded window could not, and needed about 8 GiB of memory.

Ideally I would like us to be able to support this query with just 4 GiB of memory, because that is what our docs say we should be able to do: 4x the target batch size. This was run with the async allocator, which avoids fragmentation issues, so these numbers would not be valid for the arena allocator.
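For context, here is a minimal sketch of the kind of session configuration the description above implies (plugin enabled, async pool, 1 GiB target batch size). The config keys are the standard RAPIDS Accelerator ones; the specific values are assumptions for illustration, not the exact settings from the original test:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical setup approximating the environment described above.
// The pool mode and batch size values are illustrative assumptions only.
val spark = SparkSession.builder()
  .appName("window-memory-exploration")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")          // RAPIDS Accelerator plugin
  .config("spark.rapids.memory.gpu.pool", "ASYNC")                // async allocator, as noted above
  .config("spark.rapids.sql.batchSizeBytes", (1L << 30).toString) // 1 GiB target batch size
  .getOrCreate()
```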

range should produce input batches exactly 1 GiB in size. The shuffle is not going to reduce that size at all because all of the data is being sent to a single task. After that, the sort should also output batches very close to 1 GiB in size. The running window operation adds a new column to each 1 GiB batch, so it should produce batches about 2 GiB in size, and the bounded window adds another column, likely producing batches about 3 GiB in size (a rough sketch of this arithmetic is shown below). So 8 GiB for a 2 GiB input batch is about what we would expect to work, but I think we want a way to actually do a split and retry on some of these larger input batches; if there are other ideas I am open to them. What I really want to understand is what is happening with sort that causes it to need so much more memory.
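A back-of-the-envelope version of the sizing argument above (these numbers follow the reasoning in the text; they are estimates, not measurements):

```scala
// Approximate per-batch sizes implied by the paragraph above.
val gib            = 1L << 30
val sortOutput     = 1 * gib             // range -> shuffle -> sort: ~1 GiB batches holding the id column
val runningWinOut  = sortOutput + gib    // + one more LONG column (l_run): ~2 GiB batches
val boundedWinOut  = runningWinOut + gib // + another LONG column (l2f): ~3 GiB batches
val expectedBudget = 4 * runningWinOut   // 4x a 2 GiB input batch: ~8 GiB, matching what was observed
```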

Note that this issue is not about fixing the problems, just about understanding exactly what is happening and filing follow-on issues so we can fix them.

@revans2 revans2 added ? - Needs Triage Need team to review and classify task Work required that improves the product but is not user facing reliability Features to improve reliability or bugs that severely impact the reliability of the plugin labels Dec 7, 2023
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Dec 13, 2023