Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Median of Medians fallback to introselect #107522

Merged
merged 2 commits into from
May 26, 2023
Merged

Conversation

Sp00ph
Copy link
Member

@Sp00ph Sp00ph commented Jan 31, 2023

Fixes #102451.

This PR is a follow up to #106997. It adds a Fast Deterministic Selection implementation as a fallback to the introselect algorithm used by select_nth_unstable. This allows it to guarantee O(n) worst case running time, while maintaining good performance in all cases.

This would fix #102451, which was opened because the select_nth_unstable docs falsely claimed that it had O(n) worst case performance, even though it was actually quadratic in the worst case. #106997 improved the worst case complexity to O(n log n) by using heapsort as a fallback, and this PR further improves it to O(n) (this would also make #106933 unnecessary).
It also improves the actual runtime if the fallback gets called: Using a pathological input of size 1 << 19 (see the playground link in #102451), calculating the median is roughly 3x faster using fast deterministic selection as a fallback than it is using heapsort.

The downside to this is less code reuse between the sorting and selection algorithms, but I don't think it's that bad. The additional algorithms are ~250 LOC with no unsafe blocks (I tried using unsafe to avoid bounds checks but it didn't noticeably improve the performance).
I also let it fuzz for a while against the current select_nth_unstable implementation to ensure correctness, and it seems to still fulfill all the necessary postconditions.

cc @scottmcm who reviewed #106997

@rustbot
Copy link
Collaborator

rustbot commented Jan 31, 2023

r? @cuviper

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 31, 2023
@rustbot
Copy link
Collaborator

rustbot commented Jan 31, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@Sp00ph
Copy link
Member Author

Sp00ph commented Jan 31, 2023

Forgot to mention, on random input the fast deterministic select is a bit slower than the pdq based quickselect, so I didn't make select_nth_unstable always call the new algorithm and instead just used it it the introselect fallback.

@scottmcm

This comment was marked as outdated.

@rustbot rustbot assigned Mark-Simulacrum and unassigned cuviper Jan 31, 2023
@scottmcm
Copy link
Member

Ahh, my apologies, I read cc as r? 🤦

Putting it back,
r? @cuviper

@rustbot rustbot assigned cuviper and unassigned Mark-Simulacrum Jan 31, 2023
@Sp00ph
Copy link
Member Author

Sp00ph commented Feb 1, 2023

I tried also implementing the ExpandPartition algorithm from the paper that takes into account which region of the slice we already know to be partitioned, but it doesn't really lead to noticeable speedups. I think it's better to just use the already existing partition (like I did in this PR), as that already seems pretty optimized.

@bors
Copy link
Contributor

bors commented Feb 13, 2023

☔ The latest upstream changes (presumably #107191) made this pull request unmergeable. Please resolve the merge conflicts.

@Sp00ph Sp00ph force-pushed the introselect branch 2 times, most recently from 089165c to 9dfe219 Compare February 13, 2023 20:41
@Sp00ph
Copy link
Member Author

Sp00ph commented Feb 20, 2023

Hmm, now that #106933 was merged, the API guarantees have been loosened to O(n log n) worst case runtime. I'm guessing we'd need an ACP to get it back down to O(n) worst case (and it's unclear to me if we'd even want to guarantee O(n)). Should I add note in the /// ## Current implementation section that states that the current algorithm (from this PR) always runs in linear time?

@orlp
Copy link
Contributor

orlp commented Mar 5, 2023

So first and foremost, the PR in its current form does not make the behavior linear, because the limit is set to log(n), so in the worst case we're still O(n log n).

Secondly, I think having the full Fast Deterministic Selection algorithm as a fallback algorithm does not make much sense. Either it should be run from start to finish, or we should reduce it to just the median_of_ninthers approach for its linearity. The fallback algorithm only really gets triggered by an incredibly unlucky input pattern or malicious input, so it does not make sense to have a lot of code baked into the executable to run our fallback approach at maximum speed.


/// helper function used to find the index of the min/max element
/// using e.g. `slice.iter().enumerate().min_by(from_is_less(&mut is_less)).unwrap()`
fn from_is_less<T>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the code could be made more readable by just having min_index and max_index helper functions that return an Option<usize>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the code could be made more readable by just having min_index and max_index helper functions that return an Option<usize>.

I don't think this has been addressed yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I kinda forgot about it, should be fixed now though.

@Sp00ph
Copy link
Member Author

Sp00ph commented Mar 5, 2023

So first and foremost, the PR in its current form does not make the behavior linear, because the limit is set to log(n), so in the worst case we're still O(n log n).

I had naively thought that the same limiting strategy used for introsort would also work for introselect, but yeah makes sense that it doesn't. Does this mean that to guarantee linear runtime we need a constant limit on the iterations?

Secondly, I think having the full Fast Deterministic Selection algorithm as a fallback algorithm does not make much sense. Either it should be run from start to finish, or we should reduce it to just the median_of_ninthers approach for its linearity. The fallback algorithm only really gets triggered by an incredibly unlucky input pattern or malicious input, so it does not make sense to have a lot of code baked into the executable to run our fallback approach at maximum speed.

In my tests, just using fast deterministic selection unconditionally was a bit slower than the current median-of-3 quickselect on random inputs, so I didn't want to do that. I also tried to just use median of medians first, but that was slower than using heapsort as a fallback. I haven't tried just using median of ninthers yet, I'll have to see if that still gets similar speedups to the full fast deterministic select.

@Sp00ph
Copy link
Member Author

Sp00ph commented Mar 5, 2023

I suppose it could do something like let limit = cmp::min(ilog2(len), SOME_UPPER_BOUND) which would still mean O(1) iterations. Would probably have to figure out a good value for that cutoff point though.

@orlp
Copy link
Contributor

orlp commented Mar 5, 2023

I suppose it could do something like let limit = cmp::min(ilog2(len), SOME_UPPER_BOUND) which would still mean O(1) iterations. Would probably have to figure out a good value for that cutoff point though.

I think you should do what I did in pdqsort originally: track whether the partition was 'good' or 'bad' by some threshold (I used n / 8 = 12.5% quantile for the pivot position as the cutoff), and then and only then increment the counter. This way you can put the counter relatively low without affecting performance for normal code while quickly switching to the fallback for malicious/unlucky inputs.

@workingjubilee
Copy link
Member

workingjubilee commented Mar 6, 2023

In my tests, just using fast deterministic selection unconditionally was a bit slower than the current median-of-3 quickselect on random inputs, so I didn't want to do that.

Performance on perfectly random inputs is slightly overrated. What about nonrandom inputs, like tokenized data from Wikipedia or such?

@Sp00ph
Copy link
Member Author

Sp00ph commented Mar 6, 2023

I currently don't have access to a powerful machine, but I can try to set up a somewhat comprehensive benchmark suite once I do to compare heapsort fallback vs median-of-ninthers fallback vs just fast deterministic select.

@cuviper
Copy link
Member

cuviper commented Mar 23, 2023

Sorry, I haven't had time to dig into this - let me roll a new reviewer...

r? libs

@rustbot rustbot assigned Mark-Simulacrum and unassigned cuviper Mar 23, 2023
@Sp00ph
Copy link
Member Author

Sp00ph commented Mar 23, 2023

I still need to change some stuff about the PR so

@rustbot author

@rustbot rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 23, 2023
@rustbot rustbot added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Mar 23, 2023
@Dylan-DPC
Copy link
Member

@Sp00ph any updates on this?

@Sp00ph
Copy link
Member Author

Sp00ph commented May 21, 2023

Not yet, I'll see if I can get around to it some time soon.

@Sp00ph
Copy link
Member Author

Sp00ph commented May 24, 2023

I wrote a little benchmark suite testing the current selection implementation, quickselect with median of medians as fallback and pure fast deterministic select on various slice lengths and input shapes and quickselect with median of medians fallback was consistently the fastest, so I changed it to that now. If anyone is interested I can upload the benchmark itself too. The algorithm should now also actually be O(n) worst case since I made the quickselect iteration limit constant now.

@Sp00ph Sp00ph changed the title Add Fast Deterministic Selection fallback to introselect Add Median of Medians fallback to introselect May 24, 2023
@Sp00ph
Copy link
Member Author

Sp00ph commented May 24, 2023

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 24, 2023
@Amanieu Amanieu assigned Amanieu and unassigned Mark-Simulacrum May 24, 2023
@Amanieu
Copy link
Member

Amanieu commented May 25, 2023

@bors r+

@bors
Copy link
Contributor

bors commented May 25, 2023

📌 Commit fd5fa01 has been approved by Amanieu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 25, 2023
@Sp00ph
Copy link
Member Author

Sp00ph commented May 25, 2023

I forgot to ask: As of #106933, select_nth_unstable is documented to be O(n log n) in the worst case (it was falsely documented as O(n) before). I'm assuming that reverting the time complexity to O(n) would count as an API change and require an ACP? Should I maybe still say in the /// # Current implementation section of select_nth_unstable that it currently is in fact always O(n)?

@Amanieu
Copy link
Member

Amanieu commented May 25, 2023

This can be submitted as a separate PR which we will run an FCP on, since this is a new stable guarantee. No ACP is required.

bors added a commit to rust-lang-ci/rust that referenced this pull request May 25, 2023
…mpiler-errors

Rollup of 7 pull requests

Successful merges:

 - rust-lang#107522 (Add Median of Medians fallback to introselect)
 - rust-lang#111152 (update `pulldown-cmark` to `0.9.3`)
 - rust-lang#111757 (Consider lint check attributes on match arms)
 - rust-lang#111831 (Always capture slice when pattern requires checking the length)
 - rust-lang#111929 (Don't print newlines in APITs)
 - rust-lang#111945 (Migrate GUI colors test to original CSS color format)
 - rust-lang#111950 (Remove ExpnKind::Inlined.)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit fb45513 into rust-lang:master May 26, 2023
@rustbot rustbot added this to the 1.71.0 milestone May 26, 2023
@Sp00ph Sp00ph deleted the introselect branch May 26, 2023 01:13
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jun 14, 2023
Update runtime guarantee for `select_nth_unstable`

rust-lang#106933 changed the runtime guarantee for `select_nth_unstable` from O(n) to O(n log n), since the old guarantee wasn't actually met by the implementation at the time. Now with rust-lang#107522, `select_nth_unstable` should be truly linear in runtime, so we can revert its runtime guarantee to O(n). Since rust-lang#106933 was considered a bug fix, this will probably need an FCP because it counts as a new API guarantee.

r? `@Amanieu`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

select_nth_unstable has quadratic worst-case time complexity; docs claim it should be linear
10 participants