Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTreeMap cursors #141

Closed
Amanieu opened this issue Nov 28, 2022 · 8 comments
Closed

BTreeMap cursors #141

Amanieu opened this issue Nov 28, 2022 · 8 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@Amanieu
Copy link
Member

Amanieu commented Nov 28, 2022

Proposal

Problem statement

One of the fundamental properties of BTreeMap is that it maintains elements in sorted order and enables efficient element lookup in O(log(N)) time. However the current API is overly fitted towards a key-value API like a HashMap and fails to expose the ability to make queries about "nearby" keys. For example, finding the first element whose key is greater than X.

There is limited support for this with the range API, but this is hard to use and doesn't allow mutating the tree (by inserting/deleting elements) while iterating. This is insufficient to address existing use cases.

Motivation, use-cases

One specific use case where such manipulations are useful is when using a BTreeMap to represent a set of non-overlapping ranges with associated values. This is often used to associate metadata with ranges of memory addresses.

As a proof of concept, I implemented a RangeTree type which conceptually holds a map of Range<K> -> V. It has 3 main operations:

  • get: returns the range containing the given key.
  • remove: removes any ranges that intersect the given range of keys. Partially overlapping ranges are truncated or split into 2 sub-ranges.
  • insert: removes any intersecting ranges (with the same behavior as remove for partially overlapping ranges) and then inserts a new range.

There are two implementations of these operations: one with just the standard library API and one with the new proposed cursor API.

Benchmark results show a 25% to 50% speedup by using cursors:

insert non-overlapping  time:   [82.516 ns 88.146 ns 93.282 ns]
                        change: [-33.810% -27.769% -20.292%] (p = 0.00 < 0.05)
                        Performance has improved.

insert overlapping end  time:   [76.902 ns 80.527 ns 83.670 ns]
                        change: [-52.589% -49.404% -46.189%] (p = 0.00 < 0.05)
                        Performance has improved.

insert overlapping start
                        time:   [73.223 ns 77.361 ns 81.151 ns]
                        change: [-32.392% -26.673% -20.424%] (p = 0.00 < 0.05)
                        Performance has improved.

insert overlapping all  time:   [187.08 ns 193.34 ns 198.79 ns]
                        change: [-29.838% -26.102% -22.680%] (p = 0.00 < 0.05)
                        Performance has improved.

insert within existing  time:   [75.748 ns 79.829 ns 83.380 ns]
                        change: [-32.192% -26.014% -19.129%] (p = 0.00 < 0.05)
                        Performance has improved.

remove non-overlapping  time:   [43.850 ns 46.363 ns 48.454 ns]
                        change: [-48.288% -43.011% -36.968%] (p = 0.00 < 0.05)
                        Performance has improved.

remove overlapping end  time:   [60.592 ns 64.339 ns 67.775 ns]
                        change: [-59.367% -56.141% -52.421%] (p = 0.00 < 0.05)
                        Performance has improved.

remove overlapping start
                        time:   [45.273 ns 48.281 ns 50.901 ns]
                        change: [-49.392% -43.853% -37.379%] (p = 0.00 < 0.05)
                        Performance has improved.

remove overlapping all  time:   [182.28 ns 189.05 ns 194.94 ns]
                        change: [-37.365% -34.433% -31.290%] (p = 0.00 < 0.05)
                        Performance has improved.

remove within existing  time:   [39.387 ns 41.828 ns 43.930 ns]
                        change: [-46.469% -41.060% -34.474%] (p = 0.00 < 0.05)
                        Performance has improved.

Solution sketches

This proposal adds Cursor and CursorMut types to BTreeMap based on similar types for LinkedList.

A Cursor is like an iterator, except that it can freely seek back-and-forth, and can safely mutate the tree during iteration. This is because the lifetime of its yielded references is tied to its own lifetime, instead of just the underlying tree. A cursor either points to an element in the tree, or to a "ghost" non-element that is logically located after the last element and before the first element.

This proposal adds the following APIs to BTreeMap:

impl<K, V> BTreeMap<K, V> {
    /// Returns a cursor pointing to the first element that is above the given bound.
    ///
    /// Passing `Unbounded` will return a cursor pointing to the first element in the tree.
    fn lower_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;

    /// Returns a mutable cursor pointing to the first element that is above the given bound.
    ///
    /// Passing `Unbounded` will return a cursor pointing to the first element in the tree.
    fn lower_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;

    /// Returns a cursor pointing to the last element that is below the given bound.
    ///
    /// Passing `Unbounded` will return a cursor pointing to the last element in the tree.
    fn upper_bound<Q>(&self, bound: Bound<&Q>) -> Cursor<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;

    /// Returns a mutable cursor pointing to the last element that is below the given bound.
    ///
    /// Passing `Unbounded` will return a cursor pointing to the last element in the tree.
    fn upper_bound_mut<Q>(&mut self, bound: Bound<&Q>) -> CursorMut<'_, K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord;
}

/// An immutable cursor which only allows read-only operations.
struct Cursor<'a, K: 'a, V: 'a>;

impl<'a, K, V> Cursor<'a, K, V> {
    /// Moving the cursor to the next/previous element.
    fn move_next(&mut self);
    fn move_prev(&mut self);

    /// Access to the key and value of the current element.
    ///
    /// Returns `None` if the cursor points to the "ghost" non-element.
    fn key(&self) -> Option<&'a K>;
    fn value(&self) -> Option<&'a V>;
    fn key_value(&self) -> Option<(&'a K, &'a V)>;

    /// Read-only access to adjacent elements.
    fn peek_next(&self) -> Option<(&'a K, &'a V)>;
    fn peek_prev(&self) -> Option<(&'a K, &'a V)>;
}

/// An mutable cursor which allows mutating the tree.
struct CursorMut<'a, K: 'a, V: 'a>;

impl<'a, K, V> CursorMut<'a, K, V> {
    /// Moving the cursor to the next/previous element.
    fn move_next(&mut self);
    fn move_prev(&mut self);

    /// Access to the key and value of the current element.
    ///
    /// Returns `None` if the cursor points to the "ghost" non-element.
    fn key(&self) -> Option<&K>;
    fn value(&self) -> Option<&V>;
    fn value_mut(&mut self) -> Option<&mut V>;
    fn key_value(&self) -> Option<(&K, &V)>;
    fn key_value_mut(&self) -> Option<(&K, &mut V)>;

    /// Returns a mutable reference to the of the element that the cursor is
    /// currently pointing to.
    ///
    /// This can be used to modify the key, but you must ensure that the
    /// `BTreeMap` invariants are maintained. Specifically:
    ///
    /// * The key must remain unique within the tree.
    /// * The key must remain in sorted order with regards to other elements in
    ///   the tree.
    unsafe fn key_mut_unchecked(&mut self) -> Option<&mut K>;

    /// Read-only access to adjacent elements.
    fn peek_next(&self) -> Option<(&K, &V)>;
    fn peek_prev(&self) -> Option<(&K, &V)>;

    /// Returns a read-only cursor pointing to the current element.
    ///
    /// The lifetime of the returned `Cursor` is bound to that of the
    /// `CursorMut`, which means it cannot outlive the `CursorMut` and that the
    /// `CursorMut` is frozen for the lifetime of the `Cursor`.
    fn as_cursor(&self) -> Cursor<'_, K, V>;

    /// Inserts a new element into the `BTreeMap` after the current one.
    ///
    /// If the cursor is pointing at the "ghost" non-element then the new element is
    /// inserted at the front of the `BTreeMap`.
    ///
    /// # Panics
    ///
    /// This function panics if:
    /// - the given key compares less than or equal to the current element (if
    ///   any).
    /// - the given key compares greater than or equal to the next element (if
    ///   any).
    fn insert_after(&mut self, key: K, value: V);

    /// Inserts a new element into the `BTreeMap` before the current one.
    ///
    /// If the cursor is pointing at the "ghost" non-element then the new element is
    /// inserted at the end of the `BTreeMap`.
    ///
    /// # Panics
    ///
    /// This function panics if:
    /// - the given key compares greater than or equal to the current element
    ///   (if any).
    /// - the given key compares less than or equal to the previous element (if
    ///   any).
    fn insert_before(&mut self, key: K, value: V);

    /// Inserts a new element into the `BTreeMap` after the current one.
    ///
    /// If the cursor is pointing at the "ghost" non-element then the new element is
    /// inserted at the front of the `BTreeMap`.
    ///
    /// # Safety
    ///
    /// You must ensure that the `BTreeMap` invariants are maintained.
    /// Specifically:
    ///
    /// * The key of the newly inserted element must be unique in the tree.
    /// * All keys in the tree must remain in sorted order.
    unsafe fn insert_after_unchecked(&mut self, key: K, value: V);

    /// Inserts a new element into the `BTreeMap` before the current one.
    ///
    /// If the cursor is pointing at the "ghost" non-element then the new element is
    /// inserted at the end of the `BTreeMap`.
    ///
    /// # Safety
    ///
    /// You must ensure that the `BTreeMap` invariants are maintained.
    /// Specifically:
    ///
    /// * The key of the newly inserted element must be unique in the tree.
    /// * All keys in the tree must remain in sorted order.
    unsafe fn insert_before_unchecked(&mut self, key: K, value: V);

    /// Removes the current element from the `BTreeMap`.
    ///
    /// The element that was removed is returned, and the cursor is
    /// moved to point to the next element in the `BTreeMap`.
    ///
    /// If the cursor is currently pointing to the "ghost" non-element then no element
    /// is removed and `None` is returned. The cursor is not moved in this case.
    fn remove_current(&mut self) -> Option<(K, V)>;

    /// Removes the current element from the `BTreeMap`.
    ///
    /// The element that was removed is returned, and the cursor is
    /// moved to point to the previous element in the `BTreeMap`.
    ///
    /// If the cursor is currently pointing to the "ghost" non-element then no element
    /// is removed and `None` is returned. The cursor is not moved in this case.
    fn remove_current_and_move_back(&mut self) -> Option<(K, V)>;
}

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

@Amanieu Amanieu added T-libs-api api-change-proposal A proposal to add or alter unstable APIs in the standard libraries labels Nov 28, 2022
@scottmcm
Copy link
Member

This is still safe in the same way that "malicious" Ord implementations are safe

I disagree, because it allows violating invariants for a type where the Ord implementation is trusted.

For example, today if you give me a BTreeMap<usize, ()> I can trust that there are no duplicates and that if I iterate they'll be in order, because I trust BTreeMap, I trust usize, and I trust ().

But with the addition of a safe _unchecked API, I can no longer trust maps even when they only hold my own types that I trust.

@Amanieu
Copy link
Member Author

Amanieu commented Nov 29, 2022

Alright I've changed those functions to be unsafe.

@the8472
Copy link
Member

the8472 commented Nov 29, 2022

Why only the unsafe APIs and not equivalent safe ones that do a comparison against the neighbors?

@scottmcm
Copy link
Member

A safe insert that inserts in the "correct" place but with improved performance when that place is close to the cursor might be nice. (Like the map::inserts that take an iterator position in C++, https://en.cppreference.com/w/cpp/container/map/insert.)

@Amanieu
Copy link
Member Author

Amanieu commented Nov 30, 2022

I wouldn't use them in RangeTree myself, but I've added checked variants of the functions. I'm still not a big fan of making the unchecked versions unsafe since it can introduce confusion. For example it's perfectly safe for RangeTree to use the unchecked functions even if the key uses an untrusted Ord implementation.

@the8472
Copy link
Member

the8472 commented Nov 30, 2022

The API docs say

It is a logic error for a key to be modified in such a way that the key’s ordering relative to any other key, as determined by the Ord trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code. The behavior resulting from such a logic error is not specified, but will be encapsulated to the BTreeMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.

The cursor API doesn't fall under any of those categories and I don't think "normally" provides enough wiggle room here.

@jeffparsons
Copy link

I maintain the rangemap crate, which is essentially the same as the motivating use case described above, plus automatic coalescing of adjacent ranges that map to the same value. E.g. if you insert (1..2, true) and (2..3, true), then the map now contains [(1..3, true)].

I have longed for this API since day one. I care about performance, of course, but I care even more about how much simpler my implementation could be if I could build on top of CursorMut.

So I guess this is just my 2c from experiencing the lack of BTreeMap cursors: they feel to me like a very natural and powerful extension.

@m-ou-se
Copy link
Member

m-ou-se commented Jan 30, 2023

This looks quite complete and ready for unstable experimentation. Time to open a tracking issue and merge 105641. :)

@m-ou-se m-ou-se closed this as completed Jan 30, 2023
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Feb 8, 2023
Implement cursors for BTreeMap

See the ACP for an overview of the API: rust-lang/libs-team#141

The implementation is split into 2 commits:
- The first changes the internal insertion functions to return a handle to the newly inserted element. The lifetimes involved are a bit hairy since we need a mutable handle to both the `BTreeMap` itself (which holds the root) and the nodes allocated in memory. I have tested that this passes the standard library testsuite under miri.
- The second commit implements the cursor API itself. This is more straightforward to follow but still involves some unsafe code to deal with simultaneous mutable borrows of the tree root and the node that is currently being iterated.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Feb 8, 2023
Implement cursors for BTreeMap

See the ACP for an overview of the API: rust-lang/libs-team#141

The implementation is split into 2 commits:
- The first changes the internal insertion functions to return a handle to the newly inserted element. The lifetimes involved are a bit hairy since we need a mutable handle to both the `BTreeMap` itself (which holds the root) and the nodes allocated in memory. I have tested that this passes the standard library testsuite under miri.
- The second commit implements the cursor API itself. This is more straightforward to follow but still involves some unsafe code to deal with simultaneous mutable borrows of the tree root and the node that is currently being iterated.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Feb 8, 2023
Implement cursors for BTreeMap

See the ACP for an overview of the API: rust-lang/libs-team#141

The implementation is split into 2 commits:
- The first changes the internal insertion functions to return a handle to the newly inserted element. The lifetimes involved are a bit hairy since we need a mutable handle to both the `BTreeMap` itself (which holds the root) and the nodes allocated in memory. I have tested that this passes the standard library testsuite under miri.
- The second commit implements the cursor API itself. This is more straightforward to follow but still involves some unsafe code to deal with simultaneous mutable borrows of the tree root and the node that is currently being iterated.
thomcc pushed a commit to tcdi/postgrestd that referenced this issue May 31, 2023
Implement cursors for BTreeMap

See the ACP for an overview of the API: rust-lang/libs-team#141

The implementation is split into 2 commits:
- The first changes the internal insertion functions to return a handle to the newly inserted element. The lifetimes involved are a bit hairy since we need a mutable handle to both the `BTreeMap` itself (which holds the root) and the nodes allocated in memory. I have tested that this passes the standard library testsuite under miri.
- The second commit implements the cursor API itself. This is more straightforward to follow but still involves some unsafe code to deal with simultaneous mutable borrows of the tree root and the node that is currently being iterated.
@dtolnay dtolnay added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

6 participants