Improving Entry API to get the keys back when they are unused #690

steveklabnik · 2015-01-21T20:24:35Z

Issue by matthieu-m
Saturday Jan 17, 2015 at 15:09 GMT

For earlier discussion, see rust-lang/rust#21298

This issue was labelled with: A-collections in the Rust repository

First of all, I would like to congratulate Gankro on the Entry API, it really simplified the interface of associative containers.

If I had one nit, however, it is that the entry methods available always consume the key.

Now, said key is taken by value so of course it is consumed, and the key is actually inserted in the container via the call to VacantEntry::insert, so it seems reasonable. However, in the case the key slot in the container is already occupied, and we get an OccupiedEntry as a result, there seems to be little reason not to return the key to the user.

For completeness, one could actually also improve VacantEntry so that is possible for it to hand over the key it had consumed, although why would one use the Entry API for look-up without any wish to insert is for now a mystery to me.

The main benefit of returning the un-consumed key is allowing reusing it. When the key is a simple integer, this has little appeal, however when the key owns heap-allocated memory (such as String) then the ability to re-use the buffer rather than continuously allocating/deallocating seems like a huge boon.

I prototyped the idea of an improved Occupied below, the modification is simple, and casual perusal of http://doc.rust-lang.org/src/std/collections/hash/map.rs.html#1150-1195 seems to show that simply modifying the lines 1175-1177 to:

return Occupied(OccupiedEntry{ elem: bucket, }, k); // Note the ", k"

would be the only change to HashMap implementation; likewise the entry change at lines 1366-1371 is simple enough:

pub enum Entry<'a, K: 'a, V: 'a> {
    /// An occupied Entry, and the unconsumed key
    Occupied(OccupiedEntry<'a, K, V>, K), // Note the ", K"
    /// A vacant Entry
    Vacant(VacantEntry<'a, K, V>),
}

I suspect, but have not verified, that other associative containers would likely be as easily modified. And of course, it would impact all consumers...

The example below show cases achieving buffer re-use through such an improved API, note how easy it is.

#![allow(unstable)]

use std::default::Default;

struct OccupiedEntry<K, V>;

struct VacantEntry<K, V> {
    entry: K,
}

enum Entry<K, V> {
    Occupied(OccupiedEntry<K, V>, K), // Simply add one more element to Occupied.
    Vacant(VacantEntry<K, V>),
}

impl<K, V> Entry<K, V> {
    fn new(key: K, really: bool) -> Entry<K, V> {
        if really {
            Entry::Occupied(OccupiedEntry, key)
        } else {
            Entry::Vacant(VacantEntry{entry: key})
        }
    }
}

fn main() {
    let mut acc: Vec<String> = vec!();

    let mut word: String = Default::default();
    for i in "Lorem iipsum and all that".split_str(" ") {
        word.clear();
        word.push_str(i);

        word = match Entry::<String, i32>::new(word, i.len() % 2 == 0) {
            Entry::Occupied(_, key) => { key }
            Entry::Vacant(v) => { acc.push(v.entry); Default::default() }
        };

        println!("{}\t{}", word, i);
    }

    println!("{:?}", acc);
}

The text was updated successfully, but these errors were encountered:

Gankra · 2015-01-21T20:30:23Z

I still agree with this design, but there's some concern that it is optimizing for the general case, and not the common case.

edwardw · 2015-01-26T13:20:28Z

Another common scenario would be to insert an entry if absent and return an immutable reference to the value. Currently OccupiedEntry::into_mut and VacantEntry::insert all return a mutable reference, and what OccupiedEntry::get returns can't live longer than the match arm.

Of course, it is easy enough to convert a mutable reference to an immutable one:

fn get_cache_entry(&self, key: &CacheKey) -> &CacheValue {
    match self.cache.entry(key) {
        Entry::Occupied(e) => &*e.into_mut(),
        Entry::Vacant(e) => {
            let cache_value = CacheValue { ... };
            &*e.insert(cache_value)
        }
    }
}

But I'd argue that this is too common a use case to ignore.

Gankra · 2015-01-30T05:21:30Z

So I've thought about this more. I can't think of any example where you'd "want the key back" that wouldn't be equivalent to "I want to clone on insert". I suppose clone-on-insert incurs one extra clone (the last one is potentially unnecessary), but it's unlikely that you'd be able to do avoid this in practice. Also clone-on-insert will likely be more memory effecient, as exactly the desired capacity for the elements in the Vec/String will be used, as opposed to just pushing in the buffer as-is.

That brings us back to the IntoOwned/IntoCow design that we tried to land awhile back but had to revert because the by-value case is too important. As such I'm reasonably confident that it will be satisfactory when we have the design where you'll be able to lookup with either a reference or a value, and if you pass in a reference it will be cloned on insertion.

@matthieu-m does that sound good to you?

CC @aturon

matthieu-m · 2015-01-30T17:35:09Z

The IntoOwned would indeed work for most cases I think.

It would perform exactly 1 more memory allocation IF the buffer is consumed in the last iteration while having the same number of allocations in all other cases. Furthermore, as you mention, it would allow tailoring the allocation of the keys to exactly what is needed and have a single (largish but short-lived) buffer.

So I agree with you that IntoOwned sounds like a preferable option if we can make it work.

matthieu-m · 2015-01-31T11:53:20Z

@gankro actually, I am wondering if IntoOwned is going to be that easy.

For efficiency, the slice should only become a String (with IntoOwned) if insertion is necessary. At the extreme, it means Vacant should contain a slice, though we might argue that if you use the Entry API then you wanted to insert if it is missing (otherwise you would just use a search) and therefore it would be okay for Vacant to contain the String already. In any case, though, if the Entry API returns a Occupied then no String should have been allocated.

Do we agree up until there?

Now, if we agree on this, how do we guarantee the coherence of the hashing operation? The simplest thing would be to transform the slice to a String with IntoOwned and hash the String, but then you might as well let the caller perform the transformation (as today).

Another alternative would be to actually do the reverse operation, transform the String into a slice before hashing (Deref?), however the benefit of IntoOwned is that any type that implements IntoOwned can be used efficiently, so we cannot so constrain the HashMap.

In the end, it seems that with the IntoOwned we will have to rely on the caller guaranteeing that both slice and the String it transforms into produce the same hash value (and can be compared with ==, but we can ask for a Eq bound here). We could double-check it by hashing the String once we have it built and panic in case of issue, but again that's slightly wasteful (maybe only in Debug) and of course it comes "too late".

On the other hand, the idea of handing the key back might not be as general (forces the user to handle the buffer on her own), but at least it does not have hash coherence issues.

Gankra · 2015-01-31T17:56:01Z

IntoOwned implies BorrowFrom, which we already pervasively assume implies Hash/Eq/Ord are equivalent for.

DanteMarshal · 2022-11-21T20:11:54Z

is there any news on this ?

Another use-case that I stumbled upon is implementing an add method on a wrapping type that returns a Result<&mut V, SomeError<K, V>>, where the Err variant includes some error that owns the Key and the Value that were moved during the call to the add method; basically "Rejecting" the input and giving the ownership back to the user if the entry is not vacant.

Ceffa93 · 2023-09-09T11:45:41Z

As such I'm reasonably confident that it will be satisfactory when we have the design where you'll be able to lookup with either a reference or a value, and if you pass in a reference it will be cloned on insertion.

Is this, or a different solution still on the roadmap?

kennytm · 2024-02-07T09:09:25Z

(This should be moved to ACP, if still valid?)

However, in the case the key slot in the container is already occupied, and we get an OccupiedEntry as a result, there seems to be little reason not to return the key to the user.

If you want to get back an owned key originally from the collection there is now OccupiedEntry::remove_entry(), replace_entry() and replace_key() depending on the use case.

There is no method to get back the owned key provided to the .entry() function though.

For HashMap, if you need to retain the key's ownership the current solution is to use .raw_entry_mut() (rust-lang/rust#56167) like,

match map.raw_entry_mut().from_key(&key) {
    RawEntryMut::Occupied(o) => {
        // we have owned access to the provided `key` here. do whatever you like.
        drop(key);
        ..
    },
    RawEntryMut::Vacant(v) => ..,
};

(The .raw_entry() API is very unstable and may get replaced by something else like .entry_ref() in the future. It does not exist on BTreeMap yet.)

For completeness, one could actually also improve VacantEntry so that is possible for it to hand over the key it had consumed

We now have VacantEntry::into_key().

steveklabnik mentioned this issue Jan 21, 2015

Improving Entry API to get the keys back when they are unused rust-lang/rust#21298

Closed

steveklabnik added the A-wishlist label Jan 21, 2015

seppo0010 mentioned this issue Jun 23, 2015

Add member operations to collections rust-lang/rust#26531

Closed

petrochenkov added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jan 28, 2018

petrochenkov removed the A-wishlist label Feb 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving Entry API to get the keys back when they are unused #690

Improving Entry API to get the keys back when they are unused #690

steveklabnik commented Jan 21, 2015

Gankra commented Jan 21, 2015

edwardw commented Jan 26, 2015

Gankra commented Jan 30, 2015

matthieu-m commented Jan 30, 2015

matthieu-m commented Jan 31, 2015

Gankra commented Jan 31, 2015

DanteMarshal commented Nov 21, 2022

Ceffa93 commented Sep 9, 2023 •

edited

Loading

kennytm commented Feb 7, 2024

Improving Entry API to get the keys back when they are unused #690

Improving Entry API to get the keys back when they are unused #690

Comments

steveklabnik commented Jan 21, 2015

Gankra commented Jan 21, 2015

edwardw commented Jan 26, 2015

Gankra commented Jan 30, 2015

matthieu-m commented Jan 30, 2015

matthieu-m commented Jan 31, 2015

Gankra commented Jan 31, 2015

DanteMarshal commented Nov 21, 2022

Ceffa93 commented Sep 9, 2023 • edited Loading

kennytm commented Feb 7, 2024

Ceffa93 commented Sep 9, 2023 •

edited

Loading