Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving Entry API to get the keys back when they are unused #690

Open
steveklabnik opened this issue Jan 21, 2015 · 9 comments
Open

Improving Entry API to get the keys back when they are unused #690

steveklabnik opened this issue Jan 21, 2015 · 9 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@steveklabnik
Copy link
Member

Issue by matthieu-m
Saturday Jan 17, 2015 at 15:09 GMT

For earlier discussion, see rust-lang/rust#21298

This issue was labelled with: A-collections in the Rust repository


First of all, I would like to congratulate Gankro on the Entry API, it really simplified the interface of associative containers.

If I had one nit, however, it is that the entry methods available always consume the key.

Now, said key is taken by value so of course it is consumed, and the key is actually inserted in the container via the call to VacantEntry::insert, so it seems reasonable. However, in the case the key slot in the container is already occupied, and we get an OccupiedEntry as a result, there seems to be little reason not to return the key to the user.

For completeness, one could actually also improve VacantEntry so that is possible for it to hand over the key it had consumed, although why would one use the Entry API for look-up without any wish to insert is for now a mystery to me.

The main benefit of returning the un-consumed key is allowing reusing it. When the key is a simple integer, this has little appeal, however when the key owns heap-allocated memory (such as String) then the ability to re-use the buffer rather than continuously allocating/deallocating seems like a huge boon.


I prototyped the idea of an improved Occupied below, the modification is simple, and casual perusal of http://doc.rust-lang.org/src/std/collections/hash/map.rs.html#1150-1195 seems to show that simply modifying the lines 1175-1177 to:

return Occupied(OccupiedEntry{ elem: bucket, }, k); // Note the ", k"

would be the only change to HashMap implementation; likewise the entry change at lines 1366-1371 is simple enough:

pub enum Entry<'a, K: 'a, V: 'a> {
    /// An occupied Entry, and the unconsumed key
    Occupied(OccupiedEntry<'a, K, V>, K), // Note the ", K"
    /// A vacant Entry
    Vacant(VacantEntry<'a, K, V>),
}

I suspect, but have not verified, that other associative containers would likely be as easily modified. And of course, it would impact all consumers...

The example below show cases achieving buffer re-use through such an improved API, note how easy it is.

#![allow(unstable)]

use std::default::Default;

struct OccupiedEntry<K, V>;

struct VacantEntry<K, V> {
    entry: K,
}

enum Entry<K, V> {
    Occupied(OccupiedEntry<K, V>, K), // Simply add one more element to Occupied.
    Vacant(VacantEntry<K, V>),
}

impl<K, V> Entry<K, V> {
    fn new(key: K, really: bool) -> Entry<K, V> {
        if really {
            Entry::Occupied(OccupiedEntry, key)
        } else {
            Entry::Vacant(VacantEntry{entry: key})
        }
    }
}

fn main() {
    let mut acc: Vec<String> = vec!();

    let mut word: String = Default::default();
    for i in "Lorem iipsum and all that".split_str(" ") {
        word.clear();
        word.push_str(i);

        word = match Entry::<String, i32>::new(word, i.len() % 2 == 0) {
            Entry::Occupied(_, key) => { key }
            Entry::Vacant(v) => { acc.push(v.entry); Default::default() }
        };

        println!("{}\t{}", word, i);
    }

    println!("{:?}", acc);
}
@Gankra
Copy link
Contributor

Gankra commented Jan 21, 2015

I still agree with this design, but there's some concern that it is optimizing for the general case, and not the common case.

@edwardw
Copy link

edwardw commented Jan 26, 2015

Another common scenario would be to insert an entry if absent and return an immutable reference to the value. Currently OccupiedEntry::into_mut and VacantEntry::insert all return a mutable reference, and what OccupiedEntry::get returns can't live longer than the match arm.

Of course, it is easy enough to convert a mutable reference to an immutable one:

fn get_cache_entry(&self, key: &CacheKey) -> &CacheValue {
    match self.cache.entry(key) {
        Entry::Occupied(e) => &*e.into_mut(),
        Entry::Vacant(e) => {
            let cache_value = CacheValue { ... };
            &*e.insert(cache_value)
        }
    }
}

But I'd argue that this is too common a use case to ignore.

@Gankra
Copy link
Contributor

Gankra commented Jan 30, 2015

So I've thought about this more. I can't think of any example where you'd "want the key back" that wouldn't be equivalent to "I want to clone on insert". I suppose clone-on-insert incurs one extra clone (the last one is potentially unnecessary), but it's unlikely that you'd be able to do avoid this in practice. Also clone-on-insert will likely be more memory effecient, as exactly the desired capacity for the elements in the Vec/String will be used, as opposed to just pushing in the buffer as-is.

That brings us back to the IntoOwned/IntoCow design that we tried to land awhile back but had to revert because the by-value case is too important. As such I'm reasonably confident that it will be satisfactory when we have the design where you'll be able to lookup with either a reference or a value, and if you pass in a reference it will be cloned on insertion.

@matthieu-m does that sound good to you?

CC @aturon

@matthieu-m
Copy link

The IntoOwned would indeed work for most cases I think.

It would perform exactly 1 more memory allocation IF the buffer is consumed in the last iteration while having the same number of allocations in all other cases. Furthermore, as you mention, it would allow tailoring the allocation of the keys to exactly what is needed and have a single (largish but short-lived) buffer.

So I agree with you that IntoOwned sounds like a preferable option if we can make it work.

@matthieu-m
Copy link

@gankro actually, I am wondering if IntoOwned is going to be that easy.

For efficiency, the slice should only become a String (with IntoOwned) if insertion is necessary. At the extreme, it means Vacant should contain a slice, though we might argue that if you use the Entry API then you wanted to insert if it is missing (otherwise you would just use a search) and therefore it would be okay for Vacant to contain the String already. In any case, though, if the Entry API returns a Occupied then no String should have been allocated.

Do we agree up until there?

Now, if we agree on this, how do we guarantee the coherence of the hashing operation? The simplest thing would be to transform the slice to a String with IntoOwned and hash the String, but then you might as well let the caller perform the transformation (as today).

Another alternative would be to actually do the reverse operation, transform the String into a slice before hashing (Deref?), however the benefit of IntoOwned is that any type that implements IntoOwned can be used efficiently, so we cannot so constrain the HashMap.

In the end, it seems that with the IntoOwned we will have to rely on the caller guaranteeing that both slice and the String it transforms into produce the same hash value (and can be compared with ==, but we can ask for a Eq bound here). We could double-check it by hashing the String once we have it built and panic in case of issue, but again that's slightly wasteful (maybe only in Debug) and of course it comes "too late".


On the other hand, the idea of handing the key back might not be as general (forces the user to handle the buffer on her own), but at least it does not have hash coherence issues.

@Gankra
Copy link
Contributor

Gankra commented Jan 31, 2015

IntoOwned implies BorrowFrom, which we already pervasively assume implies Hash/Eq/Ord are equivalent for.

@petrochenkov petrochenkov added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jan 28, 2018
@DanteMarshal
Copy link

is there any news on this ?

Another use-case that I stumbled upon is implementing an add method on a wrapping type that returns a Result<&mut V, SomeError<K, V>>, where the Err variant includes some error that owns the Key and the Value that were moved during the call to the add method; basically "Rejecting" the input and giving the ownership back to the user if the entry is not vacant.

@Ceffa93
Copy link

Ceffa93 commented Sep 9, 2023

As such I'm reasonably confident that it will be satisfactory when we have the design where you'll be able to lookup with either a reference or a value, and if you pass in a reference it will be cloned on insertion.

Is this, or a different solution still on the roadmap?

@kennytm
Copy link
Member

kennytm commented Feb 7, 2024

(This should be moved to ACP, if still valid?)

However, in the case the key slot in the container is already occupied, and we get an OccupiedEntry as a result, there seems to be little reason not to return the key to the user.

If you want to get back an owned key originally from the collection there is now OccupiedEntry::remove_entry(), replace_entry() and replace_key() depending on the use case.

There is no method to get back the owned key provided to the .entry() function though.

For HashMap, if you need to retain the key's ownership the current solution is to use .raw_entry_mut() (rust-lang/rust#56167) like,

match map.raw_entry_mut().from_key(&key) {
    RawEntryMut::Occupied(o) => {
        // we have owned access to the provided `key` here. do whatever you like.
        drop(key);
        ..
    },
    RawEntryMut::Vacant(v) => ..,
};

(The .raw_entry() API is very unstable and may get replaced by something else like .entry_ref() in the future. It does not exist on BTreeMap yet.)

For completeness, one could actually also improve VacantEntry so that is possible for it to hand over the key it had consumed

We now have VacantEntry::into_key().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

8 participants