Skip to content
This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Added classes for values and interned keys (for both C++ and C#), and the Common project stub. #99

Merged
merged 3 commits into from
Nov 13, 2019

Conversation

ivte-ms
Copy link
Contributor

@ivte-ms ivte-ms commented Nov 12, 2019

This is an attempt to split a large change into independent reviews, so please ignore the stubs inside the existing stub interfaces (could delete them here, but it's probably better to replace them in the following reviews, so both Before and After are visible).
The interning mechanism is very temporary (it's just a bunch of maps with locks), but the quality of it shouldn't affect the interfaces, so treat is as a placeholder for the real one.


template <class To, class From>
MS_MR_SHARING_FORCEINLINE
typename std::enable_if_t<(sizeof(To) <= sizeof(From)) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function used? I have to imagine most of the time sizeof(PtrTo) > sizeof(EnumFrom)

Copy link
Contributor Author

@ivte-ms ivte-ms Nov 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, actively (not a part of this review though).

Enums like KeyHandle and ValueHandle have a constant size (64 bit) regardless of the platform (they are not assuming that they are pointers). So each time I'm converting between enums and pointers, I'm using one of the two custom functions (bit_cast would work correctly only on x64; but on x32 sizes won't match).

I'm not sure if we are still operating under the assumption that VersionedStorage can be used for other purposes (outside of StateSync with its specific format for keys and values). If we'll drop this assumption and start working with keys and values directly, these two functions could be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. 64 bit enums pretty uncommon, perhaps enum64_to_pointer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sounds good. If we'll ever get size_t enums, ordinary bit_cast can be used, so this one is likely to always be 64-bit-specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized after the rebase that the fix is incorrect. Checking for (sizeof(...) == 64) instead of (sizeof(...) == 8). Whoops.

#pragma warning(disable : 4200)
#endif // _MSC_VER

char data_[]; // This is a language extension that is expected to be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the portability loss worth it vs (char*)(this+1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's here for debuggability (the data is visible without any natvis tricks, and we can't write natvis for all platforms).
GCC/Clang have this since forever, and on recent CPPCon it was mentioned that it will probably be in C++23. If we'll have any issues with this, it should be easy to fix.

struct KeyPtrEqual {
MS_MR_SHARING_FORCEINLINE
size_t operator()(const Key* a, const Key* b) const noexcept {
return a == b || a->size() == b->size() && a->hash() == b->hash() &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: parens please!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some other mixed use of &&,|| whichI didn't explicitly flag BTW

Comment on lines +124 to +146
bool Key::OrderedLess(const Key& other) const noexcept {
if (this == &other)
return false;

// Sizes are compared first.
// This ordering is likely to be faster than lexicographical.
return size_ == other.size_ ? memcmp(data_, other.data_, size_) < 0
: size_ < other.size_;
}

bool Key::OrderedLess(std::string_view sv) const noexcept {
// Sizes are compared first.
// This ordering is likely to be faster than lexicographical.
return size_ == sv.size() ? memcmp(data_, sv.data(), size_) < 0
: size_ < sv.size();
}

bool Key::OrderedGreater(std::string_view sv) const noexcept {
// Sizes are compared first.
// This ordering is likely to be faster than lexicographical.
return size_ == sv.size() ? memcmp(data_, sv.data(), size_) > 0
: size_ > sv.size();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would work way better if I compared the hashes first (it's generally worth computing them for string_view in cases where I'm using this code anyway, so I could introduce StringViewWithHash or something), but then we would depend on a specific hash function (and if we decided to replace it for some reason, it would change the iteration order for keys in all existing states).

Would appreciate any feedback here. Technically, we could as well just enforce the lexicographical ordering (which would make insertions slower if the users are using keys with different lengths).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk IRL about this. Leaking the specific hash choice seems like a bad idea.
Is this function likely to be very hot? If we had a minimum key length, then Key::OrderedLess(const Key& other) could get a really nice fastpath, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comparators are used in two and a half places:

  1. When I'm inserting keys into the storage (there is an AA-tree of them), so each ApplyTransaction that adds something is going to use it. Whether we'll have a situation with frequent key inserts/removes depends on what users are doing (it's a bit outside of our control), but this is probably not bad.

  2. When I'm building the transaction. Currently it has a std::map to store affected keys. I sort of want a unique canonical representation for transactions, because it makes merging easier (if I have to reallocate the storage to apply the transaction), and this property can be used later (transaction_hash512 or something). I could just build them in unordered fashion, but there is more:

amost-3-but-not-quite) Patches for prediction. This functionality is missing from the current implementation, but we discussed it many times, and I still have code drafts somewhere. Short version: if we think we'll apply the transaction, but didn't quite do that yet, we may want a special view object that operates on a snapshot and what-we-think-will-be-applied. In this case iterating over keys is a merge of two iterations (over keys of the snapshot and keys of the patch). Some sort of sorting helps a lot there.

It feels like it's better to maintain the ordering of transactions then, so the answer is "every time you modify anything, this will be called if there is more than 1 affected key".

Currently we don't have any restrictions other than "less than ~2Gb so that C# is happy", and that one is not even enforced yet.

Not sure how minimum key length helps. Could you elaborate please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to digest that block later. Meanwhile, the min key length is just so part of the memcmp can be inlined. Not as good as a hash obviously, but perhaps can be a micro-win.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would likely increase the number of keys for which the sizes are the same, though, plus will introduce an oddly specific requirement for customers (like, if they were planning to use normal binary strings, now they have to think about padding that won't collide with larger strings). Let's discuss.

Copy link
Member

@stephenatwork stephenatwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Super clean.

@ivte-ms ivte-ms merged commit a047bf3 into microsoft:feature/dev Nov 13, 2019
@ivte-ms ivte-ms deleted the user/ivte/key-value-cs branch November 13, 2019 15:03
@ivte-ms
Copy link
Contributor Author

ivte-ms commented Nov 13, 2019

Completing now to unblock myself and enable new pull requests based on the merged result.

Would greatly appreciate any additional feedback from people who didn't comment yet, and will apply fixes in future review.


namespace Microsoft.MixedReality.Sharing.StateSync
{
public class Constants
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make this class static.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, will do.

/// <summary>
/// The base class for C# wrappers around objects inheriting from VirtualRefCountedBase on C++ side.
/// </summary>
public class VirtualRefCountedBase : HandleOwner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be used without a sub class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not too much value in doing this, especially since you have to assign a handle to do something useful.

What would you propose here?

/// </summary>
public abstract class HandleOwner : CriticalHandle
{
protected HandleOwner() : base(IntPtr.Zero) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment explaining how the Ptr would be set?

/// A 64-bit hash of the key.
/// </summary>
/// <remarks>
/// The hash returned by GetHashCode() is obtained from this one by casting it to int.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is wrong, as GetHashCode() is technically obtained from PInvoke_hash(); btw, I do think we should have GetHashCode go through this property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effects are equivalent, what's happening in the background is an implementation detail (that happens to be 1 call shorter)


[DllImport(PInvokeAPI.LibraryName, EntryPoint =
"Microsoft_MixedReality_Sharing_StateSync_Key_hash")]
internal static extern ulong PInvoke_hash(IntPtr handle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, could we make the C# side method follow PascalCase to standardize on all PInvoke calls here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is like this for consistency with the code it actually represents (because that's what I'm essentially doing here: calling a C function).

In Chromium style methods are PascalCase, and trivial properties are snake_case().

Long C function names are trying to mimic the full name with namespaces for what they represent (in this case Microsoft::MixedReality::Sharing::StateSync::Key::hash()), and short PInvoke methods are trying to look the same way as they do on the other side.

It would be very inconventient to see some PInvokeFooBar, and then fail to find FooBar because on the C++ side it's actually foo_bar().

The prefix looks like PInvoke_ instead of PInvoke to make it clear where one word starts and another ends. I could go for an inner class (so this would look like PInvoke.hash()), but decided against it since the prefix is clear enough, and doesn't artificially inflate the number of classes.

I tried to roll with the idea of piling all PInvoke methods into one class (in your draft it's StateSyncAPI, in my attempt it's PInvokeAPI). It quickly becomes very difficult to keep track on what's going on there, so I'll probably reserve it for for stuff that is not really related to any specific class.

// Returns the pointer to the beginning of the view
[DllImport(PInvokeAPI.LibraryName, EntryPoint =
"Microsoft_MixedReality_Sharing_StateSync_Value_view")]
private static extern unsafe byte* PInvoke_view(IntPtr handle, ref int out_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, could we make it PascalCase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as above.

enum class DontAddRef {};

template <typename T>
class RefPtr {
Copy link
Contributor

@andreiborodin andreiborodin Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, was there not a type in std that we could already use? And if it exists but we shouldn't use, just to understand why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no type like this. The closest equivalent to something already existing would be boost::intrusive_ptr, but it's very old and doesn't match the modern C++ conventions for names (like release()).

We do have a non-intrusive shared_ptr, which has a completely different design (for starts, that's two pointers, the other one leads to control block, which is either allocated separately or in front of the object, which prevents me from storing the payload of the key in the same allocation).

In C++20 there will be a bit more on top of that (sharing arrays is easier now), but the 2-pointers thing is not going away, because the class is non-intrusive. Everyone is writing their own intrusive pointers (webrtc has a very similar one, for example).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants