-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bindnode unions are problematic for use as map keys #234
Comments
I might surprise you with my take here: I don't think we should do this. I agree that avoiding pointers can be a good thing. Be it for memory locality, or to avoid overhead, or to make equality follow the data. But I disagree that we need to support using Go types as map keys in bindnode. Two of the most common data model kinds, Bytes and List, will simply not work as map keys since they are slices. Trying to work around that will be very hacky and painful to maintain. I have a similar opinion on unions. We could support multiple ways to represent them in Go, but I ultimately do not think it's a good idea.
As an aside thought, we could probably support some form of opt-in "memoization" or "reuse Go values", akin to json's golang/go#32779. Since IPLD nodes should be immutable when in memory as Go values, it wouldn't even be painful to use. We could even consider turning this on by default if we could figure out some form of sane global defaults. Beyond the memoization idea above, here are two other suggestions:
|
re-avowing motivationThis is a hard requirement to me. I use union types in map keys all the time when specifying new protocols. So I really need it to work. And the current status quo of "you must reuse the keys you got from iterating" is pretty nonviable. Sometimes the application logic doesn't permit that as an option at all. Even when it does, the UX is nasty: I spent a long time thinking our bindnode implementation was completely broken when I first tried to do use union types in a map key... until I realized that it was just doing a semantically wrong thing in golang when I created new values and expected to be able to look them up. I think we'd be guiding pained users through that quite often if we don't support a better way to work with this. don't worry, it's not a portal to infinite madnessMind: this isn't saying that anything and everything needs to be valid as a map key; there's still limits. From the IPLD Schema side: the requirement is: to be used as a map key, a type must have a representation kind of string. (So: a structs with stringjoin repr is fair game; a union with stringprefix repr is fair game; but lists are not, and bytes are not, and structs with map repr or tuple repr are not.) "Coincidentally", those requirements mostly line up very nicely with what golang map keys can be. (Golang is mostly more permissive: e.g. any struct is fair game in golang; the range of structs IPLD Schemas will allow as map keys is much narrower.) Right now, the one exception to this is the unions, because of this issue with pointers. Okay, you have several ideas, sweet 🙇 . I will number them and make notes: approach 1: support unions with a discriminant int field and no pointers(this is my original proposal, just numbering it and re-listing it for all-in-one reading.) approach 2: memoization?Memoization can be an interesting approach but I'm tepid, bordering on cold, on that working well here. I remember building an application in java once that would only work correctly if you remembered to use approach 3: use dereferenced inhabitant as key?Using the union's actual inhabitant value as the map key. Okay, that's a cool idea. Except:
approach: use the encoded form as key?This I think is totally an option, yeah. The end user could just have a golang We might need to add more functions to We'd also lose any ability to make sure values being put in that map are actually correct. E.g. with this approach, one could have a Worth considering. But my initial thought is it might be more work and yet have less guard rails than Approach 1. tl;dr I'm open to other ideas here, but I still think supporting a shape of golang structures for unions that don't use pointers is probably the clearest path to victory. |
Okay not to nag but I want to log that I just hit this again while being a user and wanting to poke around inside a value in my application to debug it. I can't quickly type out a value for the key -> ow, debugging something beyond that value is agony. My empirically observed halflife between pain events on this is... low. |
I mulled over this last night, and I agree with your conclusion. My points boiled down to:
We agree on 1 and 2, I think. On 3 I think it's more subjective. I think using a codec with As for the implementation: it should be as simple as recovering the code we had before, and adding a branch in every one of those places. Something like: If the first field is called One if condition at each of those places should be basically free, because either mechanism always has to at least look at the first field already. |
Yep, that analysis seems correct. ... huh. I had been about to say "maybe it's time to do more up-front validation on the golang type shapes, and memoize it", but I guess I don't have a performance argument here to support that position. Neat. |
I thing I considered, and want to log, but seems to be a nonissue: I felt a little tetchy at first about a rule like "first field is called But I guess that's technically not a problem, because the field names in golang don't matter at all in how we've implemented unions in bindnode (just their offsets in the struct do). So, okay! |
I thought about that collision too, but it's actually impossible if we require the non-indexed version to use pointers. Because then, the first valid field would have to be And yeah, we're not doing any upfront preparation and memoization of reflect work right now. We probably will end up doing some of it later on, as part of optimizations, kinda like what json does for its marshalers. For now, the performance seems to be good enough, and this particular case doesn't make it any worse :) |
Here's another reason to do this: sometimes the pointers add unnecessary overhead of their own. For example:
Using pointers to basic or tiny Go types is most likely a bad idea. An Index integer plus non-pointer values would use basically the same amount of memory, and avoid jumping around memory with pointers. |
tl;dr: Bindnode should have some way to support unions that does not require the use of pointers. Currently this is not the case. This is important for map keys.
It's desirable that when writing their own golang types for use with
bindnode
, people should be able to avoid use of pointers whenever possible, and in particular this is quite important if they want to use a golang type as a map key.To be useful as a map key, the comparison operation on a value must be useful. The comparison operation for pointers is... pointer equality. This is very often not useful in a map key. It means it's very hard to create a new value that will be equal to an existing value, even if you have all of the same constituent data. As long as you hold on to literally the the values originally used as keys, you can use them to look things up again; that's the only option you have. (And this can be doubly problematic because it's not super obvious if you make mistakes about this.)
In particular we can illustrate this clearly with a union that we want to use as a map key:
Right now this will require the user to write a golang type for the union that contains pointers:
... and, well, you can probably see how that goes. If one parses the string
"foo:woo"
into a&FooOrBar
twice, you'll get two different values: they'll both have theFoo
field set, and if you deference that pointer, it'll be the same string... but you'll have different pointers in the middle. And that'll foobar your equality for purposes like a golang map key.We can't compensate for this by adding our own equality functions in IPLD. It's a golang native map behavior we're talking about here; it can't be patched over from the outside.
I think the solution here will be: We'll have to reintroduce the ability to use an integer field as the in-memory discriminator, and allow non-pointer fields for the members.
I think we can make this optional and just engage it if we find such a field. But we'll have to carefully define exactly what "find such a field" means.
And then it'll still probably also remain a good idea to document some warnings about this, because it'll be in the court of the person authoring the go types to not put themselves in the problematic situation. (The logical errors you'll eventually get from golang itself are not helpful, speaking from experience.)
The text was updated successfully, but these errors were encountered: