Introduce 'quip' data building helpers. #134

Merged
merged 3 commits on Jan 8, 2021

Conversation

@warpfork (Collaborator) commented Jan 3, 2021

I developed a new pattern for building IPLD structures. This started as a little experiment, but worked so dang well I instantly want to ship it.

The basic idea is that every function in the package takes an *error as its first parameter, and gathers any errors during the function's operation in there, instead of returning them. If the *error is already non-nil, the function returns without doing anything. This lets you write very linear code without a lot of fuss, and tends to "do the right thing" for overall flow control.

There's a function equivalent to each of the methods on NodeAssembler. (I've also sprinkled in a few other useful ones, like "CopyRange" for lists.) These generally take the NodeAssembler to work on as a parameter. For the functions that work on recursive kinds, I've used callback arguments, which cause tree-structured data to indent like trees in the source (and also save you the calling and error checking of Finish() functions), which I tend to think is good.

Check out the example function around https://github.com/ipld/go-ipld-prime/pull/134/files#diff-46ca8eb0f742995be52debd745cb616db55f094a46c9182058da8b1d3b548ed9R15-R42 for a demonstration of how using these helpers looks.

Syntactically, this comes out a lot shorter than using the NodeAssembler manually and doing all the error checks in longhand. It's not quite as short as some of our other syntaxes (yet?) either, but it's most of the way there, and there's still room for improvement. Here's the linecounts summary, based on the examples that can be found in the benchmarks in this diff:

  • Quip: ~18 SLOC (update: now down to ~12 SLOC)
  • Fluent Builders: ~12 SLOC
  • Fluent Reflect: ~12 SLOC
  • Unmarshal: ~12 SLOC
  • (agonizingly) Bare: ~66 SLOC

(I think we can get this 18 down to the 12 of the others; there's some comments in the diff about avenues to explore. And then 12 is about as far as I'd push it on this example, apparently (on this example data, any less implies you're collapsing more than one tree structure element into each line of source, which is of dubious desirability for readability). update: Indeed we did get it down to 12!)

("unmarshal" is arguably an odd thing to include in this comparison table, but I thought to myself: well, okay, what if we take that as an approximation for "how good could we get if we invented a whole DSL for this?". It's just interesting to ponder.)

The really good news about all this? It's also working with zero overhead. And I mean immeasurable. Here's a sample benchmark result:

BenchmarkQuip-12                               946 ns/op      880 B/op     17 allocs/op
BenchmarkUnmarshal-12                         6435 ns/op     2208 B/op     52 allocs/op
BenchmarkFluent-12                            1361 ns/op     1136 B/op     31 allocs/op
BenchmarkReflect-12                           1257 ns/op     1008 B/op     20 allocs/op
BenchmarkReflectIncludingInitialization-12    1558 ns/op     1440 B/op     25 allocs/op
BenchmarkAgonizinglyBare-12                    981 ns/op      944 B/op     19 allocs/op

Using the quip helpers is literally free. In fact, evidently, it's so amenable to compiler inlining and other compile-time optimizations that it's actually faster than the longhand direct usage of NodeAssembler. (Borderline hard to believe, I know, but, I think it's not even a sampling artifact, and the benchmark membench results also support this. I suspect the extra nested functions in the quip style are letting the escape analyzer get quite clever and shift more things onto the stack, whereas my longhand function held onto more assemblers at once, in... ways that were harder to optimize. I dunno. Haven't looked that gift horse in the mouth too deeply, honestly.)

By comparison, our other previous attempts at helper functions for assembling IPLD data had a 30% cpu time tax, and something like a 50% garbage tax. Those taxes meant you really couldn't have nice syntax and ideal performance at the same time... and that meant, in practice, a lot of code would get written twice: once the terse way, and a second time the fast way as soon as it showed up on an application level benchmark. Having to decide which of two styles to use based on whether you estimated you could stomach the performance hit was just an awful choice to have to make. With quip, that choice is gone: you can have nice syntax and fast at the same time.

I've skimmed a little of the assembly output to see why it's so fast. It seems that quip functions are generally inlinable. I suspect this means the "if err" branches can often be unrolled into one jump target per function in assembly. Also importantly, the anonymous functions used to make the syntactic tree structures (and save us from needing to make Finish() calls) get compiled into anonymous functions without closure state -- which is important because: it means they're not dynamically dispatched; and they don't do a sneaky allocation to create an anonymous struct that holds closure state (which can be one of the sources of allocs that is the most syntactically difficult to track down). I suspect it's possible to write more complicated callbacks which would incur the closure alloc cost, but... if the most common usage doesn't, I'm happy enough with that.

Worth noting that of the remaining allocations shown in the benchmark... most of them are probably from the fact we're using basicnode nodes to hold this data. basicnode implementations are decidedly allocation-happy since they don't know anything about the structure of the data they're holding. If we implemented another benchmark against a codegen'd node that does know about the structure of the data, I suspect the alloc count would be in the single digits. Might even be able to get it down to one or two (which is about as low as I can imagine for a corpus like this one that involves two variable sized lists).

There's probably more work to do here:

  • More helpers (especially around leaf values; see comments in diff) could make this even terser. update: There are now more helpers for scalar and leaf values.
  • "Quip" is probably a working title for the package.
  • We can always find more high-level helpers like CopyRange to keep adding, I'm sure!

... but I think this is a pretty solid foundation to build more on.

@rvagg (Member) commented Jan 4, 2021

Yeah, that's a nice pattern - opt-in error checking. I suppose it does impose an additional burden on the API consumer to ensure that success-dependent logic doesn't get invoked if there's an error part-way, but that can be clarified with documentation and a big ol' caveat emptor, I suppose.

…rors correctly.

If there's any further action to take after the callback, it's
necessary to recheck the error slot before taking that action;
and of course we definitely don't want to overwrite the error if one
had been set during the callback.

I wonder if it would ever be useful to have a variant of these
functions which *does* attempt to call `Finish`, etc, and thus still
build a partial tree at the end even if it stopped on error midway.
(Right now, if something stops midway, the final `Build` call will
panic, and tell you you haven't finished all the assemblers.)
It would be a bit more complicated though: we'd potentially need to
accumulate more than one error.  And in practice, when working on
schema'd data, it would often still result in invalid results
(anything other than optional fields, or type-level maps and lists,
will fail some other logical rule if not fully filled in), which
makes me wonder how often this would be useful.
@mvdan (Contributor) left a comment
We've discussed this at length in chat - I personally think this is better than fluent, but there's still room to improve. I have an altered version in a branch at https://github.com/ipld/go-ipld-prime/tree/quip-mvdan.

We've agreed to merge both as separate packages, sort of like how quip will sit next to fluent even though both serve the same purpose. Longer term, we'll see which one we like best, and delete the others.

Once this is merged, I can post my PR and add my API to the benchmarks.

@mvdan (Contributor) commented Jan 7, 2021

To add a bit more context - what I'm not a fan of in this API is:

  • All the error pointers, and having to use them correctly.
  • AbsorbError seems like an extension of the previous problem.
  • There are quite a lot of function literals even in the simple examples, leading to extra verbosity and indentation.

Lots of individual things:

- removed the "Begin*" functions; no need to expose that kind of raw
  operation when the callback forms are zero-cost.

- renamed functions for consistent "Build" vs "Assemble".

- added "Assign*" functions for all scalar kinds, which reduces the
  usage of "AbsorbError" (but also, left AbsorbError in).

- renamed the ListEntry/MapEntry functions to also have "Assemble*"
  forms (still callback style).

- while also adding Assign{Map|List}Entry{Kind} functions (lets you
  get rid of another callback whenever the value is a scalar).

- added Assign{|MapEntry|ListEntry} functions, which shell out to
  fluent.Reflect for even more convenience (at the cost of performance).

- moved higher level functions like CopyRange to a separate file.

- new benchmark, for the terser form of working with scalars.
  (It's also evidently slightly faster, because fewer small function
  calls.  Slightly surprising considering how much inlining we might
  expect, but, huh.  Alright, surprise bonus; acceptable.)

- example function updated to use the terser form.

With these terseness improvements to handling of scalars, the overall
SLOC count for using the quip system is now exactly on par with fluent.

Variations on map key arguments (which could be PathSegment or even
Node, in addition to string) still aren't made available by this.
Perhaps that's just okay.  If you're really up some sort of creek
where you need that, you can still just use the
MapAssembler.AssembleKey system directly, which can do everything.
@warpfork (Collaborator, Author) commented Jan 8, 2021

I've finished adding the rest of the helper methods that seem most important. (This includes making handling of scalar and leaf values take one function call instead of two, with no extra callbacks -- which saves a lot of keystrokes, and is a double boon because one of those two calls used to be that rather ugly general-purpose error gathering function.) I've also done a naming consistency pass. There are now functions for working with NodeAssembler, and also a few for top-level usage that deal with the NewBuilder and Build dance for you, which shaves a few more lines off common usages.

The extra methods for handling scalars probably make the biggest difference. The overall example usage now looks like this:

n := quip.BuildMap(&err, basicnode.Prototype.Any, 4, func(ma ipld.MapAssembler) {
	quip.AssignMapEntryString(&err, ma, "some key", "some value")
	quip.AssignMapEntryString(&err, ma, "another key", "another value")
	quip.AssembleMapEntry(&err, ma, "nested map", func(na ipld.NodeAssembler) {
		quip.AssembleMap(&err, na, 2, func(ma ipld.MapAssembler) {
			quip.AssignMapEntryString(&err, ma, "deeper entries", "deeper values")
			quip.AssignMapEntryString(&err, ma, "more deeper entries", "more deeper values")
		})
	})
	quip.AssembleMapEntry(&err, ma, "nested list", func(na ipld.NodeAssembler) {
		quip.AssembleList(&err, na, 2, func(la ipld.ListAssembler) {
			quip.AssignListEntryInt(&err, la, 1)
			quip.AssignListEntryInt(&err, la, 2)
		})
	})
})

This still isn't perfect -- lots of &err isn't exactly pretty, and the keystroke count is generally high (it's not bad if you have good editor autocomplete, but it's definitely no fun if you don't) -- but it's relatively satisfactory.

Like @mvdan said, we're also still going to pursue other approaches and experiments in parallel. Hopefully something shakes out as the most preferred idiom after some time of usage, but there's no need to rush that process. As long as things begin and end in Node and NodeAssembler, we can try lots of approaches in parallel.
