#254, fix space leak on collisions #258

ndmitchell · 2020-05-31T22:27:21Z

Fixes #254. That benchmark used to leak 1Gb/s, now it doesn't leak and runs in 5Mb.

I added a test case. The invariants to ensure GHC doesn't hoist the memory out (and thus keep it alive) are quite subtle, but the result should be fairly robust. I confirm the test fails before, and passes after. Unfortunately, it's not as simple as testing for strictness, since the functions have the right strictness behaviour, it's just old strict values get retained longer than is necessary.

sjakobi

Many, many thanks for the fix and the amazing regression tests! :)

I'm quite out of my depth here, so I have a lot of questions. Please don't put too much effort into addressing them, though. @treeowl's review will be much more important.

tests/Regressions.hs

sjakobi · 2020-05-31T23:28:43Z

Data/HashMap/Base.hs

                 -> A.Array (Leaf k v)
 updateOrSnocWith f = updateOrSnocWithKey (const f)
 {-# INLINABLE updateOrSnocWith #-}

-updateOrSnocWithKey :: Eq k => (k -> v -> v -> v) -> k -> v -> A.Array (Leaf k v)
+updateOrSnocWithKey :: Eq k => (k -> v -> v -> (# v #)) -> k -> v -> A.Array (Leaf k v)


Maybe add a reference to David's explanation of the unboxed-tuple trick here?

We use the unboxed tuple trick in a bunch of places in HashMap, so it seems "expected knowledge" for working with this package.

Perhaps we should raise a general doc issue for this trick then. It's uncommon knowledge I think 😄

I think a blog post explaining the trick would be great. I wouldn't document it in the context of unordered-containers, but try and make it a general technique, and just link to it from this project.

Yeah, a blog post would be great. I have opened #262 as a reminder.

I would call this updateOrSnocWithKey#, and define updateOrSnocWithKey to use it in the boring way. As it is, this function can only be used for the lazy operations. Should we extend it to let it work for strict ones too? The most semantically pleasing version would look like this:

updateOrSnocWith# :: Eq k => ((# (# #) | v #) -> (# v #)) -> k -> A.Array (Leaf k v) -> A.Array (Leaf k v) -> A.Array (Leaf k v)

This takes a function that we pass either (# (# #) | #) (meaning the key was not present) or (# | v #) (the value tied to the key in the map) and produces an unboxed unary tuple for the new value.

Unfortunately, we support GHC versions that don't have unboxed sums. We can work around that in one of two ways:

updateOrSnocWith# :: Eq k => (Maybe v -> (# v #)) -> k -> A.Array (Leaf k v) -> A.Array (Leaf k v) -> A.Array (Leaf k v)

updateOrSnocWith# :: Eq k => (v -> (# v #)) -> k -> ((# #) -> (# v #)) -> A.Array (Leaf k v) -> A.Array (Leaf k v) -> A.Array (Leaf k v)

Another issue is evaluating the whole v -> v vs. v -> v -> v sort of approach, which has to do with what closures we might have to create in different contexts. This will come down to poring over Core and doing some benchmarking. This is actually true whether we try to unify strict and lazy or not. Personally, I'd love to be able to unify as much as we can, if we can get enough of the right things to inline the way we want to avoid hurting performance.

Final note: if we're careful enough about inlining, we might well be able to just use boxed everything and still not pay for it. That means making sure that the underlying implementation inlines into the Strict or Lazy version as soon as the wrapped-up function is passed in, while that Strict or Lazy version itself remains (at least) INLINABLE.

Is it useful to add both updateOrSnocWith# and updateOrSnocWith? For user code, where we expose it, I agree that having a "friendly" wrapper is valuable. But given we have updateOrSnocWith and updateOrSnocWithKey that seems like a lot of intermediate helper functions that don't get used a whole lot.

My inclination would be to keep this patch (which fixes a space leak) a small self-contained space leak fix, and then follow up with the larger unboxed sums and looking at the Core after. It feels like a subsequent refactor, and one that I'm not well placed to make.

tests/Regressions.hs

ndmitchell · 2020-06-01T11:30:02Z

Lots more documentation about the test based on the review by @sjakobi. I also made the lazy test use undefined values to insert, checking that you don't have the space leak but still aren't strict.

treeowl · 2020-06-01T11:41:50Z

Will look today.

…

On Mon, Jun 1, 2020, 7:30 AM Neil Mitchell ***@***.***> wrote: Lots more documentation about the test based on the review by @sjakobi <https://github.com/sjakobi>. I also made the lazy test use undefined values to insert, checking that you don't have the space leak but still aren't strict. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#258 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAOOF7ORSB7XYZGPLFURN4DRUOGMPANCNFSM4NPLCURQ> .

emilypi · 2020-06-01T18:09:45Z

So, I'm 👍 on this - I can't find anything in particular to gripe about. If @treeowl sees something to fix, i think he should be the final approval here.

sjakobi · 2020-06-01T19:08:33Z

Data/HashMap/Base.hs

@@ -1949,7 +1949,7 @@ updateOrSnocWithKey f k0 v0 ary0 = go k0 v0 ary0 0 (A.length ary0)
            A.write mary n (L k v)
            return mary
        | otherwise = case A.index ary i of
-            (L kx y) | k == kx   -> A.update ary i (L k (f k v y))
+            (L kx y) | k == kx, (# v2 #) <- f k v y -> A.update ary i (L k v2)


I'm still a bit mystified how the fix works, but this must be the crucial line.

By forcing the unboxed tuple, we can get rid of the reference to the old value y, so it can be GC'd. Is that right?

This line does the forcing, so it's half the story. Line 760 constructs \a _ -> (# a #) which says how far to evaluate (at least as far as that the _ has disappeared). Together they get rid of the old value, that previously hung out in the _, and is now not in the (# #) value.

Speaking of this line, that case-plus guard can be rewritten

| L kx y <- A.index ary i = if | k == kx , (# v2 #) <- f k v y -> A.update ary i (L k v2) | otherwise -> go k v ary (i+1) n

Personally, I think this seems a little easier to follow.

I refactored to morally follow that pattern - the nested if is not actually needed as the kx and y bound by A.index aren't used (and importantly, shouldn't be used) in the go recursive call - so I hoisted it all up a bit.

treeowl · 2020-06-01T21:25:54Z

tests/Regressions.hs

+--
+-- To make the test robust, it's important that oldV isn't hoisted up to the top or shared. To do that,
+-- we use NOINLINE, make oldV dependent on an unseen argument, and insert _ <- return () to ensure oldV
+-- is under a lambda.


Yuck yuck yuck. I don't like this approach to avoiding floating. It seems much too fragile. Fortunately, I think we can do something both simpler and more robust, taking advantage of the surprising fact that type Assertion = IO ().

issue254Lazy :: Assertion issue254Lazy = do i :: Int <- randomIO let oldV = error $ "...." ++ show i ....

Now GHC has no way to float out the old value because it's genuinely unknown until the assertion is run.

That's way better. I've done that.

treeowl · 2020-06-01T21:31:39Z

tests/Regressions.hs

+  let oldV = error $ "Should not be evaluated: " ++ show i
+  weakV <- mkWeakPtr oldV Nothing -- test whether oldV is alive
+  let mp = HML.insert (KC 1) (error "Should not be evaluated") $ HML.fromList [(KC 0, "1"), (KC 1, oldV)]
+  _ <- evaluate mp -- force the insert to happen


This should probably be written

mp <- evaluate $ HML.insert (KC 1) (error "Should not be evaluated") $ HML.fromList [(KC 0, "1"), (KC 1, oldV)]

tests/Regressions.hs

treeowl · 2020-06-01T22:20:51Z

I have no idea why touch is not available in System.Mem.Weak. That seems rather peculiar. Hopefully the new with will be exposed there at some point.

sjakobi

One more nit. Apart from that, LGTM! 👍

unordered-containers.cabal

Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>

treeowl · 2020-06-02T01:16:25Z

Thanks!

treeowl · 2020-06-03T19:29:25Z

I want to add extra special thanks for the testing work. That was all sorts of tricky and subtle.

ndmitchell added 2 commits May 31, 2020 21:58

haskell-unordered-containers#254, avoid space leak with collisions

647f6af

haskell-unordered-containers#254, add a regression test

d1d0151

ndmitchell mentioned this pull request May 31, 2020

Space leak with collisions #254

Closed

sjakobi added this to the 0.2.11.0 milestone May 31, 2020

sjakobi reviewed May 31, 2020

View reviewed changes

ndmitchell added 2 commits June 1, 2020 12:25

Improve the documentation on the regression test

54bcfdd

More inline comments on the space leak regression test

dea714a

sjakobi mentioned this pull request Jun 1, 2020

Properly document the unboxed unary tuple trick #262

Open

sjakobi reviewed Jun 1, 2020

View reviewed changes

treeowl requested changes Jun 1, 2020

View reviewed changes

ndmitchell added 2 commits June 1, 2020 23:10

Add random package as a dependency

208e450

Update Regressions.hs

6ebe947

Refactor the updateOrSnocWithKey otherwise case

b6ed146

treeowl approved these changes Jun 1, 2020

View reviewed changes

sjakobi approved these changes Jun 1, 2020

View reviewed changes

unordered-containers.cabal Outdated Show resolved Hide resolved

Alphabetise the package dependencies

c8ff7b4

Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>

emilypi approved these changes Jun 1, 2020

View reviewed changes

treeowl merged commit c4c671f into haskell-unordered-containers:master Jun 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#254, fix space leak on collisions #258

#254, fix space leak on collisions #258

ndmitchell commented May 31, 2020

sjakobi left a comment

sjakobi May 31, 2020

ndmitchell Jun 1, 2020

emilypi Jun 1, 2020

ndmitchell Jun 1, 2020

sjakobi Jun 1, 2020

treeowl Jun 1, 2020

ndmitchell Jun 1, 2020

ndmitchell commented Jun 1, 2020

treeowl commented Jun 1, 2020 via email

emilypi commented Jun 1, 2020

sjakobi Jun 1, 2020

ndmitchell Jun 1, 2020

treeowl Jun 1, 2020

ndmitchell Jun 1, 2020

treeowl Jun 1, 2020

ndmitchell Jun 1, 2020

treeowl Jun 1, 2020

ndmitchell Jun 1, 2020

treeowl commented Jun 1, 2020

sjakobi left a comment

treeowl commented Jun 2, 2020

treeowl commented Jun 3, 2020

#254, fix space leak on collisions #258

#254, fix space leak on collisions #258

Conversation

ndmitchell commented May 31, 2020

sjakobi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ndmitchell commented Jun 1, 2020

treeowl commented Jun 1, 2020 via email

emilypi commented Jun 1, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

treeowl commented Jun 1, 2020

sjakobi left a comment

Choose a reason for hiding this comment

treeowl commented Jun 2, 2020

treeowl commented Jun 3, 2020