Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#254, fix space leak on collisions #258

Merged
merged 8 commits into from
Jun 2, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 13 additions & 10 deletions Data/HashMap/Base.hs
Original file line number Diff line number Diff line change
Expand Up @@ -757,7 +757,7 @@ insert' h0 k0 v0 m0 = go h0 k0 v0 0 m0
else Full (update16 ary i st')
where i = index h s
go h k x s t@(Collision hy v)
| h == hy = Collision h (updateOrSnocWith const k x v)
| h == hy = Collision h (updateOrSnocWith (\a _ -> (# a #)) k x v)
| otherwise = go h k x s $ BitmapIndexed (mask hy s) (A.singleton t)
{-# INLINABLE insert' #-}

Expand Down Expand Up @@ -880,7 +880,7 @@ unsafeInsert k0 v0 m0 = runST (go h0 k0 v0 0 m0)
return t
where i = index h s
go h k x s t@(Collision hy v)
| h == hy = return $! Collision h (updateOrSnocWith const k x v)
| h == hy = return $! Collision h (updateOrSnocWith (\a _ -> (# a #)) k x v)
| otherwise = go h k x s $ BitmapIndexed (mask hy s) (A.singleton t)
{-# INLINABLE unsafeInsert #-}

Expand Down Expand Up @@ -1026,7 +1026,7 @@ unsafeInsertWith f k0 v0 m0 = runST (go h0 k0 v0 0 m0)
return t
where i = index h s
go h k x s t@(Collision hy v)
| h == hy = return $! Collision h (updateOrSnocWith f k x v)
| h == hy = return $! Collision h (updateOrSnocWith (\a b -> (# f a b #)) k x v)
| otherwise = go h k x s $ BitmapIndexed (mask hy s) (A.singleton t)
{-# INLINABLE unsafeInsertWith #-}

Expand Down Expand Up @@ -1394,10 +1394,10 @@ unionWithKey f = go 0
else collision h1 l1 l2
| otherwise = goDifferentHash s h1 h2 t1 t2
go s t1@(Leaf h1 (L k1 v1)) t2@(Collision h2 ls2)
| h1 == h2 = Collision h1 (updateOrSnocWithKey f k1 v1 ls2)
| h1 == h2 = Collision h1 (updateOrSnocWithKey (\k a b -> (# f k a b #)) k1 v1 ls2)
| otherwise = goDifferentHash s h1 h2 t1 t2
go s t1@(Collision h1 ls1) t2@(Leaf h2 (L k2 v2))
| h1 == h2 = Collision h1 (updateOrSnocWithKey (flip . f) k2 v2 ls1)
| h1 == h2 = Collision h1 (updateOrSnocWithKey (\k a b -> (# f k b a #)) k2 v2 ls1)
| otherwise = goDifferentHash s h1 h2 t1 t2
go s t1@(Collision h1 ls1) t2@(Collision h2 ls2)
| h1 == h2 = Collision h1 (updateOrConcatWithKey f ls1 ls2)
Expand Down Expand Up @@ -1932,12 +1932,12 @@ updateWith# f k0 ary0 = go k0 ary0 0 (A.length ary0)
| otherwise -> go k ary (i+1) n
{-# INLINABLE updateWith# #-}

updateOrSnocWith :: Eq k => (v -> v -> v) -> k -> v -> A.Array (Leaf k v)
updateOrSnocWith :: Eq k => (v -> v -> (# v #)) -> k -> v -> A.Array (Leaf k v)
-> A.Array (Leaf k v)
updateOrSnocWith f = updateOrSnocWithKey (const f)
{-# INLINABLE updateOrSnocWith #-}

updateOrSnocWithKey :: Eq k => (k -> v -> v -> v) -> k -> v -> A.Array (Leaf k v)
updateOrSnocWithKey :: Eq k => (k -> v -> v -> (# v #)) -> k -> v -> A.Array (Leaf k v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a reference to David's explanation of the unboxed-tuple trick here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the unboxed tuple trick in a bunch of places in HashMap, so it seems "expected knowledge" for working with this package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should raise a general doc issue for this trick then. It's uncommon knowledge I think 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a blog post explaining the trick would be great. I wouldn't document it in the context of unordered-containers, but try and make it a general technique, and just link to it from this project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, a blog post would be great. I have opened #262 as a reminder.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this updateOrSnocWithKey#, and define updateOrSnocWithKey to use it in the boring way. As it is, this function can only be used for the lazy operations. Should we extend it to let it work for strict ones too? The most semantically pleasing version would look like this:

updateOrSnocWith#
  :: Eq k
  => ((# (# #) | v #) -> (# v #))
  -> k
  -> A.Array (Leaf k v)
  -> A.Array (Leaf k v)
  -> A.Array (Leaf k v)

This takes a function that we pass either (# (# #) | #) (meaning the key was not present) or (# | v #) (the value tied to the key in the map) and produces an unboxed unary tuple for the new value.

Unfortunately, we support GHC versions that don't have unboxed sums. We can work around that in one of two ways:

updateOrSnocWith#
  :: Eq k
  => (Maybe v -> (# v #))
  -> k
  -> A.Array (Leaf k v)
  -> A.Array (Leaf k v)
  -> A.Array (Leaf k v)
updateOrSnocWith#
  :: Eq k
  => (v -> (# v #))
  -> k
  -> ((# #) -> (# v #))
  -> A.Array (Leaf k v)
  -> A.Array (Leaf k v)
  -> A.Array (Leaf k v)

Another issue is evaluating the whole v -> v vs. v -> v -> v sort of approach, which has to do with what closures we might have to create in different contexts. This will come down to poring over Core and doing some benchmarking. This is actually true whether we try to unify strict and lazy or not. Personally, I'd love to be able to unify as much as we can, if we can get enough of the right things to inline the way we want to avoid hurting performance.

Final note: if we're careful enough about inlining, we might well be able to just use boxed everything and still not pay for it. That means making sure that the underlying implementation inlines into the Strict or Lazy version as soon as the wrapped-up function is passed in, while that Strict or Lazy version itself remains (at least) INLINABLE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useful to add both updateOrSnocWith# and updateOrSnocWith? For user code, where we expose it, I agree that having a "friendly" wrapper is valuable. But given we have updateOrSnocWith and updateOrSnocWithKey that seems like a lot of intermediate helper functions that don't get used a whole lot.

My inclination would be to keep this patch (which fixes a space leak) a small self-contained space leak fix, and then follow up with the larger unboxed sums and looking at the Core after. It feels like a subsequent refactor, and one that I'm not well placed to make.

-> A.Array (Leaf k v)
updateOrSnocWithKey f k0 v0 ary0 = go k0 v0 ary0 0 (A.length ary0)
where
Expand All @@ -1948,9 +1948,12 @@ updateOrSnocWithKey f k0 v0 ary0 = go k0 v0 ary0 0 (A.length ary0)
A.copy ary 0 mary 0 n
A.write mary n (L k v)
return mary
| otherwise = case A.index ary i of
(L kx y) | k == kx -> A.update ary i (L k (f k v y))
| otherwise -> go k v ary (i+1) n
| L kx y <- A.index ary i
, k == kx
, (# v2 #) <- f k v y
= A.update ary i (L k v2)
| otherwise
= go k v ary (i+1) n
{-# INLINABLE updateOrSnocWithKey #-}

updateOrConcatWith :: Eq k => (v -> v -> v) -> A.Array (Leaf k v) -> A.Array (Leaf k v) -> A.Array (Leaf k v)
Expand Down
55 changes: 55 additions & 0 deletions tests/Regressions.hs
Original file line number Diff line number Diff line change
@@ -1,10 +1,21 @@
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
module Main where

import Control.Applicative ((<$>))
import Control.Exception (evaluate)
import Control.Monad (replicateM)
import Data.Hashable (Hashable(..))
import qualified Data.HashMap.Strict as HM
import qualified Data.HashMap.Lazy as HML
import Data.List (delete)
import Data.Maybe
import GHC.Exts (touch#)
import GHC.IO (IO (..))
import System.Mem (performGC)
import System.Mem.Weak (mkWeakPtr, deRefWeak)
import System.Random (randomIO)
import Test.HUnit (Assertion, assert)
import Test.Framework (Test, defaultMain)
import Test.Framework.Providers.HUnit (testCase)
Expand Down Expand Up @@ -71,6 +82,48 @@ propEqAfterDelete (Keys keys) =
mapFromKeys :: [Int] -> HM.HashMap Int ()
mapFromKeys keys = HM.fromList (zip keys (repeat ()))

------------------------------------------------------------------------
-- Issue #254
sjakobi marked this conversation as resolved.
Show resolved Hide resolved

-- Key type that always collides.
newtype KC = KC Int
deriving (Eq, Ord, Show)
instance Hashable KC where
hashWithSalt salt _ = salt
sjakobi marked this conversation as resolved.
Show resolved Hide resolved

touch :: a -> IO ()
touch a = IO (\s -> (# touch# a s, () #))

-- We want to make sure that old values in the HashMap are evicted when new values are inserted,
-- even if they aren't evaluated. To do that, we use the WeakPtr trick described at
-- http://simonmar.github.io/posts/2018-06-20-Finding-fixing-space-leaks.html.
-- We insert a value named oldV into the HashMap, then insert over it, checking oldV is no longer reachable.
--
-- To make the test robust, it's important that oldV isn't hoisted up to the top or shared.
-- To do that, we generate it randomly.
issue254Lazy :: Assertion
issue254Lazy = do
i :: Int <- randomIO
let oldV = error $ "Should not be evaluated: " ++ show i
weakV <- mkWeakPtr oldV Nothing -- add the ability to test whether oldV is alive
mp <- evaluate $ HML.insert (KC 1) (error "Should not be evaluated") $ HML.fromList [(KC 0, "1"), (KC 1, oldV)]
performGC
res <- deRefWeak weakV -- gives Just if oldV is still alive
touch mp -- makes sure that we didn't GC away the whole HashMap, just oldV
assert $ isNothing res

-- Like issue254Lazy, but using strict HashMap
issue254Strict :: Assertion
issue254Strict = do
i :: Int <- randomIO
let oldV = show i
weakV <- mkWeakPtr oldV Nothing
mp <- evaluate $ HM.insert (KC 1) "3" $ HM.fromList [(KC 0, "1"), (KC 1, oldV)]
performGC
res <- deRefWeak weakV
touch mp
assert $ isNothing res

------------------------------------------------------------------------
-- * Test list

Expand All @@ -80,6 +133,8 @@ tests =
testCase "issue32" issue32
, testCase "issue39a" issue39
, testProperty "issue39b" propEqAfterDelete
, testCase "issue254 lazy" issue254Lazy
, testCase "issue254 strict" issue254Strict
]

------------------------------------------------------------------------
Expand Down
1 change: 1 addition & 0 deletions unordered-containers.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,7 @@ test-suite regressions
hashable >= 1.0.1.1,
HUnit,
QuickCheck >= 2.4.0.1,
random,
test-framework >= 0.3.3,
test-framework-hunit,
test-framework-quickcheck2,
Expand Down