Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Space leak with collisions #254

Closed
ndmitchell opened this issue May 27, 2020 · 5 comments · Fixed by #258
Closed

Space leak with collisions #254

ndmitchell opened this issue May 27, 2020 · 5 comments · Fixed by #258

Comments

@ndmitchell
Copy link
Contributor

Given the following program:

import qualified Data.HashMap.Strict as Map
import Data.IORef
import Data.Hashable
import Control.Monad

newtype Wrapper a = Wrapper a
    deriving Eq

instance Hashable (Wrapper a) where
    hashWithSalt _ _ = 0

main = do
    mp <- newIORef $ Map.fromList [(Wrapper 0, "0"), (Wrapper 1, "1")]
    forM_ [1..1000000000] $ \i -> do
        modifyIORef' mp $ Map.insert (Wrapper 2) (show i)

It leaks about 1Gb/s when compiled with -O. It is an example cut down from Ghcide, haskell/ghcide#586, where it leaks 30Mb/s. The issue is that updateOrSnocWith lazily applies a function to the key, the old value, and the new value. In the case of inserting with a collision, the only relevant values are the old values, and everything else is const ignored. One remedy, at the cost of duplicating code, so to write a dedicated updateOrSnoc that skips the function call entirely, as anything else is likely to overly-force values in the .Lazy case.

@emilypi
Copy link
Member

emilypi commented May 30, 2020

Hey @ndmitchell, since you've probably thought about this more than any of us, would you mind submitting a patch fixing this? @sjakobi and I are now in the swing of helping @treeowl clear out the backlog (modulo a few permissions wibbles that need to be sorted), and we can review whatever you give us ASAP.

@treeowl
Copy link
Collaborator

treeowl commented May 30, 2020

Another option we use some places is to use an unboxed unary tuple as the result of the passed function. Then we can handle strict and lazy uniformly without this sort of leak. I think that's probably better.

@ndmitchell
Copy link
Contributor Author

@treeowl sounds like a cool technique, but not one I'm familiar with. I read the code, but it's not immediately obvious to me what it does. Is it documented somewhere? Or in some notes?

@treeowl
Copy link
Collaborator

treeowl commented May 30, 2020

@treeowl sounds like a cool technique, but not one I'm familiar with. I read the code, but it's not immediately obvious to me what it does. Is it documented somewhere? Or in some notes?

A boxed unary tuple looks like this:

data Box a = Box {unBox :: a}
  deriving KitchenSink

This looks a lot like Identity, but it has different strictness properties. It gives a nice way to stick an extra handle on an API. So

class Funktor f where
  funkMap :: (a -> Box b) -> f a -> f b

A typical instance ties unpacking the results to building the spine:

instance Funktor [] where
  funkMap f = go where
    go [] = []
    go (x : xs)
      | Box y <- f x
      = y : go xs

Now you can define

map, map' :: Funktor f => (a -> b) -> f a -> f b
map f = funkMap (Box . f)
map' f = funkMap (\x -> Box $! f x)

replace :: Funktor f => a -> f b -> f a
replace x = funkMap (const (Box x))

Just one underlying mapping function, and no space leaks!

The trouble is that these Boxes all have to be allocated. That might actually be okay for updateOrSnocWith, now that I think about it, but for functions that are applied multiple times, it's kind of lousy. The trick then is to use (# a #) instead of Box a, removing those. The only remaining tricky bit: if you don't INLINE hard enough, GHC has to build closures for those passed lambdas. I'm not sure how big a deal that is, but we can be careful to inline enough, winning source code size though not object code size.

@sjakobi sjakobi added this to the 0.2.11.0 milestone May 31, 2020
ndmitchell added a commit to ndmitchell/unordered-containers that referenced this issue May 31, 2020
ndmitchell added a commit to ndmitchell/unordered-containers that referenced this issue May 31, 2020
@ndmitchell
Copy link
Contributor Author

Thanks @treeowl for that awesome write up - very interesting! As a technique, it worked very nicely, so patch in #258.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants