-
Notifications
You must be signed in to change notification settings - Fork 99
Conversation
FWIW, after this patch and some in Shake, we're down to a 3Mb/min leak I can't get rid of in the GetModificationTimes function. I have no idea why it occurs, and the simplest things to make it go away don't work, and I've run out of time. I view it as not worth raising a ticket, since the bug will either go away on its own, or be very hard to reproduce. |
How did you track these leaks? |
Wrote a benchmark of repeatedly hover and noticed it leaked. Modified the code to do progressively less (e.g. turn off progress reporting, disable dirtying rules in Shake, short-circuit defineEarlyCutoff at lots of points) until the leak went away. Iterate on that pattern for an afternoon. |
|
||
-- This odd construction is required to prevent a space leak. | ||
-- We insert a fully evaluated value, but, somehow, the HashMap | ||
-- seems to keep track of the old value. I don't know how. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that it is the old value that is being retained and not the old key or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would imagine that if it were the old key, then the lookup and forcing that to Just/Nothing would be sufficient. But it's not. However, I only conclude its the value by process of elimination, so other ideas welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reasonably reliable way to verify what is retained is to set a weak pointer on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put a weak ref on the value in the map at the start. I then did an insertion and the value was still available. I then did the evaluate trick on the value looked up from the Map, and the value was still available. I did a GC before both those things. I don't get how they remain available. My guess is if we solve the leak it would print DEREF VALUE STILL LIVES. If the leak never existed it would print nothing. But it prints both DEREF VALUE STILL LIVES and CONTINUE for each line. I put the code in a gist at https://gist.github.com/ndmitchell/7c62c03cc0966ebfa108e07e0549943a - hopefully I've done something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could look in gdb to find out what is retaining it if there is an easy to reproduce branch somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm running the benchmark at https://gist.github.com/ndmitchell/11467985dbf1855e62035fa97248a585, on Windows GHC 8.6.5. That benchmark just loads the Shake repo and then does hover requests on a listMaybe value in a loop. Code compiled at -O1
. You may wish to use Shake HEAD as otherwise there are other leaks that I've since eliminated. There is also a small leak with GetModifiedFiles I haven't tracked down, but this one is 10x that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put a weak ref on the value in the map at the start. I then did an insertion and the value was still available. I then did the evaluate trick on the value looked up from the Map, and the value was still available. I did a GC before both those things. I don't get how they remain available. My guess is if we solve the leak it would print DEREF VALUE STILL LIVES. If the leak never existed it would print nothing. But it prints both DEREF VALUE STILL LIVES and CONTINUE for each line. I put the code in a gist at https://gist.github.com/ndmitchell/7c62c03cc0966ebfa108e07e0549943a - hopefully I've done something wrong.
This is easy to explain - the old value is still alive inside the update function because the state
var is still pointing at the old version of the hashmap. To fix the experiment, extract the weak reference bits out of the update function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gist shows that the values get released only after forcing the HashMap completely, and not in all cases, which means 3 things:
- Weak ptrs work as expected,
- The strict
HashMap
is not quite completely strict, so the fix in this PR is needed (but not enough?) - Something else keeps some values alive, which is not surprising.
It looks like the hashmap keeps the old value alive even after looking up the new value. Probably some thunk referencing the old version of the hashmap. Nothing much that can be done (forcing the whole hashmap is not practical) other than switching back to a Map
or relying on the fix in this PR to reduce the leakage
Example output:
DEREF VALUE STILL LIVES AFTER UPDATE
DEREF VALUE STILL LIVES AFTER EVALUATING THE MAP
DEREF VALUE STILL LIVES AFTER LOOKUP
DEREF VALUE STILL LIVES AFTER UPDATE
DEREF VALUE STILL LIVES AFTER EVALUATING THE MAP
DEREF VALUE STILL LIVES AFTER LOOKUP
DEREF VALUE STILL LIVES AFTER FORCING THE MAP
AN OLD VALUE GOT RELEASED
DEREF VALUE STILL LIVES AFTER UPDATE
DEREF VALUE STILL LIVES AFTER EVALUATING THE MAP
DEREF VALUE STILL LIVES AFTER LOOKUP
OLD VALUE IS GONE
AN OLD VALUE GOT RELEASED
OLD VALUE IS GONE
AN OLD VALUE GOT RELEASED
DEREF VALUE STILL LIVES AFTER UPDATE
DEREF VALUE STILL LIVES AFTER EVALUATING THE MAP
DEREF VALUE STILL LIVES AFTER LOOKUP
OLD VALUE IS GONE
AN OLD VALUE GOT RELEASED
DEREF VALUE STILL LIVES AFTER UPDATE
DEREF VALUE STILL LIVES AFTER EVALUATING THE MAP
DEREF VALUE STILL LIVES AFTER LOOKUP
[DEBUG] Cancelled request IdInt 39
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the state var is still pointing at the old version of the hashmap
Doh! Thanks.
Now we can prove that's the underlying cause, we can turn our attention to HashMap. Lots of things rely on it, so a space leak there must be wreaking havoc on loads of stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now reproduced this bug on top of unordered-containers alone. The key is that it only appears to trigger if you have hash collisions. That leads quickly to the question of why the hell we have so many hash collisions. I think the answer is that the Hashable of Key is dodgy, and we should be hashing in type information or all different mono-constructor's end up with a constant hash. Happily, I can fix both things, so will do so!
@@ -258,7 +258,22 @@ setValues :: IdeRule k v | |||
-> IO () | |||
setValues state key file val = modifyVar_ state $ \vals -> do | |||
-- Force to make sure the old HashMap is not retained | |||
evaluate $ HMap.insert (file, Key key) (fmap toDyn val) vals | |||
let v = fmap toDyn val |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to force val
as well? Dynamic
is not strict in the value I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, Dynamic is not strict in the value. But we don't need to force it, and forcing it doesn't eliminate the space leak, and we don't subsequently force it in the lookup anyway. I tried these things, including using seqValue, but they didn't have any impact.
case HMap.lookup k res of | ||
Nothing -> pure () | ||
Just v -> void $ evaluate v | ||
return res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is that modifyVar_
is not strict, it doesn't force the updated value. I expect that switching to a strict variant will allow you to get rid of all the uses of evaluate
s in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we already called evaluate
on the HashMap after inserting and it’s a strict HashMap. Shouldn’t that be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that you wouldn't need that call to evaluate
. By always using a strict modifyVar_
, you wouldn't have to remember to evaluate
all the things you modify. Which is error-prone and ugly (IMO). On the other hand, you must remember to always use a strict modifyVar_
🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we did call evaluate. So why was the old code leaking space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I can't explain that either.
Please don't merge this ticket. Useful, but I now have enough information to fix it at the source. |
Converted to a draft to prevent accidental merges. Thanks a lot for tracking this down @ndmitchell! |
The space leak only occurs if we have collisions, but the odds of a collision should be miniscule - if we implement our Hashable instances correctly. #588 fixes that, and then we can just wait for the unordered-containers fixes upstream. Closing this ticket. |
The comment in the code describes it best. I would have assumed unordered-containers gets this right. But, with a repeatedly hover and don't edit, I get 30Mb/min leak. If I change below, I get no leak. If I do anything without the second lookup/evaluate pair, the leak doesn't go away. I tried to reproduce this bug with unordered-containers on its own, and failed. I'm completely lost, but I do have a fix.