Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AnnotationModel uses inefficient iteration over map #892

Closed
iloveeclipse opened this issue May 6, 2022 · 20 comments · Fixed by #980
Closed

AnnotationModel uses inefficient iteration over map #892

iloveeclipse opened this issue May 6, 2022 · 20 comments · Fixed by #980
Assignees
Milestone

Comments

@iloveeclipse
Copy link
Member

AnnotationModel.cleanup(boolean, boolean) iterates over map with keyset iterator and does lookup for every value from iterator. With ~10_000 annotations this is not fun anymore. Similar code in other places. That can be improved by using entryset iterator.

I plan to push the fix after eclipse-platform/eclipse.platform.text#19 is done.

iloveeclipse referenced this issue in iloveeclipse/eclipse.platform.text May 6, 2022
AnnotationModel.cleanup(boolean, boolean) iterates over map with keyset
iterator and does lookup for every value from iterator. With ~10_000
annotations this is not fun anymore. Similar code in other places. That
can be improved by using entryset iterator.

See https://github.com/eclipse-platform/eclipse.platform.text/issues/21
iloveeclipse referenced this issue in iloveeclipse/eclipse.platform.text May 6, 2022
AnnotationModel.cleanup(boolean, boolean) iterates over map with keyset
iterator and does lookup for every value from iterator. With ~10_000
annotations this is not fun anymore. Similar code in other places. That
can be improved by using entryset iterator.

See https://github.com/eclipse-platform/eclipse.platform.text/issues/21
@iloveeclipse iloveeclipse self-assigned this May 6, 2022
@iloveeclipse iloveeclipse added this to the 4.24 M3 milestone May 6, 2022
iloveeclipse referenced this issue in iloveeclipse/eclipse.platform.text Jun 22, 2022
This reverts commit db6a3b4 because
this change may cause inconsistent model state in case of parallel
operations on the annotations map (key set iterator was working on a
snapshot, but entry set is not).

See discussion on
eclipse-platform#22
iloveeclipse referenced this issue in eclipse-platform/eclipse.platform.text Jun 22, 2022
This reverts commit db6a3b4 because
this change may cause inconsistent model state in case of parallel
operations on the annotations map (key set iterator was working on a
snapshot, but entry set is not).

See discussion on
#22
@iloveeclipse iloveeclipse reopened this Jun 22, 2022
@iloveeclipse iloveeclipse removed this from the 4.24 M3 milestone Jun 22, 2022
@iloveeclipse iloveeclipse removed their assignment Jun 22, 2022
@iloveeclipse
Copy link
Member Author

The patch that was merged in 4.24 was reverted in 4.25 M1 due possible MT issues.

In order to fix that properly, a new IAnnotationMap2 interface is needed which provides "safe" entrySetIterator() over the map, and AnnotationModel should check if the (internal) IAnnotationMap used allows "fast" iteration or not.

I do not plan to work on this, feel free to take over.

@fedejeanne
Copy link
Contributor

Hi all,
I was looking into this issue and I could use your help.

First of all I agree that the current implementation could be improved, but I just can't really grasp how severe the problem is. I did some simple testing (see below) and it looks "good enough" for me. Maybe you could provide some reference values to compare with to help size this issue?

How I tested

  1. Patched AnnotationModel (AnnotationModel.patch) in order to time the current implementation
  2. Created and opened BigClass. It has near to 35.000 calls to method();
  3. Selected method() in the editor --> it took around 10 seconds but all occurrences were highlighted
  4. Selected the whole body of the main method
  5. Deleted the whole body at once --> Output [1] Deleted: 34.932 | Map size: 34.934 | Duration: 22 ms

My observations

  1. AnnotationModel.removeAllAnnotations(boolean) never got called (any hints on how to reach this code would be appreciated)
  2. Deleting nearly 35.000 entries merely took 22 ms.

My environment

Machine

  • Processor: Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz
  • RAM: 128 GB
  • OS: Windows 10 Enterprise (64 Bit)

sdk.product

  • Program arguments: -os ${target.os} -ws ${target.ws} -arch ${target.arch} -nl ${target.nl} -consoleLog --launcher.defaultAction openFile --launcher.appendVmargs
  • VM arguments: -Dorg.eclipse.swt.graphics.Resource.reportNonDisposed=true -Dosgi.requiredJavaVersion=17 -Dosgi.dataAreaRequiresExplicitInit=true -Xms256m -Xmx2048m --add-modules=ALL-SYSTEM

@HeikoKlare
Copy link
Contributor

We face some performance problems in JDT which are also related to the AnnotationModel implementation (see eclipse-jdt/eclipse.jdt.ui#665), so I would also be interested in whether this issue has some downstream effects on the derived CompilationUnitAnnotationModel of JDT.

I have two question to better understand and maybe fix this issue. Maybe @iloveeclipse you still have some further insights, even though you do not plan to work on the issue?

  1. I do not completely understand the issue yet. Why is iterating over the keyset and then retrieving each value a performance problem here? The AnnotationMap, over which the iteration is performed, uses a HashMap, which implements lookup in O(1). Annotation does not even implement hashcode(), so from my understanding the lookup should not have a noticable performance impact here.
  2. I there a chance to reproduce the problem? In particular, the reproduction by @fedejeanne uses the Java class editor, so the JDT specialization of the annotation model is used (CompilationUnitAnnotationModel). As far as I understand, the problem should already be reproducible with, e.g., annotations in the generic text editor. Or is the performance impact only noticable when using the Java editor (or any other editor using a specialization of the AnnotationModel) so that maybe some implementation detail of a specialized AnnotationModel affects performance?

Thanks in advance for following up on the issue!

@laeubi
Copy link
Contributor

laeubi commented Jul 31, 2023

@HeikoKlare even if lookup is constant it takes a constant time (in the described case 10.000 x C), while iterating using entryset would perform the iteration+lookup in one step with only the overhead of creating Entry objects (if they are not already entries).

This can even be improved by using java.util.Map.forEach(BiConsumer<? super K, ? super V>) in which case an implementation might choose an more efficient implementation.

@HeikoKlare
Copy link
Contributor

That's true, it would definitely be more efficient. I was just wondering whether that improvement would be noticable since everything else done with a single annotation during cleanup/removeAllAnnotations/ removeAnnotation should require quite more time than the single map lookup. Maybe I misunderstood the issue and it's really only about using the potential for improvement rather than being a "noticable" performance issue.

@iloveeclipse
Copy link
Member Author

iloveeclipse commented Aug 1, 2023

2. I there a chance to reproduce the problem?

Yes.
GenerateJava.java.txt

Just put this file (rename it to java) to java project, let generate it "huge" Java file and move generated file to the source folder. After that, open it in Java editor and try to fix compilation error by opening find/replace dialog and replace all public x = 42;with public int x = 42;

On Linux it freezes for me for ~30 seconds. With this patch applied wait time reduced to ~15 seconds. Unfortunately the patch is not thread safe / breaking AnnotationModel contract, see discussion on eclipse-platform/eclipse.platform.text#22

Note, this is purely platform UI issue, eclipse-jdt/eclipse.jdt.ui#677 doesn't change anything here.

Update
Note: you have to close "Outline" view while experiments - it doesn't scale (yet, see #810 for viewers fix) and adds a lot to UI freezes.

@HeikoKlare
Copy link
Contributor

Perfect, thank you very much for sharing the information for reproduction @iloveeclipse!
With that descrption I assume that the issue also affects our development environment, so we (@fedejeanne or me) will have another look.

iloveeclipse added a commit to iloveeclipse/eclipse.platform.ui that referenced this issue Aug 1, 2023
Introduced new IAnnotationMap2 interface that overrides default
forEach() to synchronize on lock object.

This halves iteration time spent in cleanup() on huge annotation models.

Fixes eclipse-platform#892
@iloveeclipse
Copy link
Member Author

will have another look.

Please check #980
For me it halves the time used on model cleanup code.
I believe this is better than nothing,

@fedejeanne
Copy link
Contributor

Thank you both for the input! I will take a look at it and report back.

iloveeclipse added a commit to iloveeclipse/eclipse.platform.ui that referenced this issue Aug 1, 2023
Introduced new IAnnotationMap2 interface that overrides default
forEach() to synchronize on lock object.

This halves iteration time spent in cleanup() on huge annotation models.

Fixes eclipse-platform#892
iloveeclipse added a commit to iloveeclipse/eclipse.platform.ui that referenced this issue Aug 1, 2023
AnnotationMap will overrides default forEach() to synchronize on lock
object and AnnotationModel will use that method to iterate over map
instead to iterate over keys id possible.

This halves iteration time spent in cleanup() on huge annotation models.

Fixes eclipse-platform#892
iloveeclipse added a commit to iloveeclipse/eclipse.platform.ui that referenced this issue Aug 1, 2023
AnnotationModel.cleanup() will lock on the map while iterating over map
and iterate via forEach() instead to iterate over keys if possible.

This halves iteration time spent in cleanup() on huge annotation models.

Fixes eclipse-platform#892
@fedejeanne
Copy link
Contributor

fedejeanne commented Aug 1, 2023

I took a look at the performance using the steps described above and I noticed the following:

image

  1. One call to FindReplaceDialog.replaceAll does hundreds of calls to (replaceSelection and) findAndSelect, which ends up calling AnnotationModel.cleanup down the line
  2. Though the get method seems the problem at first, actually creating the keySetIterator takes more time (the iterator copies the data) so getting rid of it in AnnotationModel uses inefficient iteration over map, take 2 #980 was a good call 👍
  3. The self time of getDeleted (the private method I extracted in the AnnotationModel.patch that I mentioned above) takes most of the time, so I'm guessing there is some other potential improvement in the current implementation.

Regarding point number 3., I will run the sampler again once #980 is merged and see if there are some other potential improvements that can be applied.

iloveeclipse added a commit that referenced this issue Aug 1, 2023
AnnotationModel.cleanup() will lock on the map while iterating over map
and iterate via forEach() instead to iterate over keys if possible.

This halves iteration time spent in cleanup() on huge annotation models.

Fixes #892
@iloveeclipse iloveeclipse self-assigned this Aug 1, 2023
@iloveeclipse iloveeclipse added this to the 4.29 M3 milestone Aug 1, 2023
@iloveeclipse
Copy link
Member Author

I will run the sampler again once #980 is merged and see if there are some other potential improvements that can be applied.

Yes please. In my old (internal) notes I saw search/replace code would also hang in:

org.eclipse.jface.text.AbstractDocument.updateDocumentStructures()
org.eclipse.jface.text.AbstractDocument.updatePositions ()

@fedejeanne
Copy link
Contributor

So, I ran the same search-and-replace operation with the improved version from this PR and this a snapshot from the sampling:
image

My observations

  1. The time it takes to run getAnnotationIterator was reduced by ~35%! 👏👏 From 18s to ~11s
  2. Most of this performance improvement comes from the cleanup method so the changes in this PR really helped (is difficult to see it in the previous screenshot because most of the time of getDeleted (i.e. cleanup) goes to Self, but the improvement is there).
  3. Since most of the time goes to the forEach now, there is one little trick I would like to try tomorrow, maybe I can boost the performance even more (stay tuned)

I will run the sampler again once #980 is merged and see if there are some other potential improvements that can be applied.

Yes please. In my old (internal) notes I saw search/replace code would also hang in:

org.eclipse.jface.text.AbstractDocument.updateDocumentStructures()
org.eclipse.jface.text.AbstractDocument.updatePositions ()

I also noticed that but I haven't had time to pursue it. I'll look into it this week and report back :-)

@szarnekow
Copy link
Contributor

there is one little trick I would like to try tomorrow, maybe I can boost the performance even more

You have my attention :)

@fedejeanne
Copy link
Contributor

So I tried changing the forEach with 2 parallel streams today, like this:

if (mapLock == null) {
	...
} else {
	synchronized (mapLock) {
		// get the ones where the position is null
		annotations.entrySet().parallelStream()//
				.filter(e -> e.getValue() == null)//
				.forEach(e -> deleted.add(e.getKey()));

		// get the ones where the position is deleted
		annotations.entrySet().parallelStream()//
				.filter(e -> e.getValue() != null)//
				.filter(e -> e.getValue().isDeleted)//
				.forEach(e -> deleted.add(e.getKey()));
	}
}

And this is what I got:
image

As you can see, the cleanup method performs slightly better but the overall performance is slightly worse (might be a hiccup, but it might not).

I'd say the trick didn't perform as expected so I would leave this one out.

@szarnekow
Copy link
Contributor

I ran the same search-and-replace operation

Any chance to share more information about the s&r that you do there? How many open editors? How many hits? How big is the file / are the files? Anything that allows to reproduce these numbers?

@fedejeanne
Copy link
Contributor

Sure thing! It's just the generated Java file with 12K inner classes (see previous comment from Andrey), I have only 4 editors open, a bunch of projects and compilation errors in my WS and I'm replacing public x = 42 with public int x = 42 to fix the compiler errors of LargeJavaFile.java. Please notice the first System.out.println(...) of each inner class does not have a compile error, I fixed those previously.

This is how it looks like:

image

@laeubi
Copy link
Contributor

laeubi commented Aug 3, 2023

So I tried changing the forEach with 2 parallel streams today, like this:

Please note that your current implementation, even if executed in parallel might give better performance if you make your parallel stream unordered (I assume order does not really matter here). Beside that I'm not sure why iterating the stream twice should be faster than one iteration using (e.getValue() == null || e.getValue().isDeleted())

@laeubi
Copy link
Contributor

laeubi commented Aug 3, 2023

Another thing: Instead of adding it with for each you can collect the stream into a list/set what also has optimizations in parallel mode.

@szarnekow
Copy link
Contributor

I'd assume the forEach is no longer measurable in isolation with 100k annotations present. The loop in the FindReplaceDialog.replaceAll seems to perform an awful lot of UI work instead of replacing all matches and working with annotations only once.

@fedejeanne
Copy link
Contributor

Thank you @laeubi ! I took your suggestions and the performance does improve when using parallel streams. See #990

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants