Merge TDSParserStateObject.StateSnapshot packet list management and reimplement #2164
Conversation
Good job, this is a much needed change. I assume cm31 is the netcore version, but I am really curious about how the netfx version would perform in your benchmark. You also mentioned 1-20 MB tables, but in reality it is the column/cell size that matters; that would be worth mentioning too. Finally, is there any reason you didn't use System.Collections.Generic.LinkedList?

For more context: in the async version, each time a packet arrives, the parser attempts to parse the next column. If the column is not parsed successfully (e.g. a large string column for which some packets have not been received yet), the packet is kept in a list (the list affected by this PR), so that the whole process can be repeated as soon as the next packet arrives. Obviously a lot of work is repeated; that's why the parsing algorithm doesn't scale. A scalable implementation would either save its progress for each packet or estimate how many packets are needed in advance (to try parsing them just once).

Anyway, if we try to read an 8 MB column, the current algorithm has to split it into 1024 packets of 8 KB each (assuming the 8 KB default packet size). For the 1024 packets of this example, an array/list of just 8 KB (assuming 8 bytes per reference) would be enough. So the problem of the netfx version that @Wraith2 mentioned (repeated array expansions) could be rectified by pre-allocating a little more memory, which would be nothing compared to the total size of the buffered packets. This array/list could also be re-used: it only needs to be created once (apparently netfx didn't re-use it until parts of #198 were recently ported to netfx by #2144). This approach (netfx) should minimize GC pressure as well, if PacketData becomes a struct.

Long story short, it may be a bit hard to follow my reasoning here, but I am really curious how the netfx version would perform in your benchmark, especially if you pre-allocate some more memory for the list and make PacketData a struct.
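The replay behaviour described above can be modelled in a few lines. The sketch below is a language-neutral Python model, not the actual SqlClient code: each arriving packet triggers a re-scan of every buffered packet, so the total parsing work grows quadratically with the packet count.

```python
def replay_work(packet_count: int) -> int:
    """Model the current async replay algorithm: on each arriving
    packet, every buffered packet is re-scanned from the start."""
    buffered = 0
    work = 0
    for _ in range(packet_count):
        buffered += 1     # the new packet is appended to the list
        work += buffered  # replay re-reads every buffered packet
    return work

# An 8 MB column split into 8 KB packets gives 1024 packets.
packets = (8 * 1024 * 1024) // (8 * 1024)
print(packets, replay_work(packets))  # 1024 524800
```

With 1024 packets the model performs 1024 * 1025 / 2 = 524,800 packet scans instead of 1024, which is the scaling factor the comment refers to.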
The sizes I listed were a single field in a table, so 1-20 MiB single strings. How would the netfx version fare perf-wise? Probably only very slightly slower than the doubly linked list, with the speed difference being caused by GC. The repeated array allocations were the reason that it was changed from an array to the linked list; the perf disadvantage of that wasn't immediately apparent. Remember that people haven't been complaining about the netfx version, only netcore. Why didn't I use
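The repeated-expansion concern can also be quantified. This is an illustrative Python model (assuming doubling growth, which is what .NET's List&lt;T&gt; does when its capacity is exceeded), showing how many buffer reallocations a growing list performs versus a pre-sized one:

```python
def growth_reallocations(appends: int, initial_capacity: int = 4) -> int:
    """Count buffer reallocations for a doubling-growth list,
    similar to .NET's List<T> (capacity doubles when exceeded)."""
    capacity, reallocs = initial_capacity, 0
    for n in range(1, appends + 1):
        if n > capacity:
            capacity *= 2
            reallocs += 1
    return reallocs

print(growth_reallocations(1024))        # 8  (capacity 4 -> 1024)
print(growth_reallocations(1024, 1024))  # 0  (pre-allocated, no expansions)
```

Eight reallocations for 1024 packets is modest, but each one copies the whole backing array and leaves the old one for the GC, which is the cost the pre-allocation suggestion above would avoid.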
When can 5.2.0-preview4 be expected to release? Or rather, the full 5.2.0 release 😄
@AlmightyLks preview 4 is out. The 5.2 GA estimate can be viewed on the milestones. |
In the async state snapshot, netfx and netcore currently use different approaches. Netfx uses a List<PacketData> and an int counter to identify the current location as it moves forward. Netcore uses a singly linked list, and when it needs to know the current position it calculates it by iterating the list. Both approaches have different perf issues: the list approach causes repeated array expansions when accumulating packets, and the singly linked list causes an unpleasant CPU perf scaling issue when it iterates to calculate the position.

This PR replaces both methods with a shared doubly linked list. The double linking allows easy access to the start of the list for iteration, the end of the list for accumulation, and easy navigation in either direction, which will be useful later when continue mode is added.
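As a rough illustration of the shared structure, the Python sketch below (names are invented for illustration; this is not the actual PacketData/snapshot code) shows why double linking helps: O(1) append at the tail while accumulating, plus traversal from either end.

```python
class PacketNode:
    """Hypothetical node: one buffered packet plus links both ways."""
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

class PacketList:
    """Doubly linked packet list: O(1) tail append while accumulating,
    head-first iteration when replaying, and navigation in either
    direction (useful for a future continue mode)."""
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, data):
        node = PacketNode(data)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node

    def forward(self):
        node = self.head
        while node is not None:
            yield node.data
            node = node.next

    def backward(self):
        node = self.tail
        while node is not None:
            yield node.data
            node = node.prev

pl = PacketList()
for p in ("pkt0", "pkt1", "pkt2"):
    pl.append(p)
print(list(pl.forward()))   # ['pkt0', 'pkt1', 'pkt2']
print(list(pl.backward()))  # ['pkt2', 'pkt1', 'pkt0']
```

Unlike the singly linked netcore version, no iteration is needed to find the end of the list, and unlike the netfx List<PacketData>, appending never triggers an array expansion.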
This PR also makes some other changes in this area:
The changes yield good perf results for async strings. They effectively reduce the scaling factor applied by the repeated replay of the packet list and keep the memory footprint almost identical to the current async version. Note that there is still an unfavourable scaling factor for async over sync; this does not linearize the operation.
BDN perf results: Async-cm31 is the current main branch. The table lists sizes from 1 to 20 MiB in interesting increments.