Merge TDSParserStateObject.StateSnapshot packet list management and reimplement #2164
Conversation
Good job, this is a much needed change. I assume cm31 is the netcore version, but I am really curious about how the netfx version would perform in your benchmark. You also mentioned 1-20 MB tables, but in reality it is the column/cell size that matters; that would be worth mentioning too. Finally, is there any reason you didn't use System.Collections.Generic.LinkedList?

For more context: in the async version, each time a packet arrives, the parser attempts to parse the next column. If the column is not parsed successfully (e.g. a large string column for which some packets have not been received yet), the packet is kept in a list (the list affected by this PR), so that the whole process can be repeated as soon as the next packet arrives. Obviously a lot of work is repeated; that's why the parsing algorithm doesn't scale. A scalable implementation would either save its progress for each packet or estimate how many packets are needed in advance (to try parsing them just once).

Anyway, if we try to read an 8 MB column, the current algorithm has to split it into 1024 packets of 8 KB each (assuming the 8 KB default packet size). For the 1024 packets of this example, an array/list of just 8 KB (assuming 8 bytes per reference) would be enough. So the problem of the netfx version that @Wraith2 mentioned (repeated array expansions) could be rectified by pre-allocating a little more memory, which would be nothing compared to the total size of the buffered packets. This array/list could also be re-used: it only needs to be created once (apparently netfx didn't re-use it until parts of #198 were recently ported to netfx by #2144). This approach (netfx) should minimize GC pressure as well, if PacketData becomes a struct.

Long story short, it may be a bit hard to follow my reasoning here, but I am really curious how the netfx version would perform in your benchmark, especially if you pre-allocate some more memory for the list and make PacketData a struct.
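The replay behaviour described above can be modelled in a few lines. The sketch below is a language-neutral Python model, not the actual SqlClient code: each arriving packet triggers a re-scan of every buffered packet, so the total parsing work grows quadratically with the packet count.

```python
def replay_work(packet_count: int) -> int:
    """Model the current async replay algorithm: on each arriving
    packet, every buffered packet is re-scanned from the start."""
    buffered = 0
    work = 0
    for _ in range(packet_count):
        buffered += 1     # the new packet is appended to the list
        work += buffered  # replay re-reads every buffered packet
    return work

# An 8 MB column split into 8 KB packets gives 1024 packets.
packets = (8 * 1024 * 1024) // (8 * 1024)
print(packets, replay_work(packets))  # 1024 524800
```

With 1024 packets the model performs 1024 * 1025 / 2 = 524,800 packet scans instead of 1024, which is the scaling factor the comment refers to.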
The sizes I listed were a single field in a table, so 1-20 MiB single strings. How would the netfx version fare perf-wise? Probably only very slightly slower than the doubly linked list, with the speed difference being caused by GC. The repeated array allocations were the reason that it was changed from an array to the linked list; the perf disadvantage of that wasn't immediately apparent. Remember that people haven't been complaining about the netfx version, only netcore. Why didn't I use
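The repeated-expansion concern can also be quantified. This is an illustrative Python model (assuming doubling growth, which is what .NET's List&lt;T&gt; does when its capacity is exceeded), showing how many buffer reallocations a growing list performs versus a pre-sized one:

```python
def growth_reallocations(appends: int, initial_capacity: int = 4) -> int:
    """Count buffer reallocations for a doubling-growth list,
    similar to .NET's List<T> (capacity doubles when exceeded)."""
    capacity, reallocs = initial_capacity, 0
    for n in range(1, appends + 1):
        if n > capacity:
            capacity *= 2
            reallocs += 1
    return reallocs

print(growth_reallocations(1024))        # 8  (capacity 4 -> 1024)
print(growth_reallocations(1024, 1024))  # 0  (pre-allocated, no expansions)
```

Eight reallocations for 1024 packets is modest, but each one copies the whole backing array and leaves the old one for the GC, which is the cost the pre-allocation suggestion above would avoid.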
When can 5.2.0-preview4 be expected to release? Or rather, the full 5.2.0 release 😄
@AlmightyLks preview 4 is out. The 5.2 GA estimate can be viewed on the milestones. |
In the async state snapshot, netfx and netcore currently use different approaches. Netfx uses a List<PacketData> and an int counter to identify the current location as it moves forward. Netcore uses a singly linked list, and when it needs to know the current position it calculates it by iterating the list. Both approaches have different perf issues: the list approach causes repeated array expansions when accumulating packets, and the singly linked list causes an unpleasant CPU perf scaling issue when it iterates to calculate the position.

This PR replaces both methods with a shared doubly linked list. The double linking allows easy access to the start of the list for iteration, the end of the list for accumulation, and easy navigation in either direction, which will be useful later when continue mode is added.
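As a rough illustration of the shared structure, the Python sketch below (names are invented for illustration; this is not the actual PacketData/snapshot code) shows why double linking helps: O(1) append at the tail while accumulating, plus traversal from either end.

```python
class PacketNode:
    """Hypothetical node: one buffered packet plus links both ways."""
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

class PacketList:
    """Doubly linked packet list: O(1) tail append while accumulating,
    head-first iteration when replaying, and navigation in either
    direction (useful for a future continue mode)."""
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, data):
        node = PacketNode(data)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node

    def forward(self):
        node = self.head
        while node is not None:
            yield node.data
            node = node.next

    def backward(self):
        node = self.tail
        while node is not None:
            yield node.data
            node = node.prev

pl = PacketList()
for p in ("pkt0", "pkt1", "pkt2"):
    pl.append(p)
print(list(pl.forward()))   # ['pkt0', 'pkt1', 'pkt2']
print(list(pl.backward()))  # ['pkt2', 'pkt1', 'pkt0']
```

Unlike the singly linked netcore version, no iteration is needed to find the end of the list, and unlike the netfx List<PacketData>, appending never triggers an array expansion.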
This PR also makes some other changes in this area:
The changes yield good perf results for async strings. They effectively reduce the scaling factor applied by the repeated replay of the packet list and keep the memory footprint almost identical to the current async version. Note that there is still an unfavourable scaling factor for async over sync; this does not linearize the operation.
BDN perf results: Async-cm31 is the current main branch. The table lists sizes from 1 to 20 MiB in interesting increments.