-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix XmlBinaryReader/XmlBinaryWriter on big-endian systems #74599
Conversation
buffer[offset + 6] = bytes[5]; | ||
buffer[offset + 7] = bytes[6]; | ||
buffer[offset + 8] = bytes[7]; | ||
buffer[offset + 0] = (byte)value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code started writing the value at offset + 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Fixed now.
buffer[offset + 6] = bytes[5]; | ||
buffer[offset + 7] = bytes[6]; | ||
buffer[offset + 8] = bytes[7]; | ||
buffer[offset + 0] = (byte)value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just use BinaryPrimitives.WriteDoubleLittleEndian
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, that's much simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, there are more places that can use BinaryPrimitives
, for example WriteInt64
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want me to make that change as part of this PR? I was hoping that this could get into net7 to fix CI there, so I was trying to keep the changes as small as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments after a quick read through
@@ -964,10 +964,10 @@ public ulong GetUInt64(int offset) | |||
=> (ulong)GetInt64(offset); | |||
|
|||
public float GetSingle(int offset) | |||
=> ReadRawBytes<float>(offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: You could also have done
BitConverter.Int32BitsToSingle(GetInt32(offset))
,
@@ -1437,7 +1442,7 @@ public override unsafe void WriteArray(string? prefix, string localName, string? | |||
|
|||
public override unsafe void WriteArray(string? prefix, XmlDictionaryString localName, XmlDictionaryString? namespaceUri, decimal[] array, int offset, int count) | |||
{ | |||
if (Signing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the code it is very difficult to known if it produces the same result or if the format for arrays have changed.
You should ensure there are tests to cover this similar to those is #71478 , but it might make sense to check more types than I did.
My concern is that currently it only check round-trip support and I think it is important to ensure that the produced file will be readable on LittleEndian and that it can read content produced by little endian machins.
Suggestion:
If more code is needed then maybe you want to consider writing a genereic WriteArray helper.
My own idea (which assumed that it would have been ok to update the writer PR or that it was merged first) to solve the problem was to
- use a genereic ReadArray/WriteArray helper with a loop for reading/writing elements on big endian.
- generic "ReverseEndianness" method which need to handle at least short, int, long (other primitives, except deicmal, can always be cast to these using Unsafe or MemoryMarshal.Cast)
- (Read/WriteRawBytes might have been able to do raw byte swapping as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough review! You're right that the array cases were not correct. I thought we could just fall back to the base class implementation like in the reader, but in the writer the base class actually doesn't create any array records, just a series of "regular" records. This is probably not wrong, but it clearly would be preferable to create the same output on big-endian as on little-endian.
Normally, I agree we should just merge your rewrite first, and then fix the endian problems on top. However, I'm not sure the rewrite will still be accepted into net7 at this stage, and I'd really like to see the endian bugs fixed in net7, so I'd prefer to have the endian fix go first, and the performance rewrite on top of that.
The updated implementation I just pushed replaces the generic UnsafeWriteArray
helper with per-type array helpers along the line of the existing WriteDateTimeArray
etc. These new helpers are still unsafe (for now) since I left the little-endian implementation unchanged, but with your patch on top it should be straightforward to make them all safe.
As to the ReverseEndianness
suggestion, I feel it should be cleaner to eliminate explicit byte-swap operations in favor of type-specific fixed-byte-order operations from BinaryPrimitives
- see also the comment by @jkotas above.
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be a good idea to ensure there are tests that the produced binary format is correct.
I dont know if and if so when #71478 will be merged so the tests added there cannot be relied upon to find any potential issues,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I've added new tests from your PR #71478 and extended them by making sure all data types are covered.
@@ -964,10 +964,10 @@ public ulong GetUInt64(int offset) | |||
=> (ulong)GetInt64(offset); | |||
|
|||
public float GetSingle(int offset) | |||
=> ReadRawBytes<float>(offset); | |||
=> BinaryPrimitives.ReadSingleLittleEndian(_buffer.AsSpan(offset, 4)); | |||
|
|||
public double GetDouble(int offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also GetSingle/GetDouble methods without the "int" parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I had missed those. Now added.
Ping? Any comments on the new approach? |
|
||
Span<byte> span = buffer.AsSpan(offset, sizeof(decimal)); | ||
BinaryPrimitives.WriteInt32LittleEndian(span, bits[3]); | ||
BinaryPrimitives.WriteInt32LittleEndian(span.Slice(4), bits[2]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is correct. A decimal value is a 96 bit integer stored in little endian order with a final 32 bits holding some flags (stores a ^10 exponent divider and a sign bit). The order to be written to the span should be:
[low 32 bits] [mid 32 bits] [high 32 bits] [flags]
This looks like it writes out:
[flags] [high 32 bits] [low 32 bits] [mid 32 bits]
While each of the 32bit numbers should be written in little endian so need their byte orders switched as you've done, the TryGetBits method doesn't write each of the components out to the passed in span in an endian specific way. The order of ints in the span should remain the same.
Or am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decimal is a struct with two Int32 fields and one Int64 field. The declared order is flags, high32, low64.
So when the little-endian implementation iterates over 4 integer pointers here, it writes out flags, high32, bottom-of-low64, top-of-low64 if I'm not mistaken, because the "low" is stored as Int64 and that's the order it would come out on a little-endian machine? Personally, I think it would make more sense to follow the order of TryGetBits if this was a greenfield implementation... but the goal here is to match on the wire what gets written by the little-endian implementation.
Or that's how I read it. Maybe I'm missing something? This is why I asked for extra eyes. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with @StephenMolloy - we need to match the existing wire format, which matches the in-memory format of the decimal
type on a little-endian machine, which is "[flags] [high 32] [low 32] [mid 32]".
fixed (int* items = &array[offset]) | ||
{ | ||
base.UnsafeWriteBytes((byte*)items, 4 * count); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to consider doing a similar approach for little endian as we do for bin endian as UnsafeWriteBytes forces a buffer flush whereas the big endian implementation only causes a buffer flush when the buffer doesn't have space.
We could also improve the performance of the big endian implementation by having a faster path when the size of the array is smaller than the buffer length. In that case you can call GetBuffer once and then have the loop wrap the write call, then call Advance once at the end. Could also do this for bigger arrays and just do it in batches which are smaller than the buffer size. The buffer length is hard coded to 512 bytes so these could be done in batches of 128 count.
None of that is required for this PR, just noting improvements that I can see which are possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had considered that. But for a very late-stage PR in 7.0 that we can hopefully bring back to 6.0... I thought it's probably better to stay faithful to the existing little-endian implementation. As you say, it's not required for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We can do performance improvements for big-endian systems for .NET 8. I'd be happy to work on this.
|
||
Span<byte> span = GetBuffer(16, out int bufferOffset).AsSpan(bufferOffset, 16); | ||
BinaryPrimitives.WriteInt32LittleEndian(span, bits[3]); | ||
BinaryPrimitives.WriteInt32LittleEndian(span.Slice(4), bits[2]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same ordering issue as WriteDecimalText
@@ -964,10 +964,10 @@ public ulong GetUInt64(int offset) | |||
=> (ulong)GetInt64(offset); | |||
|
|||
public float GetSingle(int offset) | |||
=> ReadRawBytes<float>(offset); | |||
=> BinaryPrimitives.ReadSingleLittleEndian(_buffer.AsSpan(offset, 4)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we really should be using sizeof everywhere that we are passing how many bytes we want. For example, here it should be sizeof(float)
. This seems to have crept in to this code base all over so not requiring it to be fixed, but it would be nice if we fixed this everywhere in these classes.
DateTime datetime = new DateTime(2022, 8, 26, 12, 34, 56, DateTimeKind.Utc); | ||
Span<byte> datetimeBytes = stackalloc byte[8]; | ||
BinaryPrimitives.WriteInt64LittleEndian(datetimeBytes, datetime.ToBinary()); | ||
AssertReadContentFromBinary(datetime, XmlBinaryNodeType.DateTimeText, datetimeBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To properly test this, you really should test using an inline byte array like the other tests. All you are testing here is that WriteInt64LittleEndian and ReadInt64LittleEndian round trip data.
I've pushed @StephenMolloy's PR feedback commit from https://github.com/uweigand/runtime/pull/1 to this PR branch to trigger the test suite. Those changes all look good to me. I've successfully re-run the tests with those changes locally on s390x as well. |
Properly byte-swap float/double/decimal types. Handle array types correctly on big-endian systems. Added more test cases based on PR 71478. Fixes #74494
/backport to release/7.0 |
Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3024604883 |
@StephenMolloy backporting to release/7.0 failed, the patch most likely resulted in conflicts: $ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Fix XmlBinaryReader/XmlBinaryWriter on big-endian systems
Applying: PR feedback
Applying: Re-enable binary XML tests on s390x
error: sha1 information is lacking or useless (src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0003 Re-enable binary XML tests on s390x
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128 Please backport manually! |
Hi @akoeplinger it looks like the 7.0 backport is failing because your #75282 isn't in 7.0. Were you planning on backporting this as well? I think it would be great to have the CI green in 7.0 as well. |
hmm I wasn't planning on backporting it, but I guess it can't hurt |
@akoeplinger - I'm manually backporting this PR, so I don't think your change needs to be backported. |
the armv6 and ppc changes are still good to have :) |
* Manually backporting #74599 to 7.0 for RC2. * Fix a couple mis-copied lines of code and a couple nits.
* Manually backporting dotnet#74599 to 7.0 for RC2. * Fix a couple mis-copied lines of code and a couple nits.
Properly byte-swap float/double/decimal types.
Handle array types correctly on big-endian systems.
Added test case for array-of-decimal types.
Fixes #74494
CC @StephenMolloy @Daniel-Svensson @jkotas