Fix XmlBinaryReader/XmlBinaryWriter on big-endian systems #74599

uweigand · 2022-08-25T17:38:14Z

Properly byte-swap float/double/decimal types.
Handle array types correctly on big-endian systems.
Added test case for array-of-decimal types.

Fixes #74494

CC @StephenMolloy @Daniel-Svensson @jkotas

jkotas · 2022-08-25T17:56:00Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBinaryWriter.cs

-                buffer[offset + 6] = bytes[5];
-                buffer[offset + 7] = bytes[6];
-                buffer[offset + 8] = bytes[7];
+                buffer[offset + 0] = (byte)value;


The original code started writing the value at offset + 1.

Oops. Fixed now.

jkotas · 2022-08-25T17:56:59Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBinaryWriter.cs

-                buffer[offset + 6] = bytes[5];
-                buffer[offset + 7] = bytes[6];
-                buffer[offset + 8] = bytes[7];
+                buffer[offset + 0] = (byte)value;


This should just use BinaryPrimitives.WriteDoubleLittleEndian

Agreed, that's much simpler.
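For readers following along, a minimal sketch of what the suggestion amounts to. This is not the PR's code: the helper name `WriteDoubleRecord` and the tag byte `0xAA` are placeholders, and the layout (one tag byte followed by the payload) is only assumed from the note above that the original code wrote the value starting at offset + 1.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: writes a one-byte record tag followed by the 8 payload
// bytes of the double in little-endian order, whatever the host byte order is.
static void WriteDoubleRecord(byte[] buffer, int offset, byte tag, double value)
{
    buffer[offset] = tag;
    BinaryPrimitives.WriteDoubleLittleEndian(
        buffer.AsSpan(offset + 1, sizeof(double)), value);
}

byte[] buffer = new byte[1 + sizeof(double)];
WriteDoubleRecord(buffer, 0, 0xAA, 1.0);   // 0xAA is a placeholder tag, not a real node type

// 1.0 is 0x3FF0000000000000, so the payload ends in F0 3F.
Console.WriteLine(Convert.ToHexString(buffer));
```

Because the byte order is fixed by the API rather than by the host, the same bytes come out on s390x and on x64.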

FWIW, there are more places that can use BinaryPrimitives, for example WriteInt64 method.

Do you want me to make that change as part of this PR? I was hoping that this could get into net7 to fix CI there, so I was trying to keep the changes as small as possible.

Daniel-Svensson

Just some comments after a quick read through

Daniel-Svensson · 2022-08-26T08:10:32Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBufferReader.cs

@@ -964,10 +964,10 @@ public ulong GetUInt64(int offset)
            => (ulong)GetInt64(offset);

        public float GetSingle(int offset)
-            => ReadRawBytes<float>(offset);


FYI: You could also have done
BitConverter.Int32BitsToSingle(GetInt32(offset)).
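A small sketch comparing the two options, assuming `GetInt32` reads a little-endian 32-bit value from the buffer (here stood in for by `BinaryPrimitives.ReadInt32LittleEndian`). The byte array is an illustrative input, not data from the PR.

```csharp
using System;
using System.Buffers.Binary;

// IEEE 754 encoding of 1.0f (0x3F800000) as it appears in a little-endian stream.
byte[] data = { 0x00, 0x00, 0x80, 0x3F };

// Option 1: read the float directly in little-endian order.
float direct = BinaryPrimitives.ReadSingleLittleEndian(data);

// Option 2: read the raw 32-bit pattern first, then reinterpret it as a float.
int rawBits = BinaryPrimitives.ReadInt32LittleEndian(data);
float viaBits = BitConverter.Int32BitsToSingle(rawBits);

Console.WriteLine(direct == viaBits);
```

Both paths avoid reinterpreting host memory, so neither depends on the endianness of the machine doing the read.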

Daniel-Svensson · 2022-08-26T10:39:36Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBinaryWriter.cs

@@ -1437,7 +1442,7 @@ public override unsafe void WriteArray(string? prefix, string localName, string?

        public override unsafe void WriteArray(string? prefix, XmlDictionaryString localName, XmlDictionaryString? namespaceUri, decimal[] array, int offset, int count)
        {
-            if (Signing)


Based on the code, it is very difficult to know whether it produces the same result or whether the format for arrays has changed.
You should ensure there are tests to cover this, similar to those in #71478, but it might make sense to check more types than I did.

My concern is that it currently only checks round-trip support. I think it is important to ensure that the produced file will be readable on little-endian machines and that it can read content produced by little-endian machines.

Suggestion:
If more code is needed, then maybe you want to consider writing a generic WriteArray helper.

My own idea to solve the problem (which assumed that it would have been OK to update the writer PR, or that it was merged first) was to

use a generic ReadArray/WriteArray helper with a loop for reading/writing elements on big endian.

a generic "ReverseEndianness" method, which needs to handle at least short, int, and long (other primitives, except decimal, can always be cast to these using Unsafe or MemoryMarshal.Cast)

(Read/WriteRawBytes might have been able to do raw byte swapping as well)
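To make the idea concrete, here is a rough sketch of such an array helper for one element type. The name `WriteInt32ArrayLittleEndian` is hypothetical and the code is not taken from either PR. It bulk-copies on little-endian hosts and falls back to a per-element loop on big-endian hosts.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Hypothetical helper: emits an int array as little-endian bytes.
static void WriteInt32ArrayLittleEndian(ReadOnlySpan<int> source, Span<byte> destination)
{
    if (BitConverter.IsLittleEndian)
    {
        // Host order already matches the wire order: copy the raw bytes in bulk.
        MemoryMarshal.AsBytes(source).CopyTo(destination);
    }
    else
    {
        // Big-endian host: convert element by element.
        for (int i = 0; i < source.Length; i++)
        {
            BinaryPrimitives.WriteInt32LittleEndian(
                destination.Slice(i * sizeof(int)), source[i]);
        }
    }
}

int[] values = { 1, 0x01020304 };
byte[] wire = new byte[values.Length * sizeof(int)];
WriteInt32ArrayLittleEndian(values, wire);
Console.WriteLine(Convert.ToHexString(wire));
```

The per-element branch is the part that matters for correctness. The bulk copy is only an optimization that happens to be valid when host and wire order agree.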

Thanks for the thorough review! You're right that the array cases were not correct. I thought we could just fall back to the base class implementation like in the reader, but in the writer the base class actually doesn't create any array records, just a series of "regular" records. This is probably not wrong, but it clearly would be preferable to create the same output on big-endian as on little-endian.

Normally, I agree we should just merge your rewrite first, and then fix the endian problems on top. However, I'm not sure the rewrite will still be accepted into net7 at this stage, and I'd really like to see the endian bugs fixed in net7, so I'd prefer to have the endian fix go first, and the performance rewrite on top of that.

The updated implementation I just pushed replaces the generic UnsafeWriteArray helper with per-type array helpers along the lines of the existing WriteDateTimeArray etc. These new helpers are still unsafe (for now) since I left the little-endian implementation unchanged, but with your patch on top it should be straightforward to make them all safe.

As to the ReverseEndianness suggestion, I feel it should be cleaner to eliminate explicit byte-swap operations in favor of type-specific fixed-byte-order operations from BinaryPrimitives - see also the comment by @jkotas above.

Daniel-Svensson · 2022-08-26T10:41:03Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBinaryWriter.cs

}
+

I think it would be a good idea to ensure there are tests that the produced binary format is correct.
I don't know if, or when, #71478 will be merged, so the tests added there cannot be relied upon to find any potential issues.

Agreed. I've added new tests from your PR #71478 and extended them by making sure all data types are covered.

Daniel-Svensson · 2022-08-26T10:42:20Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBufferReader.cs

@@ -964,10 +964,10 @@ public ulong GetUInt64(int offset)
            => (ulong)GetInt64(offset);

        public float GetSingle(int offset)
-            => ReadRawBytes<float>(offset);
+            => BinaryPrimitives.ReadSingleLittleEndian(_buffer.AsSpan(offset, 4));

        public double GetDouble(int offset)


There are also GetSingle/GetDouble methods without the "int" parameter

Thanks, I had missed those. Now added.

uweigand · 2022-09-07T20:49:20Z

Ping? Any comments on the new approach?

mconnew · 2022-09-09T00:10:41Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBinaryWriter.cs

+
+                Span<byte> span = buffer.AsSpan(offset, sizeof(decimal));
+                BinaryPrimitives.WriteInt32LittleEndian(span, bits[3]);
+                BinaryPrimitives.WriteInt32LittleEndian(span.Slice(4), bits[2]);


I'm not sure if this is correct. A decimal value is a 96-bit integer stored in little-endian order, with a final 32 bits holding some flags (a power-of-10 scale exponent and a sign bit). The order to be written to the span should be:
[low 32 bits] [mid 32 bits] [high 32 bits] [flags]
This looks like it writes out:
[flags] [high 32 bits] [low 32 bits] [mid 32 bits]
Each of the 32-bit numbers should be written in little-endian order, so their byte orders need to be switched as you've done. However, the TryGetBits method doesn't write the components out to the passed-in span in an endian-specific way, so the order of the ints in the span should remain the same.
Or am I missing something here?

decimal is a struct with two Int32 fields and one Int64 field. The declared order is flags, high32, low64.

So when the little-endian implementation iterates over 4 integer pointers here, it writes out flags, high32, bottom-of-low64, top-of-low64 if I'm not mistaken, because the "low" is stored as Int64 and that's the order it would come out on a little-endian machine? Personally, I think it would make more sense to follow the order of TryGetBits if this was a greenfield implementation... but the goal here is to match on the wire what gets written by the little-endian implementation.

Or that's how I read it. Maybe I'm missing something? This is why I asked for extra eyes. ;)

Yes, I agree with @StephenMolloy - we need to match the existing wire format, which matches the in-memory format of the decimal type on a little-endian machine, which is "[flags] [high 32] [low 32] [mid 32]".
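The layout agreed on above can be sketched as follows. The helper name `WriteDecimalWireFormat` is hypothetical, and this uses `decimal.GetBits` (which returns low, mid, high, flags) rather than the `TryGetBits` call in the diff. It is an illustration of the ordering, not the merged code.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: serializes a decimal as [flags][high 32][low 32][mid 32],
// each 32-bit part in little-endian byte order.
static void WriteDecimalWireFormat(Span<byte> destination, decimal value)
{
    // decimal.GetBits returns { low, mid, high, flags }.
    int[] bits = decimal.GetBits(value);
    BinaryPrimitives.WriteInt32LittleEndian(destination, bits[3]);           // flags
    BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(4), bits[2]);  // high 32
    BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(8), bits[0]);  // low 32
    BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(12), bits[1]); // mid 32
}

byte[] wire = new byte[sizeof(decimal)];
WriteDecimalWireFormat(wire, 1.5m);   // 1.5m is stored as the integer 15 with scale 1

Console.WriteLine(Convert.ToHexString(wire));
```

For `1.5m` the flags word is `0x00010000` (scale 1, positive) and the low word is 15, so the scale byte lands in the third byte of the output and `0F` in the ninth.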

mconnew · 2022-09-09T00:53:10Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBinaryWriter.cs

+                fixed (int* items = &array[offset])
+                {
+                    base.UnsafeWriteBytes((byte*)items, 4 * count);
+                }


We might want to consider taking a similar approach for little endian as we do for big endian, since UnsafeWriteBytes forces a buffer flush, whereas the big-endian implementation only causes a buffer flush when the buffer doesn't have space.
We could also improve the performance of the big-endian implementation by having a faster path when the size of the array is smaller than the buffer length. In that case you can call GetBuffer once, have the loop wrap the write call, and then call Advance once at the end. We could also do this for bigger arrays and just do it in batches which are smaller than the buffer size. The buffer length is hard-coded to 512 bytes, so for 4-byte elements these could be done in batches of 128.
None of that is required for this PR, just noting improvements that I can see which are possible.
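A rough sketch of the batching idea, for illustration only. It does not use the writer's GetBuffer/Advance methods. A stack-allocated 512-byte scratch buffer and a plain `Stream` stand in for them, and `WriteInt32ArrayBatched` is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical batching sketch: converts ints into a fixed 512-byte scratch
// buffer and hands each filled batch to the stream in a single call.
static void WriteInt32ArrayBatched(Stream output, ReadOnlySpan<int> source)
{
    const int BufferLength = 512;                       // buffer size cited in the comment
    const int BatchCount = BufferLength / sizeof(int);  // 128 ints per batch
    Span<byte> scratch = stackalloc byte[BufferLength];

    while (!source.IsEmpty)
    {
        int n = Math.Min(BatchCount, source.Length);
        for (int i = 0; i < n; i++)
        {
            BinaryPrimitives.WriteInt32LittleEndian(
                scratch.Slice(i * sizeof(int)), source[i]);
        }
        output.Write(scratch.Slice(0, n * sizeof(int))); // one write per batch
        source = source.Slice(n);
    }
}

int[] values = new int[300];
for (int i = 0; i < values.Length; i++) values[i] = i;

using var stream = new MemoryStream();
WriteInt32ArrayBatched(stream, values);
Console.WriteLine(stream.Length);
```

With 300 elements this performs three writes (128, 128, and 44 ints) instead of 300 individual ones.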

I had considered that. But for a very late-stage PR in 7.0 that we can hopefully bring back to 6.0... I thought it's probably better to stay faithful to the existing little-endian implementation. As you say, it's not required for this PR.

Agreed. We can do performance improvements for big-endian systems for .NET 8. I'd be happy to work on this.

mconnew · 2022-09-09T00:58:25Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBinaryWriter.cs

+
+                    Span<byte> span = GetBuffer(16, out int bufferOffset).AsSpan(bufferOffset, 16);
+                    BinaryPrimitives.WriteInt32LittleEndian(span, bits[3]);
+                    BinaryPrimitives.WriteInt32LittleEndian(span.Slice(4), bits[2]);


Same ordering issue as WriteDecimalText

mconnew · 2022-09-09T01:11:38Z

src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlBufferReader.cs

@@ -964,10 +964,10 @@ public ulong GetUInt64(int offset)
            => (ulong)GetInt64(offset);

        public float GetSingle(int offset)
-            => ReadRawBytes<float>(offset);
+            => BinaryPrimitives.ReadSingleLittleEndian(_buffer.AsSpan(offset, 4));



I think we really should be using sizeof everywhere that we pass the number of bytes we want. For example, here it should be sizeof(float). Hard-coded sizes seem to have crept into this code base all over, so I'm not requiring it to be fixed here, but it would be nice if we fixed this everywhere in these classes.

mconnew · 2022-09-09T01:13:55Z

src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs

+            DateTime datetime = new DateTime(2022, 8, 26, 12, 34, 56, DateTimeKind.Utc);
+            Span<byte> datetimeBytes = stackalloc byte[8];
+            BinaryPrimitives.WriteInt64LittleEndian(datetimeBytes, datetime.ToBinary());
+            AssertReadContentFromBinary(datetime, XmlBinaryNodeType.DateTimeText, datetimeBytes);


To properly test this, you really should test using an inline byte array like the other tests. All you are testing here is that WriteInt64LittleEndian and ReadInt64LittleEndian round trip data.
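A sketch of the kind of test this comment asks for, with the expected wire bytes written out by hand. `ReadDateTimeLittleEndian` is a hypothetical stand-in for the reader under test, and the tick value is chosen only because its byte pattern is easy to verify by eye.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical stand-in for the reader under test: decodes a DateTime payload.
static DateTime ReadDateTimeLittleEndian(ReadOnlySpan<byte> payload)
    => DateTime.FromBinary(BinaryPrimitives.ReadInt64LittleEndian(payload));

// Expected value: ticks 0x0102030405060708 with Kind = Utc. Its ToBinary()
// form sets bit 62 for Utc, giving 0x4102030405060708.
DateTime expected = new DateTime(0x0102030405060708, DateTimeKind.Utc);

// Wire bytes written out by hand, least significant byte first.
byte[] wireBytes = { 0x08, 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x41 };

DateTime actual = ReadDateTimeLittleEndian(wireBytes);
Console.WriteLine(actual == expected && actual.Kind == DateTimeKind.Utc);
```

Because the input bytes do not pass through any write helper, a reader that silently used host byte order would fail this check on a big-endian machine.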

uweigand · 2022-09-09T10:36:00Z

I've pushed @StephenMolloy's PR feedback commit from https://github.com/uweigand/runtime/pull/1 to this PR branch to trigger the test suite. Those changes all look good to me. I've successfully re-run the tests with those changes locally on s390x as well.

akoeplinger · 2022-09-09T12:30:20Z

@uweigand FYI I've disabled the tests in #75282 to get the runtime-community pipeline green again, please merge main/rebase this PR to get that change and remove the ActiveIssue to reenable the tests.

Properly byte-swap float/double/decimal types. Handle array types correctly on big-endian systems. Added more test cases based on PR 71478. Fixes #74494

uweigand · 2022-09-09T12:43:21Z

> @uweigand FYI I've disabled the tests in #75282 to get the runtime-community pipeline green again, please merge main/rebase this PR to get that change and remove the ActiveIssue to reenable the tests.

Done, thanks!

StephenMolloy · 2022-09-09T18:41:13Z

/backport to release/7.0

github-actions · 2022-09-09T18:41:29Z

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3024604883

github-actions · 2022-09-09T18:43:38Z

@StephenMolloy backporting to release/7.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Fix XmlBinaryReader/XmlBinaryWriter on big-endian systems
Applying: PR feedback
Applying: Re-enable binary XML tests on s390x
error: sha1 information is lacking or useless (src/libraries/System.Runtime.Serialization.Xml/tests/XmlDictionaryReaderTests.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0003 Re-enable binary XML tests on s390x
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

uweigand · 2022-09-09T19:07:41Z

Hi @akoeplinger it looks like the 7.0 backport is failing because your #75282 isn't in 7.0. Were you planning on backporting this as well? I think it would be great to have the CI green in 7.0 as well.

akoeplinger · 2022-09-09T19:20:56Z

hmm I wasn't planning on backporting it, but I guess it can't hurt

StephenMolloy · 2022-09-09T19:23:34Z

@akoeplinger - I'm manually backporting this PR, so I don't think your change needs to be backported.

akoeplinger · 2022-09-09T19:25:33Z

the armv6 and ppc changes are still good to have :)

* Manually backporting #74599 to 7.0 for RC2. * Fix a couple mis-copied lines of code and a couple nits.

dotnet-issue-labeler bot added the area-Serialization label Aug 25, 2022

ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 25, 2022

jkotas reviewed Aug 25, 2022

Daniel-Svensson reviewed Aug 26, 2022

mconnew reviewed Sep 9, 2022

uweigand and others added 3 commits September 9, 2022 14:40

Fix XmlBinaryReader/XmlBinaryWriter on big-endian systems

8cff819

Properly byte-swap float/double/decimal types. Handle array types correctly on big-endian systems. Added more test cases based on PR 71478. Fixes #74494

PR feedback

cdf4996

Re-enable binary XML tests on s390x

48c4442

StephenMolloy approved these changes Sep 9, 2022

StephenMolloy merged commit c6c7d3c into dotnet:main Sep 9, 2022

StephenMolloy added a commit to StephenMolloy/runtime that referenced this pull request Sep 9, 2022

Manually backporting dotnet#74599 to 7.0 for RC2.

291c470

This was referenced Sep 9, 2022

Backport 74599 to 7.0 #75365

Closed

Manually backporting #74599 to 7.0 for RC2. #75366

Merged

carlossanlop pushed a commit that referenced this pull request Sep 12, 2022

Manually backporting #74599 to 7.0 for RC2. (#75366)

3fbdf4c

* Manually backporting #74599 to 7.0 for RC2. * Fix a couple mis-copied lines of code and a couple nits.

StephenMolloy added a commit to StephenMolloy/runtime that referenced this pull request Sep 14, 2022

Manually backporting dotnet#74599 to 7.0 for RC2. (dotnet#75366)

1700b54

* Manually backporting dotnet#74599 to 7.0 for RC2. * Fix a couple mis-copied lines of code and a couple nits.

StephenMolloy mentioned this pull request Sep 14, 2022

[release/6.0] Manually backporting #74599 #75648

Closed

ghost locked as resolved and limited conversation to collaborators Oct 10, 2022
