Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(MQ cleanup) Remove some unsafe code from System.Xml #43379

Merged
merged 10 commits into from
Nov 3, 2020

Conversation

GrabYourPitchforks
Copy link
Member

This is an MQ cleanup work item to remove some of the unsafe code from System.Xml in favor of safer alternatives when such alternatives exist.

There's still quite a bit of unsafe code that can be removed, but this should represent a good first effort.

@ghost
Copy link

ghost commented Oct 14, 2020

Tagging subscribers to this area: @buyaa-n, @krwq, @jeffhandley
See info in area-owners.md if you want to be subscribed.

{
if ((uint)iByte >= (uint)bytes.Length)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is written this way to avoid bounds checks later in the method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GrabYourPitchforks, (maybe you already know it but) since #40180 was merged, the uint cast workaround is not needed in some cases where constant operands are involved. It turned out that in some cases, RyuJIT does not elide the bound check with uint casts, where it was doing before. I posted some findings on macOS here: #11623 (comment). It is unlikely that this construct with (non const) operands is affected by that change, but perhaps would be good to double check with the latest master. I have a feeling that in some places in the framework, we can remove the uint cast, as they are deoptomized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@am11 This is a good observation! As these other optimizations come online I think it'll be useful to perform a libraries-wide sweep of all of these patterns. It's always great to make the code more readable while maintaining peak efficiency. :)

return new string(p, 0, cch);
}
// bitblt source bytes directly into the destination char span
// n.b. source buffer assumed to be well-formed UTF-16 machine endian
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of an expansion on this comment: the original code had two different behaviors based on whether the underlying data was aligned or unaligned. If the underlying data was aligned, the original code would basically memmove the contents of the old buffer into the new string, not performing any UTF-16 validation. If the underlying data was unaligned, the original code would go through Encoding.Unicode.GetString, which would perform UTF-16 validation. This really only affects whether unpaired surrogate characters get replaced with '\uFFFD' in the destination string.

Since the updated code is just a simple memmove, I opted for the "don't perform any validation" behavior.


int cch = dstChars.Length;
ReadOnlySpan<byte> srcBytes = state._data.AsSpan(state.pos, checked(cch * sizeof(char)));
Span<byte> dstBytes = MemoryMarshal.AsBytes(dstChars);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically MemoryMarshal is an unsafe-equivalent API. But at least we got rid of the pointer manipulation in this method. :)

@@ -2890,7 +2859,7 @@ public static string DoubleToString(double dbl)

if (IsInteger(dbl, out iVal))
{
return IntToString(iVal);
return iVal.ToString(CultureInfo.InvariantCulture);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed custom implementation in favor of delegating to the runtime's already-optimized int.ToString implementation.

On .NET 5/6, dereferencing CultureInfo.InvariantCulture is very fast due to recent JIT optimizations.

{
goto Return;
}
}
}

if (pChar < pCharsEndPos && *pChar == '=')
if ((uint)iChar < (uint)chars.Length && chars[iChar] == '=')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to cast it everywhere? Should you just make it unsigned from the start?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This follows the "bounds check elision" pattern used elsewhere in the runtime code base, where the indexer is cast to uint only for the purposes of comparison against Length. All other uses of the indexer remain as normal signed integers.

I'm not a huge fan of this pattern, since I wish we'd just use nuint everywhere and call it a day. :)

Copy link
Member

@krwq krwq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some questions

@GrabYourPitchforks
Copy link
Member Author

Latest iteration is just a merge. I'm blocked running unit tests locally or testing other changes until #43137 is resolved.

@GrabYourPitchforks GrabYourPitchforks merged commit 9971745 into dotnet:master Nov 3, 2020
@GrabYourPitchforks GrabYourPitchforks deleted the mq_cleanup branch November 3, 2020 00:24
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants