Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a new NumberStyles option to stop parsing after the first invalid character #87171

Open
tannergooding opened this issue Jun 6, 2023 · 3 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics
Milestone

Comments

@tannergooding
Copy link
Member

Summary

Up through .NET 7 we have the provided UTF-16 based parsing functionality in the form of various Parse and TryParse APIs exposed as static methods on the respective types. In .NET 7 this support was expanded via the new IParsable and ISpanParsable interface and some additional numeric specific APIs on INumberBase that take a NumberStyles parameter.

We have also provided Utf8Parser which gives access to UTF-8 based parsing functionality for a limited subset of scenarios and which has a core differing behavior in that rather than treating the input as invalid after the first invalid character, it is instead treated as "end of input".

In .NET 8, we have again expanded the core support to provide IUtf8SpanParsable and some additional methods on INumberBase which provides parity to the UTF-16 scenarios. However, there still remains a divergence in that UTF8Parser still supports treating the first invalid character specially and there is no equivalent functionality for UTF-16 or for UTF-8 when using the new interfaces/APIs.

It is proposed that we then provide a new NumberStyles option that allows access to this functionality for UTF-16 or UTF-8 scenarios via the new interfaces/types. This would make Utf8Parser and Utf8Formatter both "functionally obsolete" even if we do not actually mark them as such.

API Proposal

namespace System.Globalization
{
    [Flags]
    public partial enum NumberStyles
    {
        AllowTrailingInvalidCharacters = 0x800
    }
}

namespace System.Numerics
{
    public static virtual bool TryParse([NotNullWhen(true)] string? s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result, out int charsConsumed);

    public static virtual bool TryParse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result, out int charsConsumed);
    
    public static virtual bool TryParse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result, out int bytesConsumed);
}

We would then remove api-approved from proposals such as #73842 given that it provides no new functionality. -- Even if this proposal is rejected, there needs to be a determination if APIs are exposed on Utf8Formatter given that it is effectively just a mirror for Utf8Parser and doesn't actually provide access to any functionality that can not be done using IUtf8SpanFormattable

Given that the current requirement is that unsupported/unrecognized NumberStyles throw an ArgumentException, we can reasonably provide a default implementation since the only valid bytesConsumed/charsConsumed will be "everything" or "nothing". The implementor would be expected to override the DIM if and when they add support for AllowTrailingInvalidCharacters

Alternative Names

There are several potential other names that we could choose from, ranging in verbosity and total clarity on what's being allowed:

  • AllowTrailingCharacters
  • AllowTrailingAny
  • AllowInvalidCharacterForEndOfString
  • StopOnFirstInvalidCharacter
  • etc
@tannergooding tannergooding added area-System.Numerics api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 6, 2023
@tannergooding tannergooding added this to the 8.0.0 milestone Jun 6, 2023
@ghost
Copy link

ghost commented Jun 6, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Summary

Up through .NET 7 we have the provided UTF-16 based parsing functionality in the form of various Parse and TryParse APIs exposed as static methods on the respective types. In .NET 7 this support was expanded via the new IParsable and ISpanParsable interface and some additional numeric specific APIs on INumberBase that take a NumberStyles parameter.

We have also provided Utf8Parser which gives access to UTF-8 based parsing functionality for a limited subset of scenarios and which has a core differing behavior in that rather than treating the input as invalid after the first invalid character, it is instead treated as "end of input".

In .NET 8, we have again expanded the core support to provide IUtf8SpanParsable and some additional methods on INumberBase which provides parity to the UTF-16 scenarios. However, there still remains a divergence in that UTF8Parser still supports treating the first invalid character specially and there is no equivalent functionality for UTF-16 or for UTF-8 when using the new interfaces/APIs.

It is proposed that we then provide a new NumberStyles option that allows access to this functionality for UTF-16 or UTF-8 scenarios via the new interfaces/types. This would make Utf8Parser and Utf8Formatter both "functionally obsolete" even if we do not actually mark them as such.

API Proposal

namespace System.Globalization
{
    [Flags]
    public partial enum NumberStyles
    {
        AllowTrailingInvalidCharacters = 0x800
    }
}

namespace System.Numerics
{
    public static virtual bool TryParse([NotNullWhen(true)] string? s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result, out int charsConsumed);

    public static virtual bool TryParse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result, out int charsConsumed);
    
    public static virtual bool TryParse(ReadOnlySpan<byte> s, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result, out int bytesConsumed);
}

We would then remove api-approved from proposals such as #73842 given that it provides no new functionality. -- Even if this proposal is rejected, there needs to be a determination if APIs are exposed on Utf8Formatter given that it is effectively just a mirror for Utf8Parser and doesn't actually provide access to any functionality that can not be done using IUtf8SpanFormattable

Given that the current requirement is that unsupported/unrecognized NumberStyles throw an ArgumentException, we can reasonably provide a default implementation since the only valid bytesConsumed/charsConsumed will be "everything" or "nothing". The implementor would be expected to override the DIM if and when they add support for AllowTrailingInvalidCharacters

Alternative Names

There are several potential other names that we could choose from, ranging in verbosity and total clarity on what's being allowed:

  • AllowTrailingCharacters
  • AllowTrailingAny
  • AllowInvalidCharacterForEndOfString
  • StopOnFirstInvalidCharacter
  • etc
Author: tannergooding
Assignees: -
Labels:

area-System.Numerics, api-ready-for-review

Milestone: 8.0.0

@terrajobst
Copy link
Member

terrajobst commented Jun 20, 2023

Video

  • Looks good as proposed
  • For the default implementation we have three options:
    1. TryParse throws for AllowTrailingInvalidCharacters
    2. TryParse delegates to Parse masking off AllowTrailingInvalidCharacters
    3. TryParse delegates to Parse without masking and assumes Parse will throw when it doesn't understand AllowTrailingInvalidCharacters
  • We decided for option (3)
namespace System.Globalization
{
    [Flags]
    public partial enum NumberStyles
    {
        AllowTrailingInvalidCharacters = 0x800
    }
}

namespace System.Numerics
{
    public interface INumberBase<TSelf>
    {
        public static virtual bool TryParse([NotNullWhen(true)] string? s,
                                            NumberStyles style,
                                            IFormatProvider? provider,
                                            [MaybeNullWhen(false)] out TSelf result,
                                            out int charsConsumed);
        public static virtual bool TryParse(ReadOnlySpan<char> s,
                                            NumberStyles style,
                                            IFormatProvider? provider,
                                            [MaybeNullWhen(false)] out TSelf result,
                                            out int charsConsumed); 
        public static virtual bool TryParse(ReadOnlySpan<byte> s,
                                            NumberStyles style,
                                            IFormatProvider? provider,
                                            [MaybeNullWhen(false)] out TSelf result,
                                            out int bytesConsumed);
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 20, 2023
@tannergooding tannergooding modified the milestones: 8.0.0, Future Jul 24, 2023
@tannergooding
Copy link
Member Author

Not going to have time to land this one for .NET 8, we'll be able to take a PR for it anytime after main opens for .NET 9 changes next month.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics
Projects
None yet
Development

No branches or pull requests

2 participants