Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
  • Loading branch information
jtschuster committed Jun 28, 2023
2 parents a16a7d6 + f196a40 commit cca57d3
Show file tree
Hide file tree
Showing 1,259 changed files with 51,676 additions and 20,213 deletions.
2 changes: 1 addition & 1 deletion .config/dotnet-tools.json
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
]
},
"microsoft.dotnet.xharness.cli": {
"version": "8.0.0-prerelease.23312.1",
"version": "8.0.0-prerelease.23326.1",
"commands": [
"xharness"
]
Expand Down
1 change: 0 additions & 1 deletion Directory.Build.props
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,6 @@
<RepoTasksDir>$([MSBuild]::NormalizeDirectory('$(MSBuildThisFileDirectory)', 'src', 'tasks'))</RepoTasksDir>
<IbcOptimizationDataDir>$([MSBuild]::NormalizeDirectory('$(ArtifactsDir)', 'ibc'))</IbcOptimizationDataDir>
<MibcOptimizationDataDir>$([MSBuild]::NormalizeDirectory('$(ArtifactsDir)', 'mibc'))</MibcOptimizationDataDir>
<XmlDocDir>$([MSBuild]::NormalizeDirectory('$(ArtifactsBinDir)', 'docs'))</XmlDocDir>
<DocsDir>$([MSBuild]::NormalizeDirectory('$(MSBuildThisFileDirectory)', 'docs'))</DocsDir>

<AppleAppBuilderDir>$([MSBuild]::NormalizeDirectory('$(ArtifactsBinDir)', 'AppleAppBuilder', 'Debug', '$(NetCoreAppToolCurrent)'))</AppleAppBuilderDir>
Expand Down
31 changes: 11 additions & 20 deletions Directory.Build.targets
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
<Import Project="$(RepositoryEngineeringDir)liveBuilds.targets" />
<Import Project="$(RepositoryEngineeringDir)generators.targets" />
<Import Project="$(RepositoryEngineeringDir)python.targets" />
<Import Project="$(RepositoryEngineeringDir)generatorProjects.targets" Condition="'$(IsGeneratorProject)' == 'true'" />
<Import Project="$(RepositoryEngineeringDir)resolveContract.targets" Condition="'$(IsSourceProject)' == 'true'" />
<Import Project="$(RepositoryEngineeringDir)packaging.targets" Condition="'$(IsPackable)' == 'true' and '$(MSBuildProjectExtension)' != '.pkgproj'" />

Expand Down Expand Up @@ -115,26 +116,6 @@
</PropertyGroup>
</Target>

<Target Name="GetAnalyzerPackFiles"
DependsOnTargets="$(GenerateNuspecDependsOn)"
Returns="@(_AnalyzerPackFile)">
<PropertyGroup>
<_analyzerPath>analyzers/dotnet</_analyzerPath>
<_analyzerPath Condition="'$(AnalyzerRoslynVersion)' != ''">$(_analyzerPath)/roslyn$(AnalyzerRoslynVersion)</_analyzerPath>
<_analyzerPath Condition="'$(AnalyzerLanguage)' != ''">$(_analyzerPath)/$(AnalyzerLanguage)</_analyzerPath>
</PropertyGroup>

<!-- Filter on netstandard2.0 so that generator projects can multi-target for the purpose of enabling nullable reference type compiler checks. -->
<ItemGroup>
<_AnalyzerPackFile Include="@(_BuildOutputInPackage->WithMetadataValue('TargetFramework', 'netstandard2.0'))" IsSymbol="false" />
<_AnalyzerPackFile Include="@(_TargetPathsToSymbols->WithMetadataValue('TargetFramework', 'netstandard2.0'))" IsSymbol="true" />
<_AnalyzerPackFile PackagePath="$(_analyzerPath)/%(TargetPath)" />
</ItemGroup>

<Error Text="Analyzers must target netstandard2.0 since they run in the compiler which targets netstandard2.0. $(MSBuildProjectFullPath) targets '$([MSBuild]::ValueOrDefault('$(TargetFrameworks)', '$(TargetFramework)'))' instead."
Condition="'@(_AnalyzerPackFile)' == ''" />
</Target>

<!-- Allows building against source assemblies when the 'SkipUseReferenceAssembly' attribute is present on ProjectReference items. -->
<Target Name="HandleReferenceAssemblyAttributeForProjectReferences"
AfterTargets="ResolveProjectReferences"
Expand Down Expand Up @@ -197,6 +178,16 @@
Exclude="@(_targetingPackIncludedReferenceWithProjectName)" />
<Reference Remove="@(_targetingPackExcludedReferenceWithProjectName->Metadata('OriginalIdentity'))" />
</ItemGroup>

<ItemGroup>
<_targetingPackAnalyzerReferenceWithProjectName Include="@(Analyzer->WithMetadataValue('ExternallyResolved', 'true')->Metadata('Filename'))"
OriginalIdentity="%(Identity)" />
<_targetingPackIncludedAnalyzerReferenceWithProjectName Include="@(_targetingPackAnalyzerReferenceWithProjectName)"
Exclude="@(_targetingPackReferenceExclusion)" />
<_targetingPackExcludedAnalyzerReferenceWithProjectName Include="@(_targetingPackAnalyzerReferenceWithProjectName)"
Exclude="@(_targetingPackIncludedAnalyzerReferenceWithProjectName)" />
<Analyzer Remove="@(_targetingPackExcludedAnalyzerReferenceWithProjectName->Metadata('OriginalIdentity'))" />
</ItemGroup>
</Target>

<!--
Expand Down
2 changes: 1 addition & 1 deletion docs/coding-guidelines/project-guidelines.md
Original file line number Diff line number Diff line change
Expand Up @@ -185,7 +185,7 @@ All test outputs should be under

## gen
In the gen directory any source generator related to the assembly should exist. This does not mean the source generator is only used for that assembly only that it is conceptually apart of that assembly. For example, the assembly may provide attributes or low-level types the source generator uses.
To consume a source generator, simply add a `<ProjectReference Include="..." ReferenceOutputAssembly="false" OutputItemType="Analyzer" />` item to the project, usually next to the `Reference` and `ProjectReference` items.
To consume a source generator that isn't provided via a targeting pack, simply add a `<ProjectReference Include="..." ReferenceOutputAssembly="false" OutputItemType="Analyzer" />` item to the project, usually next to the `Reference` and `ProjectReference` items.

A source generator must target `netstandard2.0` as such assemblies are loaded into the compiler's process which might run on either .NET Framework or modern .NET depending on the tooling being used (CLI vs Visual Studio). While that's true, a source project can still multi-target and include `$(NetCoreAppToolCurrent)` (which is the latest non live-built .NETCoreApp tfm that is supported by the SDK) to benefit from the ehancanced nullable reference type warnings emitted by the compiler. For an example see [System.Text.Json's roslyn4.4 source generator](/src/libraries/System.Text.Json/gen/System.Text.Json.SourceGeneration.Roslyn4.4.csproj). While the repository's infrastructure makes sure that only the source generator's `netstandard2.0` build output is included in packages, to consume such a multi-targeting source generator via a `ProjectReference` (as described above), you need to add the `SetTargetFramework="TargetFramework=netstandard2.0"` metadata to the ProjectReference item to guarantee that the netstandard2.0 asset is chosen.

Expand Down
120 changes: 98 additions & 22 deletions docs/design/features/globalization-hybrid-mode.md
Original file line number Diff line number Diff line change
Expand Up @@ -277,7 +277,7 @@ new CultureInfo("de-DE").CompareInfo.IndexOf("strasse", "stra\u00DFe", 0, Compar

For OSX platforms we are using native apis instead of ICU data.

**String comparison**
## String comparison

Affected public APIs:
- CompareInfo.Compare,
Expand All @@ -292,44 +292,120 @@ The number of `CompareOptions` and `NSStringCompareOptions` combinations are lim

- `None`:

`CompareOptions.None` is mapped to `NSStringCompareOptions.NSLiteralSearch`
`CompareOptions.None` is mapped to `NSStringCompareOptions.NSLiteralSearch`

There are some behaviour changes. Below are examples of such cases.
There are some behaviour changes. Below are examples of such cases.

| **character 1** | **character 2** | **CompareOptions** | **hybrid globalization** | **icu** | **comments** |
|:---------------:|:---------------:|--------------------|:------------------------:|:-------:|:-------------------------------------------------------:|
| `\u3042`| `\u30A1`| None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u304D\u3083` きゃ | `\u30AD\u30E3` キャ | None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u304D\u3083` きゃ | `\u30AD\u3083` キゃ | None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u3070\u3073\uFF8C\uFF9E\uFF8D\uFF9E\u307C` ばびブベぼ | `\u30D0\u30D3\u3076\u30D9\uFF8E\uFF9E` バビぶベボ | None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u3060`| `\u30C0`| None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u00C0` À | `A\u0300`| None | 1 | 0 | This is not same character for native api |
| **character 1** | **character 2** | **CompareOptions** | **hybrid globalization** | **icu** | **comments** |
|:---------------:|:---------------:|--------------------|:------------------------:|:-------:|:-------------------------------------------------------:|
| `\u3042`| `\u30A1`| None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u304D\u3083` きゃ | `\u30AD\u30E3` キャ | None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u304D\u3083` きゃ | `\u30AD\u3083` キゃ | None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u3070\u3073\uFF8C\uFF9E\uFF8D\uFF9E\u307C` ばびブベぼ | `\u30D0\u30D3\u3076\u30D9\uFF8E\uFF9E` バビぶベボ | None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u3060`| `\u30C0`| None | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |

- `StringSort` :

`CompareOptions.StringSort` is mapped to `NSStringCompareOptions.NSLiteralSearch` .ICU's default is to use "StringSort", i.e. nonalphanumeric symbols come before alphanumeric. That is how works also `NSLiteralSearch`.
`CompareOptions.StringSort` is mapped to `NSStringCompareOptions.NSLiteralSearch` .ICU's default is to use "StringSort", i.e. nonalphanumeric symbols come before alphanumeric. That is how works also `NSLiteralSearch`.

- `IgnoreCase`:

`CompareOptions.IgnoreCase` is mapped to `NSStringCompareOptions.NSCaseInsensitiveSearch | NSStringCompareOptions.NSLiteralSearch`
`CompareOptions.IgnoreCase` is mapped to `NSStringCompareOptions.NSCaseInsensitiveSearch | NSStringCompareOptions.NSLiteralSearch`

There are some behaviour changes. Below are examples of such cases.
There are some behaviour changes. Below are examples of such cases.

| **character 1** | **character 2** | **CompareOptions** | **hybrid globalization** | **icu** | **comments** |
|:---------------:|:---------------:|--------------------|:------------------------:|:-------:|:-------------------------------------------------------:|
| `\u3060`| `\u30C0`| IgnoreCase | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |
| `\u00C0` À | `a\u0300`| IgnoreCase | 1 | 0 | This is related to above mentioned case under `CompareOptions.None` i.e. `\u00C0` À != À `A\u0300` |
| **character 1** | **character 2** | **CompareOptions** | **hybrid globalization** | **icu** | **comments** |
|:---------------:|:---------------:|--------------------|:------------------------:|:-------:|:-------------------------------------------------------:|
| `\u3060`| `\u30C0`| IgnoreCase | 1 | -1 | hiragana and katakana characters are ordered differently compared to ICU |

- `IgnoreNonSpace`:

`CompareOptions.IgnoreNonSpace` is mapped to `NSStringCompareOptions.NSDiacriticInsensitiveSearch | NSStringCompareOptions.NSLiteralSearch`
`CompareOptions.IgnoreNonSpace` is mapped to `NSStringCompareOptions.NSDiacriticInsensitiveSearch | NSStringCompareOptions.NSLiteralSearch`

- `IgnoreWidth`:

`CompareOptions.IgnoreWidth` is mapped to `NSStringCompareOptions.NSWidthInsensitiveSearch | NSStringCompareOptions.NSLiteralSearch`
`CompareOptions.IgnoreWidth` is mapped to `NSStringCompareOptions.NSWidthInsensitiveSearch | NSStringCompareOptions.NSLiteralSearch`

- All combinations that contain below `CompareOptions` always throw `PlatformNotSupportedException`:

`IgnoreSymbols`,
`IgnoreSymbols`,

`IgnoreKanaType`,

## String starts with / ends with

Affected public APIs:
- CompareInfo.IsPrefix
- CompareInfo.IsSuffix
- String.StartsWith
- String.EndsWith

Mapped to Apple Native API `compare:options:range:locale:`(https://developer.apple.com/documentation/foundation/nsstring/1414561-compare?language=objc)
Apple Native API does not expose locale-sensitive endsWith/startsWith function. As a workaround, both strings get normalized and weightless characters are removed. Resulting strings are cut to the same length and comparison is performed. As we are normalizing strings to be able to cut them, we cannot calculate the match length on the original strings. Methods that calculate this information throw PlatformNotSupported exception:

- [CompareInfo.IsPrefix](https://learn.microsoft.com/dotnet/api/system.globalization.compareinfo.isprefix)
- [CompareInfo.IsSuffix](https://learn.microsoft.com/dotnet/api/system.globalization.compareinfo.issuffix)

- `IgnoreSymbols`

As there is no IgnoreSymbols equivalent in NSStringCompareOptions all `CompareOptions` combinations that include `IgnoreSymbols` throw `PlatformNotSupportedException`

## String indexing

Affected public APIs:
- CompareInfo.IndexOf
- CompareInfo.LastIndexOf
- String.IndexOf
- String.LastIndexOf

Mapped to Apple Native API `rangeOfString:options:range:locale:`(https://developer.apple.com/documentation/foundation/nsstring/1417348-rangeofstring?language=objc)

In `rangeOfString:options:range:locale:` objects are compared by checking the Unicode canonical equivalence of their code point sequences.
In cases where search string contains diacritics and has different normalization form than in source string result can be incorrect.

Characters in general are represented by unicode code points, and some characters can be represented in a single code point or by combining multiple characters (like diacritics/diaeresis). Normalization Form C will look to compress characters to their single code point format if they were originally represented as a sequence of multiple code points. Normalization Form D does the opposite and expands characters into their multiple code point formats if possible.

`NSString` `rangeOfString:options:range:locale:` uses canonical equivalence to find the position of the `searchString` within the `sourceString`, however, it does not automatically handle comparison of precomposed (single code point representation) or decomposed (most code points representation). Because the `searchString` and `sourceString` can be of differing formats, to properly find the index, we need to ensure that the searchString is in the same form as the sourceString by checking the `rangeOfString:options:range:locale:` using every single normalization form.

Here are the covered cases with diacritics:
1. Search string contains diacritic and has same normalization form as in source string.
2. Search string contains diacritic but with source string they have same letters with different char lengths but substring is normalized in source.

a. search string `normalizing to form C` is substring of source string. example: search string: `U\u0308` source string: `Source is \u00DC` => matchLength is 1

b. search string `normalizing to form D` is substring of source string. example: search string: `\u00FC` source string: `Source is \u0075\u0308` => matchLength is 2

Not covered case:
Source string's intended substring match containing characters of mixed composition forms cannot be matched by 2. because partial precomposition/decomposition is not performed. example: search string: `U\u0308 and \u00FC` (Ü and ü) source string: `Source is \u00DC and \u0075\u0308` (Source is Ü and ü)
as it is visible from example normalizaing search string to form C or D will not help to find substring in source string.

- `IgnoreSymbols`

As there is no IgnoreSymbols equivalent in NSStringCompareOptions all `CompareOptions` combinations that include `IgnoreSymbols` throw `PlatformNotSupportedException`

- Some letters consist of more than one grapheme.

Apple Native Api does not guarantee that string will be segmented by letters but by graphemes. E.g. in `cs-CZ` and `sk-SK` "ch" is 1 letter, 2 graphemes. The following code with `HybridGlobalization` switched off returns -1 (not found) while with `HybridGlobalization` switched on, it returns 1.

``` C#
new CultureInfo("sk-SK").CompareInfo.IndexOf("ch", "h"); // -1 or 1
```

- Some graphemes have multi-grapheme equivalents.
E.g. in `de-DE` ß (%u00DF) is one letter and one grapheme and "ss" is one letter and is recognized as two graphemes. Apple Native API's equivalent of `IgnoreNonSpace` treats them as the same letter when comparing. Similar case: dz (%u01F3) and dz.

Using `IgnoreNonSpace` for these two with `HybridGlobalization` off, also returns 0 (they are equal). However, the workaround used in `HybridGlobalization` will compare them grapheme-by-grapheme and will return -1.

``` C#
new CultureInfo("de-DE").CompareInfo.IndexOf("strasse", "stra\u00DFe", 0, CompareOptions.IgnoreNonSpace); // 0 or -1
```


## SortKey

Affected public APIs:
- CompareInfo.GetSortKey
- CompareInfo.GetSortKeyLength
- CompareInfo.GetHashCode

`IgnoreKanaType`,
Apple Native API does not have an equivalent, so they throw `PlatformNotSupportedException`.
Loading

0 comments on commit cca57d3

Please sign in to comment.