Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert math intrinsics to named intrinsics #39730

Merged
merged 2 commits into from
Jul 24, 2020
Merged

Convert math intrinsics to named intrinsics #39730

merged 2 commits into from
Jul 24, 2020

Conversation

am11
Copy link
Member

@am11 am11 commented Jul 21, 2020

Contributes to #13820.

@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@jkotas jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 21, 2020
@jkotas
Copy link
Member

jkotas commented Jul 21, 2020

cc @dotnet/jit-contrib

@am11
Copy link
Member Author

am11 commented Jul 22, 2020

x86 timeout failure is #38847.

@am11 am11 marked this pull request as ready for review July 22, 2020 09:07
@jkotas
Copy link
Member

jkotas commented Jul 22, 2020

@dotnet/jit-contrib Any comments?

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking so may have more feedback later.

Have you run diffs?

src/coreclr/src/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/gentree.cpp Outdated Show resolved Hide resolved
Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the switch issue, this looks good. I had a couple other small comments.

Would be good to verify this is no diff for both crossgen and PMI across the Fx set, at least for one architecture (crossgen2 perhaps, also?).

Validating this is a bit more complex with a guid change: you will need to run base and diff individually on the base and diff builds, and compare the results afterwards.

src/coreclr/src/jit/gentree.h Show resolved Hide resolved
src/coreclr/src/jit/gentree.h Show resolved Hide resolved
@am11
Copy link
Member Author

am11 commented Jul 22, 2020

Does jit-diff work on macOS? I tried:

$ pushd runtime_base; ./build.sh -c Release ; popd
$ pushd runtime_pr; ./build.sh -c Release ;  ./src/coreclr/build-test.sh -generatelayoutonly -release; popd

$ mkdir /tmp/diffs
$ jitutils/bin/jit-diff diff                                                       \
    --base runtime_base/artifacts/bin/coreclr/OSX.x64.Release                      \
    --diff runtime_pr/artifacts/bin/coreclr/OSX.x64.Release                        \
    --core_root runtime_pr/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root \
    --output /tmp/diffs --pmi

No assemblies specified; defaulting to corelib
Beginning PMI CodeSize Diffs for System.Private.CoreLib.dll
- Finished 0/1 Base 0/1 Diff [0.1 sec]Error running /Users/am11/projects/runtime_pr/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root/corerun on /Users/am11/projects/runtime_pr/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root/System.Private.CoreLib.dll
1 errors compiling set.
Dasm command "jit-dasm-pmi --platform runtime_pr/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root --corerun runtime_pr/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root/corerun --jit runtime_base/artifacts/bin/coreclr/OSX.x64.Release/libclrjit.dylib --nocopy --output /tmp/diffs/dasmset_7/base /Users/am11/projects/runtime_pr/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root/System.Private.CoreLib.dll" returned with 1 failures
\ Finished 1/1 Base 1/1 Diff [12.4 sec]
Dasm commands returned 1 base failures, 0 diff failures.
Completed PMI CodeSize Diffs for System.Private.CoreLib.dll in 12.37s
Diffs (if any) can be viewed by comparing: /tmp/diffs/dasmset_7/base /tmp/diffs/dasmset_7/diff
Analyzing CodeSize diffs...
Found 1 files with textual diffs.
PMI CodeSize Diffs for System.Private.CoreLib.dll for  default jit
Summary of Code Size diffs:
(Lower is better)
Warning: the base metric is 0, the diff metric is 0, have you used a release version?
0 total files with Code Size differences (0 improved, 0 regressed), 1 unchanged.
0 total methods with Code Size differences (0 improved, 0 regressed), 0 unchanged.
1 files had text diffs but no metric diffs.
System.Private.CoreLib.dasm had 5 diffs
Completed analysis in 0.19s

$ cat /tmp/diffs/dasmset_7/base/System.Private.CoreLib.dasm 
Fatal error. Failed to load JIT compiler

Disclaimer: this my first time with jit-diff, so i might be missing something rudimentary. 😁

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Jul 23, 2020

Since your runtime_pr builds have a different jit guid than your runtime_base builds, you have to do this in steps, as the base and diff its are no longer interchangeable.

I'd try something like

$ jitutils/bin/jit-diff diff                                                       \
    --base --base_root runtime_base                      \
    --core_root runtime_base/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root \
    --output /tmp/diffs --pmi -f

$ jitutils/bin/jit-diff diff                                                       \
    --diff --diff_root runtime_pr                     \
    --core_root runtime_pr/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root \
    --output /tmp/diffs --pmi -f

$ jitutils/bin/jit-analyze --base /tmp/diffs/base --diff /tmp/diffs/diff

and then repeat the above, leaving out --pmi and using a different output dir, to see the crossgen diffs.

[edit: fixed a couple typos in the above]

@am11
Copy link
Member Author

am11 commented Jul 23, 2020

Thanks for the steps @AndyAyersMS. :)

$ jitutils/bin/jit-analyze --base /tmp/diffs/base --diff /tmp/diffs/diff

After taking the diffs of base and pr individually, I ended up with /tmp/diffs/dasmset_1/base (from --base --base_root) and /tmp/diffs/dasmset_2/diff (from --diff --diff_root) directories. When I ran:

$ jitutils/bin/jit-analyze --base /tmp/diffs/dasmset_1/base --diff /tmp/diffs/dasmset_2/diff

it failed with these errors:

Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/CommandLine.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Dia2Lib.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/ILCompiler.Reflection.ReadyToRun.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.CSharp.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.CodeAnalysis.CSharp.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.CodeAnalysis.VisualBasic.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.CodeAnalysis.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.Diagnostics.FastSerialization.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.Diagnostics.NETCore.Client.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.Diagnostics.Tools.RuntimeClient.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.Diagnostics.Tracing.TraceEvent.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.Extensions.Caching.Abstractions.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist
Couldn't parse --numstat output '1	1	/tmp/diffs/{dasmset_2/diff => dasmset_1/base}/Microsoft.Extensions.Caching.Memory.dasm` : '/tmp/diffs/{dasmset_2/diff' does not exist

(note that it is adding a mysterious { in path '/tmp/diffs/{dasmset_2/diff' does not exist)

@AndyAyersMS
Copy link
Member

jit-analyze uses git diff under the covers and sometimes git diff decides there are renames and messes up our parsing.

You should move the diff directory under dasmset_1 and rerun jit-analyze with the new location.

@am11
Copy link
Member Author

am11 commented Jul 23, 2020

Thank you! I first tried creating symlinks manually, but didn't work:

$ ln -s /tmp/diffs/dasmset_2/diff/ /tmp/diffs/diff
$ ln -s /tmp/diffs/dasmset_1/base/ /tmp/diffs/base
$ jitutils/bin/jit-analyze --base /tmp/diffs/base --diff /tmp/diffs/diff
Couldn't parse --numstat output '1	1	/tmp/diffs/{diff => base}` : '/tmp/diffs/base' does not exist

(looks like jit-analyze is strict about directory type and does not accept link-to-directory)

it worked when i moved the directories.

PMI

$ jitutils/bin/jit-diff diff --base runtime_base/artifacts/bin/coreclr/OSX.x64.Release --base_root runtime_base --core_root runtime_base/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root --output /tmp/diffs --pmi -f
$ jitutils/bin/jit-diff diff --diff runtime_pr/artifacts/bin/coreclr/OSX.x64.Release --diff_root runtime_pr --core_root runtime_pr/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root --output /tmp/diffs --pmi -f
$ mv /tmp/diffs/dasmset_2/diff/ /tmp/diffs/diff
$ mv /tmp/diffs/dasmset_1/base/ /tmp/diffs/base
$ jitutils/bin/jit-analyze --base /tmp/diffs/base --diff /tmp/diffs/diff
Found 265 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Warning: the base metric is 0, the diff metric is 0, have you used a release version?

0 total files with Code Size differences (0 improved, 0 regressed), 265 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 0 unchanged.

265 files had text diffs but no metric diffs.
System.Security.Cryptography.Algorithms.dasm had 4 diffs
System.Security.Cryptography.OpenSsl.dasm had 4 diffs
Microsoft.Extensions.Primitives.dasm had 2 diffs
System.Xml.Linq.dasm had 2 diffs
System.ComponentModel.Primitives.dasm had 2 diffs
Dia2Lib.dasm had 2 diffs
System.Net.WebSockets.Client.dasm had 2 diffs
System.IO.Compression.FileSystem.dasm had 2 diffs
System.Data.Common.dasm had 2 diffs
System.Threading.Timer.dasm had 2 diffs
System.Security.Cryptography.Cng.dasm had 2 diffs
System.Runtime.Serialization.Primitives.dasm had 2 diffs
System.Security.dasm had 2 diffs
Microsoft.Extensions.Configuration.Abstractions.dasm had 2 diffs
System.Data.DataSetExtensions.dasm had 2 diffs
System.ComponentModel.Composition.Registration.dasm had 2 diffs
System.Net.WebClient.dasm had 2 diffs
System.Globalization.dasm had 2 diffs
Microsoft.Extensions.FileProviders.Abstractions.dasm had 2 diffs
System.Console.dasm had 2 diffs

crossgen:

$ jitutils/bin/jit-diff diff --base runtime_base/artifacts/bin/coreclr/OSX.x64.Release --base_root runtime_base --core_root runtime_base/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root --output /tmp/diffs --crossgen runtime_base/artifacts/bin/coreclr/OSX.x64.Release/crossgen -f
$ jitutils/bin/jit-diff diff --diff runtime_pr/artifacts/bin/coreclr/OSX.x64.Release --diff_root runtime_pr --core_root runtime_pr/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root --output /tmp/diffs --crossgen runtime_pr/artifacts/bin/coreclr/OSX.x64.Release/crossgen -f

$ mv /tmp/diffs/dasmset_1/base/ /tmp/diffs/base
$ mv /tmp/diffs/dasmset_2/diff/ /tmp/diffs/diff

$ jitutils/bin/jit-analyze --base /tmp/diffs/base --diff /tmp/diffs/diff

Summary of Code Size diffs:
(Lower is better)

Warning: the base metric is 0, the diff metric is 0, have you used a release version?

0 total files with Code Size differences (0 improved, 0 regressed), 265 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 0 unchanged.

Crossgen2 doesn't have an executable produced on macOS, so I tried with crossgen2.dll, which fails:

$ jitutils/bin/jit-diff diff --diff runtime_base/artifacts/bin/coreclr/OSX.x64.Release --diff_root runtime_base --core_root runtime_base/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root --output /tmp/diffs --crossgen runtime_base/artifacts/bin/coreclr/OSX.x64.Release/crossgen2/crossgen2.dll -f
Beginning Crossgen CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies
- Finished 0/0 Base 0/265 Diff [0.1 sec]Unhandled exception. System.ComponentModel.Win32Exception (13): Permission denied
   at System.Diagnostics.Process.ForkAndExecProcess(String filename, String[] argv, String[] envp, String cwd, Boolean redirectStdin, Boolean redirectStdout, Boolean redirectStderr, Boolean setCredentials, UInt32 userId, UInt32 groupId, UInt32[] groups, Int32& stdinFd, Int32& stdoutFd, Int32& stderrFd, Boolean usesTerminal, Boolean throwOnNoExec)
   at System.Diagnostics.Process.StartCore(ProcessStartInfo startInfo)
   at System.Diagnostics.Process.Start()
   at Microsoft.DotNet.Cli.Utils.Command.Execute()

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking the diffs.

@jkotas
Copy link
Member

jkotas commented Jul 23, 2020

the base metric is 0, the diff metric is 0, have you used a release version?

This message looks suspect. Was there actually disassembly found in the diff files?

@AndyAyersMS
Copy link
Member

This message looks suspect. Was there actually disassembly found in the diff files?

Ah, good point. Please don't pass a path to --base or --diff.

instead of this:

$ jitutils/bin/jit-diff diff --base runtime_base/artifacts/bin/coreclr/OSX.x64.Release --base_root runtime_base --core_root runtime_base/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root --output /tmp/diffs --pmi -f

run this:

$ jitutils/bin/jit-diff diff --base --base_root runtime_base --core_root runtime_base/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root --output /tmp/diffs --pmi -f

and likewise for the diff runs.

You will also need to have checked builds of the runtime for both repos.

@am11
Copy link
Member Author

am11 commented Jul 23, 2020

With:

$ jitutils/bin/jit-diff diff                                                         \
    --base                                                                           \
    --base_root runtime_base                                                         \
    --core_root runtime_base/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root \
    --output /tmp/diffs --pmi -f

it complains:

error: didn't find --base default
error: Specify either --base or --diff or both.

You will also need to have checked builds of the runtime for both repos.

I will do that now.

@am11
Copy link
Member Author

am11 commented Jul 23, 2020

It worked without --base and --diff values with checked builds. :)

$ jitutils/bin/jit-analyze --base /tmp/diffs/base --diff /tmp/diffs/diff
Found 278 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: 360 (0.001% of base)
    diff is a regression.

Top file regressions (bytes):
         138 : System.Private.CoreLib.dasm (0.003% of base)
          79 : System.Private.Xml.dasm (0.002% of base)
          57 : System.Data.Common.dasm (0.004% of base)
          52 : Newtonsoft.Json.dasm (0.006% of base)
          17 : System.ComponentModel.TypeConverter.dasm (0.006% of base)
           6 : Microsoft.VisualBasic.Core.dasm (0.001% of base)
           4 : System.Runtime.Caching.dasm (0.007% of base)
           4 : System.Drawing.Primitives.dasm (0.011% of base)
           3 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.000% of base)
           3 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.000% of base)

Top file improvements (bytes):
          -1 : System.Drawing.Common.dasm (-0.000% of base)
          -1 : ILCompiler.Reflection.ReadyToRun.dasm (-0.001% of base)
          -1 : System.Net.Mail.dasm (-0.001% of base)

13 total files with Code Size differences (3 improved, 10 regressed), 252 unchanged.

Top method regressions (bytes):
          80 (3.120% of base) : System.Private.Xml.dasm - BigNumber:DblToRgbFast(double,ref,byref,byref):bool
          60 (16.129% of base) : System.Private.CoreLib.dasm - MemoryFailPoint:.ctor(int):this
          57 (7.692% of base) : System.Data.Common.dasm - SqlDecimal:.ctor(double):this
          31 (140.909% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:Reminder(double,double):double
          27 (122.727% of base) : Newtonsoft.Json.dasm - JsonValidatingReader:FloatingPointRemainder(double,double):double
          25 (2.660% of base) : Newtonsoft.Json.dasm - JsonValidatingReader:ValidateFloat(JsonSchemaModel):this
          22 (4.175% of base) : System.Private.CoreLib.dasm - Vector`1:Ceiling(Vector`1):Vector`1 (6 methods)
          22 (4.175% of base) : System.Private.CoreLib.dasm - Vector`1:Floor(Vector`1):Vector`1 (6 methods)
          18 (37.500% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:NormalizeLongitude(double):double
          11 (18.644% of base) : System.ComponentModel.TypeConverter.dasm - Timer:UpdateTimer():this
           8 (1.023% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Int(Object):Object
           6 (0.766% of base) : System.Private.CoreLib.dasm - Math:Round(double,int,int):double
           6 (0.762% of base) : System.Private.CoreLib.dasm - MathF:Round(float,int,int):float
           4 (2.010% of base) : System.Runtime.Caching.dasm - MemoryCacheStore:TrimInternal(int):long:this
           4 (22.222% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Int(float):float
           3 (0.492% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - CompileTimeCalculations:ConvertFloatingValue(double,ubyte,byref):ConstantValue
           3 (0.718% of base) : System.ComponentModel.TypeConverter.dasm - Timer:set_Enabled(bool):this
           3 (0.783% of base) : System.ComponentModel.TypeConverter.dasm - Timer:.ctor(double):this
           3 (0.177% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ThreadState:LogBlockingEnd(double,int,int,TraceThread,ThreadTimeStackComputer):this
           2 (2.667% of base) : System.Private.Xml.dasm - NumberFunctions:Floor(XPathNodeIterator):double:this

Top method improvements (bytes):
         -15 (-0.747% of base) : System.Private.CoreLib.dasm - Number:Dragon4(long,int,int,bool,int,bool,Span`1,byref):int
          -8 (-3.922% of base) : System.Private.CoreLib.dasm - TimeSpanToken:NormalizeAndValidateFraction():bool:this
          -5 (-1.323% of base) : System.Private.Xml.dasm - NumberingFormat:ConvertToArabic(double,int,int,String):String
          -2 (-3.509% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Fix(double):double
          -2 (-2.740% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Fix(float):float
          -2 (-20.000% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Int(double):double
          -1 (-2.083% of base) : System.Drawing.Common.dasm - Font:get_Height():int:this
          -1 (-0.067% of base) : ILCompiler.Reflection.ReadyToRun.dasm - GcInfo:GetTransitions(ref,byref):Dictionary`2:this
          -1 (-0.361% of base) : System.Net.Mail.dasm - SmtpDateTime:TimeSpanToOffset(TimeSpan):String:this
          -1 (-0.559% of base) : System.Private.CoreLib.dasm - Grisu3:GetCachedPowerForBinaryExponentRange(int,int,byref):DiyFp
          -1 (-0.885% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:GetGregorianYear(double):int
          -1 (-0.435% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:PersianNewYearOnOrBefore(long):long
          -1 (-0.252% of base) : System.Private.CoreLib.dasm - PersianCalendar:GetDatePart(long,int):int:this

Top method regressions (percentages):
          31 (140.909% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:Reminder(double,double):double
          27 (122.727% of base) : Newtonsoft.Json.dasm - JsonValidatingReader:FloatingPointRemainder(double,double):double
          18 (37.500% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:NormalizeLongitude(double):double
           4 (22.222% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Int(float):float
          11 (18.644% of base) : System.ComponentModel.TypeConverter.dasm - Timer:UpdateTimer():this
          60 (16.129% of base) : System.Private.CoreLib.dasm - MemoryFailPoint:.ctor(int):this
          57 (7.692% of base) : System.Data.Common.dasm - SqlDecimal:.ctor(double):this
          22 (4.175% of base) : System.Private.CoreLib.dasm - Vector`1:Ceiling(Vector`1):Vector`1 (6 methods)
          22 (4.175% of base) : System.Private.CoreLib.dasm - Vector`1:Floor(Vector`1):Vector`1 (6 methods)
          80 (3.120% of base) : System.Private.Xml.dasm - BigNumber:DblToRgbFast(double,ref,byref,byref):bool
           2 (2.667% of base) : System.Private.Xml.dasm - NumberFunctions:Floor(XPathNodeIterator):double:this
           2 (2.667% of base) : System.Private.Xml.dasm - NumberFunctions:Ceiling(XPathNodeIterator):double:this
          25 (2.660% of base) : Newtonsoft.Json.dasm - JsonValidatingReader:ValidateFloat(JsonSchemaModel):this
           4 (2.010% of base) : System.Runtime.Caching.dasm - MemoryCacheStore:TrimInternal(int):long:this
           2 (1.613% of base) : System.Drawing.Primitives.dasm - Rectangle:Ceiling(RectangleF):Rectangle
           1 (1.587% of base) : System.Drawing.Primitives.dasm - Point:Ceiling(PointF):Point
           1 (1.587% of base) : System.Drawing.Primitives.dasm - Size:Ceiling(SizeF):Size
           8 (1.023% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Int(Object):Object
           3 (0.783% of base) : System.ComponentModel.TypeConverter.dasm - Timer:.ctor(double):this
           6 (0.766% of base) : System.Private.CoreLib.dasm - Math:Round(double,int,int):double

Top method improvements (percentages):
          -2 (-20.000% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Int(double):double
          -8 (-3.922% of base) : System.Private.CoreLib.dasm - TimeSpanToken:NormalizeAndValidateFraction():bool:this
          -2 (-3.509% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Fix(double):double
          -2 (-2.740% of base) : Microsoft.VisualBasic.Core.dasm - Conversion:Fix(float):float
          -1 (-2.083% of base) : System.Drawing.Common.dasm - Font:get_Height():int:this
          -5 (-1.323% of base) : System.Private.Xml.dasm - NumberingFormat:ConvertToArabic(double,int,int,String):String
          -1 (-0.885% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:GetGregorianYear(double):int
         -15 (-0.747% of base) : System.Private.CoreLib.dasm - Number:Dragon4(long,int,int,bool,int,bool,Span`1,byref):int
          -1 (-0.559% of base) : System.Private.CoreLib.dasm - Grisu3:GetCachedPowerForBinaryExponentRange(int,int,byref):DiyFp
          -1 (-0.435% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:PersianNewYearOnOrBefore(long):long
          -1 (-0.361% of base) : System.Net.Mail.dasm - SmtpDateTime:TimeSpanToOffset(TimeSpan):String:this
          -1 (-0.252% of base) : System.Private.CoreLib.dasm - PersianCalendar:GetDatePart(long,int):int:this
          -1 (-0.067% of base) : ILCompiler.Reflection.ReadyToRun.dasm - GcInfo:GetTransitions(ref,byref):Dictionary`2:this

37 total methods with Code Size differences (13 improved, 24 regressed), 236438 unchanged.

@am11
Copy link
Member Author

am11 commented Jul 23, 2020

e.g. the regression in CalendricalCalculationsHelper:Reminder looks like this:

click to expand
-; Total bytes of code 22, prolog size 3, PerfScore 29.50, (MethodHash=3ec4c25b) for method CalendricalCalculationsHelper:Reminder(double,double):double
+; Total bytes of code 53, prolog size 7, PerfScore 32.50, (MethodHash=3ec4c25b) for method CalendricalCalculationsHelper:Reminder(double,double):double
 ; ============================================================
 
 Unwind Info:
@@ -906600,11 +906625,12 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x00
-  CountOfUnwindCodes: 0
+  SizeOfProlog      : 0x04
+  CountOfUnwindCodes: 1
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
+    CodeOffset: 0x04 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 2 * 8 + 8 = 24 = 0x18
 ; Assembly listing for method CalendricalCalculationsHelper:NormalizeLongitude(double):double
 ; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
 ; optimized code
@@ -906612,30 +906638,37 @@ Unwind Info:
 ; partially interruptible
 ; Final local variable assignments
 ;
-;  V00 arg0         [V00,T00] (  9,  8   )  double  ->  mm0        
+;  V00 arg0         [V00,T00] (  9,  8   )  double  ->  [rsp+0x00]  
 ;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
-;  V02 tmp1         [V02,T01] (  2,  2   )  double  ->  mm1         "Inline stloc first use temp"
+;  V02 tmp1         [V02,T01] (  2,  2   )  double  ->  mm0         "Inline stloc first use temp"
 ;
-; Lcl frame size = 0
+; Lcl frame size = 8
 
 G_M36341_IG01:
+       push     rax
        vzeroupper 
-						;; bbWeight=1    PerfScore 1.00
+						;; bbWeight=1    PerfScore 2.00
 G_M36341_IG02:
-       vdivsd   xmm1, xmm0, qword ptr [reloc @RWD08]
-       vroundsd xmm1, xmm1, 9
-       vmulsd   xmm1, xmm1, qword ptr [reloc @RWD24]
-       vsubsd   xmm0, xmm0, xmm1
-       vxorps   xmm1, xmm1
-       vucomisd xmm1, xmm0
+       vmovsd   qword ptr [rsp], xmm0
+       vdivsd   xmm0, xmm0, qword ptr [reloc @RWD08]
+       call     Math:Floor(double):double
+       vmulsd   xmm0, xmm0, qword ptr [reloc @RWD24]
+       vmovsd   xmm1, qword ptr [rsp]
+       vsubsd   xmm1, xmm1, xmm0
+       vxorps   xmm0, xmm0
+       vucomisd xmm0, xmm1
        jbe      SHORT G_M36341_IG04
-						;; bbWeight=1    PerfScore 31.33
+						;; bbWeight=1    PerfScore 27.83
 G_M36341_IG03:
-       vaddsd   xmm0, xmm0, qword ptr [reloc @RWD40]
+       vaddsd   xmm1, xmm1, qword ptr [reloc @RWD40]
 						;; bbWeight=0.50 PerfScore 2.50
 G_M36341_IG04:
+       vmovaps  xmm0, xmm1
+						;; bbWeight=1    PerfScore 0.25
+G_M36341_IG05:
+       add      rsp, 8
        ret      
-						;; bbWeight=1    PerfScore 1.00
+						;; bbWeight=1    PerfScore 1.25
 RWD00  dq	0000000000000000h
 RWD08  dq	4076800000000000h
 RWD16  dq	0000000000000000h
@@ -906644,7 +906677,7 @@ RWD32  dq	0000000000000000h
 RWD40  dq	4076800000000000h

@jkotas
Copy link
Member

jkotas commented Jul 23, 2020

This change is expected to produce zero-diff. These diffs are unexpected and suggest that there is a bug somewhere. Could you please look into it?

@EgorBo
Copy link
Member

EgorBo commented Jul 23, 2020

Judging by the regression diff, Math.Floor wasn't expanded to sse41 rounds as expected.
However, should it? for crossgen2

@am11
Copy link
Member Author

am11 commented Jul 23, 2020

Following five functions are not recognized as intrinsic, i.e. they do not even reach lookupNamedIntrinsic in the importer:

  • MathF.Abs
  • MathF.Floor
  • MathF.Ceiling
  • Math.Floor
  • Math.Ceiling

Based on the previous debugging, i think MathF.Abs is not recognized as intrinsic by JIT even in the master branch. e.g. you can print something under all case CORINFO_INTRINSIC_Abs cases and run a simplest program with MathF.Abs(1.2f) call, it will never show up. and Math.Abs(1.2) (without both f's) will. I didn't quite figure out the exact reason, but it seems like SetIsJitIntrinsic does get called by genmeth / methodtablebuilder for MathF.Abs, but by the time control reaches here:

if (pMD->IsJitIntrinsic())
it returns false, and importer does not classify MathF.Abs as intrinsic.

Investigating floor and ceiling cases, which are new.

@am11
Copy link
Member Author

am11 commented Jul 24, 2020

By marking floor and ceiling methods [Intrinsic] on managed side, we do not need to modify the conditions in importer's impIntrinsic(...) function, which expects the method to have CORINFO_FLG_JIT_INTRINSIC flag, in addition to CORINFO_FLG_INTRINSIC.

@EgorBo, so far what i have understood is (still trying to digest the flow): if an [Intrinsic] member is also recorded as hwintrinsic, then best optimization is applied depending on JIT or crossgen. So if before this change, Floor was a hwintrinsic, an FCall intrinsic (CORINFO_FLG_INTRINSIC) and not the JIT intrinsic (CORINFO_FLG_JIT_INTRINSIC, which gets assigned to [Intrinsic] decorated members); changing it to have CORINFO_FLG_JIT_INTRINSIC flag as well will not take away its ability to get optimized as SSE4x hwintrinsic. It may not hold true for other cases, as i have only paid attention to Floor and Ceiling. :)

@jkotas, @AndyAyersMS, we have zero code diffs now. :)

PMI

$ jitutils/bin/jit-analyze --base /tmp/diffs/base --diff /tmp/diffs/diff
Found 266 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: 0 (0.000% of base)

0 total files with Code Size differences (0 improved, 0 regressed), 265 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 236475 unchanged.

1 files had text diffs but no metric diffs.
System.Private.CoreLib.dasm had 76 diffs
Expand to see System.Private.CoreLib.dasm textual diffs
diff -u --recursive /tmp/diffs/base/System.Private.CoreLib.dasm /tmp/diffs/diff/System.Private.CoreLib.dasm
--- /tmp/diffs/base/System.Private.CoreLib.dasm	2020-07-24 13:34:42.000000000 +0300
+++ /tmp/diffs/diff/System.Private.CoreLib.dasm	2020-07-24 13:39:02.000000000 +0300
@@ -174935,6 +174935,44 @@
     CodeOffset: 0x05 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r14 (14)
     CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r15 (15)
     CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
+; Assembly listing for method SZArrayHelper:IndexOf(ubyte):int:this
+; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
+; optimized code
+; rsp based frame
+; fully interruptible
+; Final local variable assignments
+;
+;  V00 this         [V00,T00] (  4,  4   )     ref  ->  rdi         this class-hnd
+;  V01 arg1         [V01,T01] (  3,  3   )   ubyte  ->  rsi        
+;* V02 loc0         [V02    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
+;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
+;
+; Lcl frame size = 0
+
+G_M11338_IG01:
+						;; bbWeight=1    PerfScore 0.00
+G_M11338_IG02:
+       mov      ecx, dword ptr [rdi+8]
+       movzx    rsi, sil
+       xor      edx, edx
+						;; bbWeight=1    PerfScore 2.50
+G_M11338_IG03:
+       jmp      Array:IndexOf(ref,ubyte,int,int):int
+						;; bbWeight=1    PerfScore 2.00
+
+; Total bytes of code 14, prolog size 0, PerfScore 5.90, (MethodHash=17a0d3b5) for method SZArrayHelper:IndexOf(ubyte):int:this
+; ============================================================
+
+Unwind Info:
+  >> Start offset   : 0x000000 (not in unwind data)
+  >>   End offset   : 0xd1ffab1e (not in unwind data)
+  Version           : 1
+  Flags             : 0x00
+  SizeOfProlog      : 0x00
+  CountOfUnwindCodes: 0
+  FrameRegister     : none (0)
+  FrameOffset       : N/A (no FrameRegister) (Value=0)
+  UnwindCodes       :
 ; Assembly listing for method TlsOverPerCoreLockedStacksArrayPool`1:Gen2GcCallbackFunc(Object):bool
 ; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
 ; optimized code
@@ -176033,44 +176071,6 @@
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
     CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
-; Assembly listing for method SZArrayHelper:IndexOf(ubyte):int:this
-; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
-; optimized code
-; rsp based frame
-; fully interruptible
-; Final local variable assignments
-;
-;  V00 this         [V00,T00] (  4,  4   )     ref  ->  rdi         this class-hnd
-;  V01 arg1         [V01,T01] (  3,  3   )   ubyte  ->  rsi        
-;* V02 loc0         [V02    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
-;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
-;
-; Lcl frame size = 0
-
-G_M11338_IG01:
-						;; bbWeight=1    PerfScore 0.00
-G_M11338_IG02:
-       mov      ecx, dword ptr [rdi+8]
-       movzx    rsi, sil
-       xor      edx, edx
-						;; bbWeight=1    PerfScore 2.50
-G_M11338_IG03:
-       jmp      Array:IndexOf(ref,ubyte,int,int):int
-						;; bbWeight=1    PerfScore 2.00
-
-; Total bytes of code 14, prolog size 0, PerfScore 5.90, (MethodHash=17a0d3b5) for method SZArrayHelper:IndexOf(ubyte):int:this
-; ============================================================
-
-Unwind Info:
-  >> Start offset   : 0x000000 (not in unwind data)
-  >>   End offset   : 0xd1ffab1e (not in unwind data)
-  Version           : 1
-  Flags             : 0x00
-  SizeOfProlog      : 0x00
-  CountOfUnwindCodes: 0
-  FrameRegister     : none (0)
-  FrameOffset       : N/A (no FrameRegister) (Value=0)
-  UnwindCodes       :
 ; Assembly listing for method SZArrayHelper:IndexOf(short):int:this
 ; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
 ; optimized code

crossgen

$ jitutils/bin/jit-analyze --base /tmp/diffs/base1 --diff /tmp/diffs/diff1

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: 0 (0.000% of base)

0 total files with Code Size differences (0 improved, 0 regressed), 265 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 182131 unchanged.

@jkotas
Copy link
Member

jkotas commented Jul 24, 2020

Looks great! Thank you!

@jkotas jkotas merged commit 4f38b75 into dotnet:master Jul 24, 2020
@AndyAyersMS
Copy link
Member

@am11 sorry for the rough edges running diffs. Need to find some time to work on dotnet/jitutils#217.

FYI @dotnet/jit-contrib -- another GUID change.

@am11 am11 deleted the feature/math/convert-to-named-intrinsic branch July 25, 2020 09:24
Jacksondr5 pushed a commit to Jacksondr5/runtime that referenced this pull request Aug 10, 2020
* Convert math intrinsics to named intrinsics

* Annotate Floor and Ceiling with [Intrinsic]
@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants