Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup BitArray code #38780

Closed
wants to merge 1 commit into from
Closed

Conversation

Gnbrkm41
Copy link
Contributor

@Gnbrkm41 Gnbrkm41 commented Jul 4, 2020

This PR addresses some of the comments raised during the initial PR of the vectorised BitArray code, such as using new API/intrinsics. Also highlights a couple potential improvements.

cc @kunalspathak @tannergooding @echesakovMSFT (ARM intrinsics related people I can remember from the top of my head)

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 4, 2020
@Gnbrkm41
Copy link
Contributor Author

Gnbrkm41 commented Jul 4, 2020

Are the libraries test pipelines w/ ARM machines still run manually? It would be great if someone can trigger the pipeline manually if those aren't run for PR evaluations (I remember them not running by default some time ago).

@marek-safar marek-safar added area-System.Collections and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Jul 5, 2020
@ghost
Copy link

ghost commented Jul 5, 2020

Tagging subscribers to this area: @eiriktsarpalis
Notify danmosemsft if you want to be subscribed.

@kunalspathak
Copy link
Member

Are the libraries test pipelines w/ ARM machines still run manually? It would be great if someone can trigger the pipeline manually if those aren't run for PR evaluations (I remember them not running by default some time ago).

I thought they were automatically triggered if anything under libraries is touched. @safern , isn't that not the case? I don't see libraries run for arm64.

@safern
Copy link
Member

safern commented Jul 6, 2020

We had to remove them from PRs because of the reduced hardware availability we have, however we do run them against a checked coreclr when coreclr is touched, so if you don't want to go through the manual trigger dance, you can push a dummy commit that changes any file under coreclr.

@kunalspathak
Copy link
Member

Thanks @safern for confirming. I have triggered a run manually.

@Gnbrkm41
Copy link
Contributor Author

Gnbrkm41 commented Jul 7, 2020

Seeing some failing System.Net tests; those seem unrelated? the arm64 tests didn't seem to run as a result of those failures.

@kunalspathak
Copy link
Member

Seeing some failing System.Net tests; those seem unrelated? the arm64 tests didn't seem to run as a result of those failures.

I just triggered another run.

@Gnbrkm41
Copy link
Contributor Author

Gnbrkm41 commented Jul 7, 2020

Failing tests are #38805 and #38847. It looks like existing tests did succeed on ARM.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tannergooding
Copy link
Member

@Gnbrkm41, this looks good. Could we have a disassembly and/or perf numbers hsowing the improvement, however?

@Gnbrkm41
Copy link
Contributor Author

Some numbers:

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio MannWhitney(3ms) Gen 0 Gen 1 Gen 2 Allocated
BitArrayNot Job-UUOUFD /after/net5.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/5.0.0/corerun 4 26.95 ns 0.026 ns 0.023 ns 26.95 ns 26.92 ns 26.99 ns 1.00 Same - - - -
BitArrayNot Job-OIAKHU /before/net5.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/5.0.0/corerun 4 26.96 ns 0.020 ns 0.019 ns 26.96 ns 26.93 ns 27.00 ns 1.00 Base - - - -
BitArrayCopyToBoolArray Job-UUOUFD /after/net5.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/5.0.0/corerun 4 305.92 ns 0.171 ns 0.152 ns 305.87 ns 305.77 ns 306.15 ns 1.00 Same - - - -
BitArrayCopyToBoolArray Job-OIAKHU /before/net5.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/5.0.0/corerun 4 305.94 ns 0.254 ns 0.199 ns 305.92 ns 305.69 ns 306.27 ns 1.00 Base - - - -
BitArrayNot Job-UUOUFD /after/net5.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/5.0.0/corerun 512 371.29 ns 0.127 ns 0.106 ns 371.30 ns 371.14 ns 371.48 ns 1.00 Same - - - -
BitArrayNot Job-OIAKHU /before/net5.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/5.0.0/corerun 512 370.65 ns 0.209 ns 0.175 ns 370.60 ns 370.47 ns 371.08 ns 1.00 Base - - - -
BitArrayCopyToBoolArray Job-UUOUFD /after/net5.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/5.0.0/corerun 512 5,887.23 ns 6.308 ns 5.900 ns 5,886.91 ns 5,879.27 ns 5,898.81 ns 1.00 Same - - - -
BitArrayCopyToBoolArray Job-OIAKHU /before/net5.0-Linux-Release-arm64/shared/Microsoft.NETCore.App/5.0.0/corerun 512 5,892.82 ns 4.715 ns 4.411 ns 5,891.77 ns 5,887.41 ns 5,901.29 ns 1.00 Base - - - -

Ran on my RPi 3b+; Not really seeing any huge problems standing out...

@Gnbrkm41 Gnbrkm41 force-pushed the bitarraycleanup branch 2 times, most recently from 437cbb0 to 017fe6d Compare August 14, 2020 09:14
@Gnbrkm41
Copy link
Contributor Author

Finally got around to get the assembly output.
dotnet-after.txt
dotnet-before.txt

Not exactly sure why I'm seeing Debug.Fail calls; thought they're supposed to be not included when built with --librariesConfiguration Release?

@kunalspathak
Copy link
Member

Finally got around to get the assembly output.
dotnet-after.txt
dotnet-before.txt

Not exactly sure why I'm seeing Debug.Fail calls; thought they're supposed to be not included when built with --librariesConfiguration Release?

Thanks for getting the diffs. I don't see that you have added Debug.Fail() but it still shows up in dotnet-after.txt. Also, how did you collect dotnet-before.txt? Was it taken with the changes in your PR reverted? Because I don't see Debug.Fail() in that.

@Gnbrkm41
Copy link
Contributor Author

Gnbrkm41 commented Aug 14, 2020

I created a regular console app that calls into BitArray.Not and BitArray.CopyTo, built it under Release config, and copied the DLL over to my RPi. Then, on my WSL2 Ubuntu 20.04 image, I did clean build off the main branch using docker run --rm -a STDOUT -a STDIN -a STDERR -v /home/gotos/source/repos/runtime:/runtime -w /runtime -e ROOTFS_DIR=/crossrootfs/arm64 mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-cross-arm64-20200413125008-cfdd435 ./build.sh --build --restore --os Linux --arch arm64 --runtimeConfiguration Checked --librariesConfiguration Release --subset Clr+Libs --cross, cleaned the artifacts folder again using ./build.sh -clean, moved to the branch with the changes (rebased on top of the main branch) then re-ran the same build command. I copied the built runtime by copying the net5.0-Linux-arm64-Release (not exactly sure about the name) After that, I did:

export COMPlus_JitDisasm="BitArray::Not BitArray::CopyTo
export COMPlus_JitDiffableDasm=1
export COMPlus_TieredCompliation=0
./shared/Microsoft.NETCore.App/5.0.0/corerun Path-To-Dll.dll

Perhaps it's an inlining thing? I can obtain the JitDump for both builds if it helps.

@kunalspathak
Copy link
Member

Sure, can you attach Jitdump them as well? Want to see where it is coming from.

@Gnbrkm41
Copy link
Contributor Author

Had to compress because the raw dumps were totalling 22MB.
jitdumps.zip

@Gnbrkm41
Copy link
Contributor Author

Gnbrkm41 commented Aug 14, 2020

I don't think they're only present in one of them? I find exactly three occurences of Debug.Fail in both disasms. They appear to be coming from Debug.Asserts from various places which is getting inlined for it being "profitable inline". Still, I find it weird they show up even with --librariesConfiguration Release...

@kunalspathak
Copy link
Member

I don't think they're only present in one of them? I find exactly three occurences of Debug.Fail in both disasms. They appear to be coming from Debug.Asserts from various places which is getting inlined for it being "profitable inline". Still, I find it weird they show up even with --librariesConfiguration Release...

Interesting. I don't see "Debug.Fail()" anywhere in the asms that you shared. I am guessing the sequence is for Debug.Fail() and I see same number of 0xd1ffab1e in both asms, so could be they are around in both asms.

            movz    x0, #0xd1ffab1e
            movk    x0, #0xd1ffab1e LSL #16
            movk    x0, #0xd1ffab1e LSL #32
            mov     w1, #7
            mov     v9.d[0], v8.d[1]
            bl      CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
            movz    x0, #0xd1ffab1e
            movk    x0, #0xd1ffab1e LSL #16
            movk    x0, #0xd1ffab1e LSL #32
            ldr     x1, [x0]

@tannergooding should double check before we merge it.

private static readonly Vector128<byte> s_upperShuffleMask_CopyToBoolArray = Vector128.Create(0x02020202_02020202, 0x03030303_03030303).AsByte();
private static readonly Vector128<byte> s_lowerShuffleMask_CopyToBoolArray =
BitConverter.IsLittleEndian ? Vector128.Create(0, 0x01010101_01010101).AsByte() : Vector128.Create(0x03030303_03030303, 0x02020202_02020202).AsByte();
private static readonly Vector128<byte> s_upperShuffleMask_CopyToBoolArray =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the work done in #36267 and #35857, is it still beneficial for these to be cached in static readonly fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that we don't need these anymore - if I remember correctly those PRs make the JIT recognise those Create calls and store the data in memory during codegen, so the code can load the data from memory - so I expect it to result in about the same, or perhaps better code. It's been a while since I've rebased the branch so I could make the change while I'm rebasing and see how it turns out.

@eiriktsarpalis eiriktsarpalis added this to the 6.0.0 milestone Sep 30, 2020
@eiriktsarpalis
Copy link
Member

This PR has been open for a while, is there any remaining work needed to be done to get it merged?

@Gnbrkm41
Copy link
Contributor Author

Gnbrkm41 commented Oct 1, 2020

I think we want to verify that this doesn't introduce any regressions and as @stephentoub mentioned we could perhaps also clean the code up a bit, following the recent change in the JIT that allows us to get rid of static readonly variables in favour of locally declared variables (which we expect it to be turned into constants stored in memory).

I'll rebase the branch first and see how the disasm / benchmark turns out, as I've been seeing some oddities around the disassembled code (where we were seeing calls to Debug.Fail when it shouldn't be there as it's compiled in release mode).

* This commit addresses some of the comments raised during the initial PR of the vectorised BitArray code, such as using new API/intrinsics. Also highlights a couple potential improvements.
@Gnbrkm41
Copy link
Contributor Author

Unfortunately, I don't expect to be able to work on the PR for some time. If any of you folks have concerns with this being open for several months, please do feel free to close the PR - I could eventually work on this again when I'm able to.

@kunalspathak
Copy link
Member

Thank you @Gnbrkm41 for confirming. Feel free to reopen when you get time.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants