Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64-SVE: gathervector #103159

Merged
merged 10 commits into from
Jun 12, 2024
Merged

ARM64-SVE: gathervector #103159

merged 10 commits into from
Jun 12, 2024

Conversation

a74nh
Copy link
Contributor

@a74nh a74nh commented Jun 7, 2024

No description provided.

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jun 7, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@a74nh
Copy link
Contributor Author

a74nh commented Jun 7, 2024

This is working.

However, 3 tests are commented out. These are the ones with Vector<uint> for the address. Eg:

        public static unsafe Vector<float> GatherVector(Vector<float> mask, Vector<uint> addresses);

I don't think this works for C#.

Each address is a 32bit value. Sve is only for Arm64.

@tannergooding : thoughts?

@a74nh a74nh marked this pull request as ready for review June 7, 2024 11:14
@a74nh
Copy link
Contributor Author

a74nh commented Jun 7, 2024

@dotnet/arm64-contrib @kunalspathak

@a74nh
Copy link
Contributor Author

a74nh commented Jun 7, 2024

Stress tests results

Looks like two errors:

  1. Incorrect values loaded, probably due to incorrect predicates (the known issue?)
    2) Segfault due to invalid memory load, always in GatherVector_Indices_float_int. Investigating this.

@a74nh
Copy link
Contributor Author

a74nh commented Jun 7, 2024

Stress tests results

Looks like two errors:

  1. Incorrect values loaded, probably due to incorrect predicates (the known issue?)
  2. Segfault due to invalid memory load, always in GatherVector_Indices_float_int. Investigating this.

Scratch error 2 - those were due to running on a 256bit machine (and therefore issues around storing to the stack).
So just error 1, which looks like the known predicate issue.

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Jun 7, 2024
Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good overall. added some comments.

src/coreclr/jit/hwintrinsiclistarm64sve.h Outdated Show resolved Hide resolved
src/coreclr/jit/hwintrinsiccodegenarm64.cpp Outdated Show resolved Hide resolved

if (auxSize == EA_8BYTE)
{
opt = varTypeIsUnsigned(auxType) ? INS_OPTS_SCALABLE_D_UXTW : INS_OPTS_SCALABLE_D_SXTW;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the summary docs, we are emitting LD1D Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #3], which does not take xs field, so we do not have to embed UXTW or SXTW bit, right?

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the summary docs, we are emitting LD1D Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #3],

Correct

which does not take xs field, so we do not have to embed UXTW or SXTW bit, right?

Removing this, I get an asset (in emitarm64sve.cpp:6441). Looks like emit is expecting this field. I'll have to debug it a little more. There are a lot of different loads and I don't want to break any of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you are saying that although we pass the *XTW flag, we are not using it in for this instruction, although the code path flows through the emitarm64sve.cpp that you mentioned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's using the flag to make a decision to decide on the correct encoding. Then doesn't use it later. Probably just needs a fix in emit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just needs a fix in emit.

Did you fix it in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing modes now fixed. There are three variants for "address with vector of offsets":

Vector<64bit> GatherVector(Vector<64bit> mask, 64bit* address, Vector<64bit> indices)
-> LD1D Zresult.D, Pg/Z, [Xbase, Zindices.D, LSL #3]
Vector<32bit> GatherVector(Vector<32bit> mask, 32bit* address, Vector<signed32bit> indices)
-> LD1W Zresult.S, Pg/Z, [Xbase, Zindices.S, SXTW #2]
Vector<32bit> GatherVector(Vector<32bit> mask, 32bit* address, Vector<unsigned32bit> indices)
-> LD1W Zresult.S, Pg/Z, [Xbase, Zindices.S, UXTW #2]

Codegen now picks the correct ones

@tannergooding
Copy link
Member

@tannergooding : thoughts?

If there is an encoding that allows this parameter to be Vector<uint> then we should expose the corresponding managed API.

Having trouble following the SVE docs here since there are like 8+ different instructions called LD1D, some of which are for contiguous and some of which are for gather....

But it looks like LD1D (scalar plus vector) only allows the indexes to be 32-bit

Gather load of doublewords to active elements of a vector register from memory addresses generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then optionally multiplied by 8. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.

In all cases, it looks like esize is 64 and msize is 64. It's only offs_size that is 32 or 64

@a74nh
Copy link
Contributor Author

a74nh commented Jun 10, 2024

@tannergooding : thoughts?

If there is an encoding that allows this parameter to be Vector<uint> then we should expose the corresponding managed API.

Having trouble following the SVE docs here since there are like 8+ different instructions called LD1D, some of which are for contiguous and some of which are for gather....

But it looks like LD1D (scalar plus vector) only allows the indexes to be 32-bit

Gather load of doublewords to active elements of a vector register from memory addresses generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then optionally multiplied by 8. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.

In all cases, it looks like esize is 64 and msize is 64. It's only offs_size that is 32 or 64

This is for:

        /// <summary>
        /// svint32_t svld1_gather[_u32base]_s32(svbool_t pg, svuint32_t bases)
        ///   LD1W Zresult.S, Pg/Z, [Zbases.S, #0]
        /// </summary>
        public static unsafe Vector<int> GatherVector(Vector<int> mask, Vector<uint> addresses) => GatherVector(mask, addresses);

Which maps to Gather load unsigned words to vector (immediate index) with an immediate value of 0.
The addresses are 32bit (with esize = 32) and then zero extended to 64bits.

Unless we can the address of a C# object to fit into a 32bit address I can't see this working.

@tannergooding
Copy link
Member

Unless we can the address of a C# object to fit into a 32bit address I can't see this working.

I don't see the issue. This doesn't have anything to do with objects, but rather truncated pointers. That is, Vector<T> addresses represents the pointers into fixed memory where that address is in the first 4GB of the address space.

While it isn't usable for an arbitrary T*, it can still be used for reading from explicitly mapped memory such as you might have in more advanced user scenarios. -- The same scenarios you'd use it for in C/C++

@build-analysis build-analysis bot mentioned this pull request Jun 10, 2024
@a74nh
Copy link
Contributor Author

a74nh commented Jun 10, 2024

Unless we can the address of a C# object to fit into a 32bit address I can't see this working.

I don't see the issue. This doesn't have anything to do with objects, but rather truncated pointers. That is, Vector<T> addresses represents the pointers into fixed memory where that address is in the first 4GB of the address space.

While it isn't usable for an arbitrary T*, it can still be used for reading from explicitly mapped memory such as you might have in more advanced user scenarios. -- The same scenarios you'd use it for in C/C++

Ok, that's fair, I wasn't sure it was possible to restrict memory in that way in C#.

Do you have any pointers on how I might construct a blob of memory the first 4GB in C# for the test case?

@tannergooding
Copy link
Member

You'd need to use an API like VirtualAlloc on Windows, such as via P/Invoke: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc. An equivalent, such as mmap, would need to be used on Unix.

It should be roughly equivalent to how you'd test the same functionality in C/C++

@kunalspathak
Copy link
Member

You'd need to use an API like VirtualAlloc on Windows, such as via P/Invoke: https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc. An equivalent, such as mmap, would need to be used on Unix.

It should be roughly equivalent to how you'd test the same functionality in C/C++

I don't want to spend too much time on writing test for this scenario, given that we have around 40% APIs to complete in .NET 9. For now, I will just comment out exposing these APIs and come back to it once we have some cycles.

@a74nh
Copy link
Contributor Author

a74nh commented Jun 11, 2024

I don't want to spend too much time on writing test for this scenario, given that we have around 40% APIs to complete in .NET 9. For now, I will just comment out exposing these APIs and come back to it once we have some cycles.

Agreed - gather vector is likely to be infrequently used. Use of variants with a 32bit address will be much rarer.

I've removed them from the external API files, but left them commented out in internal files. That should make it easier to revive later.

@a74nh
Copy link
Contributor Author

a74nh commented Jun 11, 2024

Stress results.
Looks like all the failures are known predicate issues during Stress 2 + Stress Regs 1.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, after we put back the 32-bit address APIs with a comment pointing to #103297

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@kunalspathak
Copy link
Member

/ba-g failures are because of #103354

@kunalspathak kunalspathak merged commit 3e8b786 into dotnet:main Jun 12, 2024
160 of 167 checks passed
@a74nh a74nh deleted the gatherload_github branch June 13, 2024 10:40
@github-actions github-actions bot locked and limited conversation to collaborators Jul 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime.Intrinsics arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants