[API Proposal]: ARM64 SVE/SVE2 : Unpredicated instructions (part 1 of 4) #93459

Closed
a74nh opened this issue Oct 13, 2023 · 13 comments
Labels: api-suggestion (Early API idea and discussion, it is NOT ready for implementation), area-System.Runtime.Intrinsics


a74nh commented Oct 13, 2023

Background and motivation

This part covers 413 of the 1229 methods.
Part 2: #93460
Part 3: #93461
Part 4: #93462

This API covers all the SVE/SVE2 instructions that don't require predication. The proposal is split into multiple parts to fit within GitHub's maximum issue size.
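The following C sketch gives a rough illustration of the unpredicated/predicated distinction. It models an unpredicated lane-wise add next to a merging-predicated one. Merging is the behaviour of ACLE `_m` intrinsics, where inactive lanes keep the value of the first operand. This is a scalar model only: vectors are plain arrays with an explicit lane count, and `add_u32` / `add_u32_merging` are hypothetical names, not ACLE or .NET APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unpredicated: every lane is computed (the kind of operation this proposal covers). */
void add_u32(uint32_t *result, const uint32_t *op1, const uint32_t *op2, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        result[i] = op1[i] + op2[i]; /* unsigned overflow wraps modulo 2^32 */
}

/* Predicated with merging (like ACLE "_m" intrinsics): lanes whose predicate
   bit is false keep the value of op1 instead of the sum. */
void add_u32_merging(uint32_t *result, const bool *pg, const uint32_t *op1,
                     const uint32_t *op2, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        result[i] = pg[i] ? op1[i] + op2[i] : op1[i];
}
```

The extra predicate argument is why predicated forms need a different API shape and are left to a later proposal.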

This list only includes SVE instructions currently available in existing hardware, i.e. FEAT_SVE and FEAT_SVE2. No other SVE extensions are covered.
List of SVE instructions

This list was auto-generated from the C ACLE intrinsics for SVE, using these sources:
SVE intrinsics doc
Interactive list of SVE intrinsics
The generation steps were:

  • Generate the complete list of methods. Each method name is created by camel-casing the intrinsic's description.
  • Remove all methods that use float16 or bfloat16, since C# does not yet support 16-bit floats as vector element types.
  • Remove all methods that use quadwords; these require FEAT_F64MM.
  • Remove all methods that use predicates. These will be covered in a future API suggestion.
  • Remove all methods without an underlying SVE instruction. These will be covered in a future API suggestion.
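Two of the steps above can be sketched in C: camel-casing and type filtering. The generator script itself is not part of this issue, so this is a hypothetical reconstruction. `camel_case` and `uses_half_precision` are invented names, and the filter assumes the element type appears as an `_f16` or `_bf16` suffix in the intrinsic name. Its output may therefore not match every name in the list exactly.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: camel-case a description such as
   "compute vector addresses for 8-bit data". Any non-alphanumeric character
   (space, hyphen, ...) starts a new word; the first character of each word is
   upper-cased and the rest are copied unchanged. */
void camel_case(const char *desc, char *out, size_t out_size)
{
    size_t n = 0;
    bool new_word = true;

    if (out_size == 0)
        return;
    for (const char *p = desc; *p != '\0' && n + 1 < out_size; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c)) {
            new_word = true;
            continue;
        }
        out[n++] = (char)(new_word ? toupper(c) : c);
        new_word = false;
    }
    out[n] = '\0';
}

/* Hypothetical filter for the float16/bfloat16 step: assumes the element
   type is visible as an "_f16" or "_bf16" suffix in the intrinsic name. */
bool uses_half_precision(const char *intrinsic)
{
    return strstr(intrinsic, "_f16") != NULL || strstr(intrinsic, "_bf16") != NULL;
}
```

For example, `camel_case("compute vector addresses for 8-bit data", ...)` yields `ComputeVectorAddressesFor8BitData`. Treating the hyphen as a word break is what produces the `8Bit` segment.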

For each method, I've included the original C intrinsic (e.g. svaba[_s8]) and the SVE instruction it generates (e.g. SABA). For many methods, two instructions may be generated because of register usage: for example, MOVPRFX+SABA is needed when the result must be placed in a register other than the one holding op1, because SABA is destructive and overwrites its accumulator operand.
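This scalar C model makes the SABA example concrete by showing what svaba[_s8] and svaba[_n_s8] compute per lane. It is a sketch for illustration only. The function names, the array-plus-lane-count interface and the fixed lane counts are assumptions (real SVE vectors have an implementation-defined length), and the model says nothing about how a JIT would allocate registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Wrap v into the int8_t range modulo 2^8 without relying on
   implementation-defined narrowing conversions (valid for v in [-128, 382]). */
static int8_t wrap_s8(int v)
{
    return (int8_t)(((v + 128) & 0xFF) - 128);
}

/* Scalar model of svaba[_s8] (SABA): lane i becomes op1[i] + |op2[i] - op3[i]|,
   wrapping modulo 2^8. result may alias op1. */
void saba_s8_model(int8_t *result, const int8_t *op1, const int8_t *op2,
                   const int8_t *op3, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        result[i] = wrap_s8(op1[i] + abs(op2[i] - op3[i]));
}

/* svaba[_n_s8]: same operation, with the scalar op3 broadcast to every lane. */
void saba_n_s8_model(int8_t *result, const int8_t *op1, const int8_t *op2,
                     int8_t op3, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        result[i] = wrap_s8(op1[i] + abs(op2[i] - op3));
}
```

Passing the same array as `result` and `op1` loosely mirrors the single-instruction SABA form, where the accumulator is overwritten. Writing to a separate array corresponds to the MOVPRFX+SABA sequence, where op1 is first copied into the destination register.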

Contributes towards #93095

API Proposal

namespace System.Runtime.Intrinsics.Arm;

public class ArmSVE<T>
{

  public static unsafe Vector<sbyte> AbsoluteDifferenceAndAccumulate(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svaba[_s8]: SABA or MOVPRFX+SABA
  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulate(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svaba[_s16]: SABA or MOVPRFX+SABA
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulate(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svaba[_s32]: SABA or MOVPRFX+SABA
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulate(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svaba[_s64]: SABA or MOVPRFX+SABA
  public static unsafe Vector<byte> AbsoluteDifferenceAndAccumulate(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svaba[_u8]: UABA or MOVPRFX+UABA
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulate(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svaba[_u16]: UABA or MOVPRFX+UABA
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulate(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svaba[_u32]: UABA or MOVPRFX+UABA
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulate(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svaba[_u64]: UABA or MOVPRFX+UABA
  public static unsafe Vector<sbyte> AbsoluteDifferenceAndAccumulate(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svaba[_n_s8]: SABA or MOVPRFX+SABA
  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulate(Vector<short> op1, Vector<short> op2, short op3);  // svaba[_n_s16]: SABA or MOVPRFX+SABA
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulate(Vector<int> op1, Vector<int> op2, int op3);  // svaba[_n_s32]: SABA or MOVPRFX+SABA
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulate(Vector<long> op1, Vector<long> op2, long op3);  // svaba[_n_s64]: SABA or MOVPRFX+SABA
  public static unsafe Vector<byte> AbsoluteDifferenceAndAccumulate(Vector<byte> op1, Vector<byte> op2, byte op3);  // svaba[_n_u8]: UABA or MOVPRFX+UABA
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulate(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svaba[_n_u16]: UABA or MOVPRFX+UABA
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulate(Vector<uint> op1, Vector<uint> op2, uint op3);  // svaba[_n_u32]: UABA or MOVPRFX+UABA
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulate(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svaba[_n_u64]: UABA or MOVPRFX+UABA

  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulateLongBottom(Vector<short> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svabalb[_s16]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulateLongBottom(Vector<int> op1, Vector<short> op2, Vector<short> op3);  // svabalb[_s32]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulateLongBottom(Vector<long> op1, Vector<int> op2, Vector<int> op3);  // svabalb[_s64]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulateLongBottom(Vector<ushort> op1, Vector<byte> op2, Vector<byte> op3);  // svabalb[_u16]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulateLongBottom(Vector<uint> op1, Vector<ushort> op2, Vector<ushort> op3);  // svabalb[_u32]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulateLongBottom(Vector<ulong> op1, Vector<uint> op2, Vector<uint> op3);  // svabalb[_u64]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulateLongBottom(Vector<short> op1, Vector<sbyte> op2, sbyte op3);  // svabalb[_n_s16]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulateLongBottom(Vector<int> op1, Vector<short> op2, short op3);  // svabalb[_n_s32]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulateLongBottom(Vector<long> op1, Vector<int> op2, int op3);  // svabalb[_n_s64]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulateLongBottom(Vector<ushort> op1, Vector<byte> op2, byte op3);  // svabalb[_n_u16]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulateLongBottom(Vector<uint> op1, Vector<ushort> op2, ushort op3);  // svabalb[_n_u32]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulateLongBottom(Vector<ulong> op1, Vector<uint> op2, uint op3);  // svabalb[_n_u64]: UABALB or MOVPRFX+UABALB

  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulateLongTop(Vector<short> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svabalt[_s16]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulateLongTop(Vector<int> op1, Vector<short> op2, Vector<short> op3);  // svabalt[_s32]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulateLongTop(Vector<long> op1, Vector<int> op2, Vector<int> op3);  // svabalt[_s64]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulateLongTop(Vector<ushort> op1, Vector<byte> op2, Vector<byte> op3);  // svabalt[_u16]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulateLongTop(Vector<uint> op1, Vector<ushort> op2, Vector<ushort> op3);  // svabalt[_u32]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulateLongTop(Vector<ulong> op1, Vector<uint> op2, Vector<uint> op3);  // svabalt[_u64]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulateLongTop(Vector<short> op1, Vector<sbyte> op2, sbyte op3);  // svabalt[_n_s16]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulateLongTop(Vector<int> op1, Vector<short> op2, short op3);  // svabalt[_n_s32]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulateLongTop(Vector<long> op1, Vector<int> op2, int op3);  // svabalt[_n_s64]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulateLongTop(Vector<ushort> op1, Vector<byte> op2, byte op3);  // svabalt[_n_u16]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulateLongTop(Vector<uint> op1, Vector<ushort> op2, ushort op3);  // svabalt[_n_u32]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulateLongTop(Vector<ulong> op1, Vector<uint> op2, uint op3);  // svabalt[_n_u64]: UABALT or MOVPRFX+UABALT

  public static unsafe Vector<short> AbsoluteDifferenceLongBottom(Vector<sbyte> op1, Vector<sbyte> op2);  // svabdlb[_s16]: SABDLB
  public static unsafe Vector<int> AbsoluteDifferenceLongBottom(Vector<short> op1, Vector<short> op2);  // svabdlb[_s32]: SABDLB
  public static unsafe Vector<long> AbsoluteDifferenceLongBottom(Vector<int> op1, Vector<int> op2);  // svabdlb[_s64]: SABDLB
  public static unsafe Vector<ushort> AbsoluteDifferenceLongBottom(Vector<byte> op1, Vector<byte> op2);  // svabdlb[_u16]: UABDLB
  public static unsafe Vector<uint> AbsoluteDifferenceLongBottom(Vector<ushort> op1, Vector<ushort> op2);  // svabdlb[_u32]: UABDLB
  public static unsafe Vector<ulong> AbsoluteDifferenceLongBottom(Vector<uint> op1, Vector<uint> op2);  // svabdlb[_u64]: UABDLB
  public static unsafe Vector<short> AbsoluteDifferenceLongBottom(Vector<sbyte> op1, sbyte op2);  // svabdlb[_n_s16]: SABDLB
  public static unsafe Vector<int> AbsoluteDifferenceLongBottom(Vector<short> op1, short op2);  // svabdlb[_n_s32]: SABDLB
  public static unsafe Vector<long> AbsoluteDifferenceLongBottom(Vector<int> op1, int op2);  // svabdlb[_n_s64]: SABDLB
  public static unsafe Vector<ushort> AbsoluteDifferenceLongBottom(Vector<byte> op1, byte op2);  // svabdlb[_n_u16]: UABDLB
  public static unsafe Vector<uint> AbsoluteDifferenceLongBottom(Vector<ushort> op1, ushort op2);  // svabdlb[_n_u32]: UABDLB
  public static unsafe Vector<ulong> AbsoluteDifferenceLongBottom(Vector<uint> op1, uint op2);  // svabdlb[_n_u64]: UABDLB

  public static unsafe Vector<short> AbsoluteDifferenceLongTop(Vector<sbyte> op1, Vector<sbyte> op2);  // svabdlt[_s16]: SABDLT
  public static unsafe Vector<int> AbsoluteDifferenceLongTop(Vector<short> op1, Vector<short> op2);  // svabdlt[_s32]: SABDLT
  public static unsafe Vector<long> AbsoluteDifferenceLongTop(Vector<int> op1, Vector<int> op2);  // svabdlt[_s64]: SABDLT
  public static unsafe Vector<ushort> AbsoluteDifferenceLongTop(Vector<byte> op1, Vector<byte> op2);  // svabdlt[_u16]: UABDLT
  public static unsafe Vector<uint> AbsoluteDifferenceLongTop(Vector<ushort> op1, Vector<ushort> op2);  // svabdlt[_u32]: UABDLT
  public static unsafe Vector<ulong> AbsoluteDifferenceLongTop(Vector<uint> op1, Vector<uint> op2);  // svabdlt[_u64]: UABDLT
  public static unsafe Vector<short> AbsoluteDifferenceLongTop(Vector<sbyte> op1, sbyte op2);  // svabdlt[_n_s16]: SABDLT
  public static unsafe Vector<int> AbsoluteDifferenceLongTop(Vector<short> op1, short op2);  // svabdlt[_n_s32]: SABDLT
  public static unsafe Vector<long> AbsoluteDifferenceLongTop(Vector<int> op1, int op2);  // svabdlt[_n_s64]: SABDLT
  public static unsafe Vector<ushort> AbsoluteDifferenceLongTop(Vector<byte> op1, byte op2);  // svabdlt[_n_u16]: UABDLT
  public static unsafe Vector<uint> AbsoluteDifferenceLongTop(Vector<ushort> op1, ushort op2);  // svabdlt[_n_u32]: UABDLT
  public static unsafe Vector<ulong> AbsoluteDifferenceLongTop(Vector<uint> op1, uint op2);  // svabdlt[_n_u64]: UABDLT

  public static unsafe Vector<uint> AddWithCarryLongBottom(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svadclb[_u32]: ADCLB or MOVPRFX+ADCLB
  public static unsafe Vector<ulong> AddWithCarryLongBottom(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svadclb[_u64]: ADCLB or MOVPRFX+ADCLB
  public static unsafe Vector<uint> AddWithCarryLongBottom(Vector<uint> op1, Vector<uint> op2, uint op3);  // svadclb[_n_u32]: ADCLB or MOVPRFX+ADCLB
  public static unsafe Vector<ulong> AddWithCarryLongBottom(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svadclb[_n_u64]: ADCLB or MOVPRFX+ADCLB

  public static unsafe Vector<uint> AddWithCarryLongTop(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svadclt[_u32]: ADCLT or MOVPRFX+ADCLT
  public static unsafe Vector<ulong> AddWithCarryLongTop(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svadclt[_u64]: ADCLT or MOVPRFX+ADCLT
  public static unsafe Vector<uint> AddWithCarryLongTop(Vector<uint> op1, Vector<uint> op2, uint op3);  // svadclt[_n_u32]: ADCLT or MOVPRFX+ADCLT
  public static unsafe Vector<ulong> AddWithCarryLongTop(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svadclt[_n_u64]: ADCLT or MOVPRFX+ADCLT

  public static unsafe Vector<sbyte> AddNarrowHighPartBottom(Vector<short> op1, Vector<short> op2);  // svaddhnb[_s16]: ADDHNB
  public static unsafe Vector<short> AddNarrowHighPartBottom(Vector<int> op1, Vector<int> op2);  // svaddhnb[_s32]: ADDHNB
  public static unsafe Vector<int> AddNarrowHighPartBottom(Vector<long> op1, Vector<long> op2);  // svaddhnb[_s64]: ADDHNB
  public static unsafe Vector<byte> AddNarrowHighPartBottom(Vector<ushort> op1, Vector<ushort> op2);  // svaddhnb[_u16]: ADDHNB
  public static unsafe Vector<ushort> AddNarrowHighPartBottom(Vector<uint> op1, Vector<uint> op2);  // svaddhnb[_u32]: ADDHNB
  public static unsafe Vector<uint> AddNarrowHighPartBottom(Vector<ulong> op1, Vector<ulong> op2);  // svaddhnb[_u64]: ADDHNB
  public static unsafe Vector<sbyte> AddNarrowHighPartBottom(Vector<short> op1, short op2);  // svaddhnb[_n_s16]: ADDHNB
  public static unsafe Vector<short> AddNarrowHighPartBottom(Vector<int> op1, int op2);  // svaddhnb[_n_s32]: ADDHNB
  public static unsafe Vector<int> AddNarrowHighPartBottom(Vector<long> op1, long op2);  // svaddhnb[_n_s64]: ADDHNB
  public static unsafe Vector<byte> AddNarrowHighPartBottom(Vector<ushort> op1, ushort op2);  // svaddhnb[_n_u16]: ADDHNB
  public static unsafe Vector<ushort> AddNarrowHighPartBottom(Vector<uint> op1, uint op2);  // svaddhnb[_n_u32]: ADDHNB
  public static unsafe Vector<uint> AddNarrowHighPartBottom(Vector<ulong> op1, ulong op2);  // svaddhnb[_n_u64]: ADDHNB

  public static unsafe Vector<sbyte> AddNarrowHighPartTop(Vector<sbyte> even, Vector<short> op1, Vector<short> op2);  // svaddhnt[_s16]: ADDHNT
  public static unsafe Vector<short> AddNarrowHighPartTop(Vector<short> even, Vector<int> op1, Vector<int> op2);  // svaddhnt[_s32]: ADDHNT
  public static unsafe Vector<int> AddNarrowHighPartTop(Vector<int> even, Vector<long> op1, Vector<long> op2);  // svaddhnt[_s64]: ADDHNT
  public static unsafe Vector<byte> AddNarrowHighPartTop(Vector<byte> even, Vector<ushort> op1, Vector<ushort> op2);  // svaddhnt[_u16]: ADDHNT
  public static unsafe Vector<ushort> AddNarrowHighPartTop(Vector<ushort> even, Vector<uint> op1, Vector<uint> op2);  // svaddhnt[_u32]: ADDHNT
  public static unsafe Vector<uint> AddNarrowHighPartTop(Vector<uint> even, Vector<ulong> op1, Vector<ulong> op2);  // svaddhnt[_u64]: ADDHNT
  public static unsafe Vector<sbyte> AddNarrowHighPartTop(Vector<sbyte> even, Vector<short> op1, short op2);  // svaddhnt[_n_s16]: ADDHNT
  public static unsafe Vector<short> AddNarrowHighPartTop(Vector<short> even, Vector<int> op1, int op2);  // svaddhnt[_n_s32]: ADDHNT
  public static unsafe Vector<int> AddNarrowHighPartTop(Vector<int> even, Vector<long> op1, long op2);  // svaddhnt[_n_s64]: ADDHNT
  public static unsafe Vector<byte> AddNarrowHighPartTop(Vector<byte> even, Vector<ushort> op1, ushort op2);  // svaddhnt[_n_u16]: ADDHNT
  public static unsafe Vector<ushort> AddNarrowHighPartTop(Vector<ushort> even, Vector<uint> op1, uint op2);  // svaddhnt[_n_u32]: ADDHNT
  public static unsafe Vector<uint> AddNarrowHighPartTop(Vector<uint> even, Vector<ulong> op1, ulong op2);  // svaddhnt[_n_u64]: ADDHNT

  public static unsafe Vector<short> AddLongBottom(Vector<sbyte> op1, Vector<sbyte> op2);  // svaddlb[_s16]: SADDLB
  public static unsafe Vector<int> AddLongBottom(Vector<short> op1, Vector<short> op2);  // svaddlb[_s32]: SADDLB
  public static unsafe Vector<long> AddLongBottom(Vector<int> op1, Vector<int> op2);  // svaddlb[_s64]: SADDLB
  public static unsafe Vector<ushort> AddLongBottom(Vector<byte> op1, Vector<byte> op2);  // svaddlb[_u16]: UADDLB
  public static unsafe Vector<uint> AddLongBottom(Vector<ushort> op1, Vector<ushort> op2);  // svaddlb[_u32]: UADDLB
  public static unsafe Vector<ulong> AddLongBottom(Vector<uint> op1, Vector<uint> op2);  // svaddlb[_u64]: UADDLB
  public static unsafe Vector<short> AddLongBottom(Vector<sbyte> op1, sbyte op2);  // svaddlb[_n_s16]: SADDLB
  public static unsafe Vector<int> AddLongBottom(Vector<short> op1, short op2);  // svaddlb[_n_s32]: SADDLB
  public static unsafe Vector<long> AddLongBottom(Vector<int> op1, int op2);  // svaddlb[_n_s64]: SADDLB
  public static unsafe Vector<ushort> AddLongBottom(Vector<byte> op1, byte op2);  // svaddlb[_n_u16]: UADDLB
  public static unsafe Vector<uint> AddLongBottom(Vector<ushort> op1, ushort op2);  // svaddlb[_n_u32]: UADDLB
  public static unsafe Vector<ulong> AddLongBottom(Vector<uint> op1, uint op2);  // svaddlb[_n_u64]: UADDLB

  public static unsafe Vector<short> AddLongBottomTop(Vector<sbyte> op1, Vector<sbyte> op2);  // svaddlbt[_s16]: SADDLBT
  public static unsafe Vector<int> AddLongBottomTop(Vector<short> op1, Vector<short> op2);  // svaddlbt[_s32]: SADDLBT
  public static unsafe Vector<long> AddLongBottomTop(Vector<int> op1, Vector<int> op2);  // svaddlbt[_s64]: SADDLBT
  public static unsafe Vector<short> AddLongBottomTop(Vector<sbyte> op1, sbyte op2);  // svaddlbt[_n_s16]: SADDLBT
  public static unsafe Vector<int> AddLongBottomTop(Vector<short> op1, short op2);  // svaddlbt[_n_s32]: SADDLBT
  public static unsafe Vector<long> AddLongBottomTop(Vector<int> op1, int op2);  // svaddlbt[_n_s64]: SADDLBT

  public static unsafe Vector<short> AddLongTop(Vector<sbyte> op1, Vector<sbyte> op2);  // svaddlt[_s16]: SADDLT
  public static unsafe Vector<int> AddLongTop(Vector<short> op1, Vector<short> op2);  // svaddlt[_s32]: SADDLT
  public static unsafe Vector<long> AddLongTop(Vector<int> op1, Vector<int> op2);  // svaddlt[_s64]: SADDLT
  public static unsafe Vector<ushort> AddLongTop(Vector<byte> op1, Vector<byte> op2);  // svaddlt[_u16]: UADDLT
  public static unsafe Vector<uint> AddLongTop(Vector<ushort> op1, Vector<ushort> op2);  // svaddlt[_u32]: UADDLT
  public static unsafe Vector<ulong> AddLongTop(Vector<uint> op1, Vector<uint> op2);  // svaddlt[_u64]: UADDLT
  public static unsafe Vector<short> AddLongTop(Vector<sbyte> op1, sbyte op2);  // svaddlt[_n_s16]: SADDLT
  public static unsafe Vector<int> AddLongTop(Vector<short> op1, short op2);  // svaddlt[_n_s32]: SADDLT
  public static unsafe Vector<long> AddLongTop(Vector<int> op1, int op2);  // svaddlt[_n_s64]: SADDLT
  public static unsafe Vector<ushort> AddLongTop(Vector<byte> op1, byte op2);  // svaddlt[_n_u16]: UADDLT
  public static unsafe Vector<uint> AddLongTop(Vector<ushort> op1, ushort op2);  // svaddlt[_n_u32]: UADDLT
  public static unsafe Vector<ulong> AddLongTop(Vector<uint> op1, uint op2);  // svaddlt[_n_u64]: UADDLT

  public static unsafe Vector<short> AddWideBottom(Vector<short> op1, Vector<sbyte> op2);  // svaddwb[_s16]: SADDWB
  public static unsafe Vector<int> AddWideBottom(Vector<int> op1, Vector<short> op2);  // svaddwb[_s32]: SADDWB
  public static unsafe Vector<long> AddWideBottom(Vector<long> op1, Vector<int> op2);  // svaddwb[_s64]: SADDWB
  public static unsafe Vector<ushort> AddWideBottom(Vector<ushort> op1, Vector<byte> op2);  // svaddwb[_u16]: UADDWB
  public static unsafe Vector<uint> AddWideBottom(Vector<uint> op1, Vector<ushort> op2);  // svaddwb[_u32]: UADDWB
  public static unsafe Vector<ulong> AddWideBottom(Vector<ulong> op1, Vector<uint> op2);  // svaddwb[_u64]: UADDWB
  public static unsafe Vector<short> AddWideBottom(Vector<short> op1, sbyte op2);  // svaddwb[_n_s16]: SADDWB
  public static unsafe Vector<int> AddWideBottom(Vector<int> op1, short op2);  // svaddwb[_n_s32]: SADDWB
  public static unsafe Vector<long> AddWideBottom(Vector<long> op1, int op2);  // svaddwb[_n_s64]: SADDWB
  public static unsafe Vector<ushort> AddWideBottom(Vector<ushort> op1, byte op2);  // svaddwb[_n_u16]: UADDWB
  public static unsafe Vector<uint> AddWideBottom(Vector<uint> op1, ushort op2);  // svaddwb[_n_u32]: UADDWB
  public static unsafe Vector<ulong> AddWideBottom(Vector<ulong> op1, uint op2);  // svaddwb[_n_u64]: UADDWB

  public static unsafe Vector<short> AddWideTop(Vector<short> op1, Vector<sbyte> op2);  // svaddwt[_s16]: SADDWT
  public static unsafe Vector<int> AddWideTop(Vector<int> op1, Vector<short> op2);  // svaddwt[_s32]: SADDWT
  public static unsafe Vector<long> AddWideTop(Vector<long> op1, Vector<int> op2);  // svaddwt[_s64]: SADDWT
  public static unsafe Vector<ushort> AddWideTop(Vector<ushort> op1, Vector<byte> op2);  // svaddwt[_u16]: UADDWT
  public static unsafe Vector<uint> AddWideTop(Vector<uint> op1, Vector<ushort> op2);  // svaddwt[_u32]: UADDWT
  public static unsafe Vector<ulong> AddWideTop(Vector<ulong> op1, Vector<uint> op2);  // svaddwt[_u64]: UADDWT
  public static unsafe Vector<short> AddWideTop(Vector<short> op1, sbyte op2);  // svaddwt[_n_s16]: SADDWT
  public static unsafe Vector<int> AddWideTop(Vector<int> op1, short op2);  // svaddwt[_n_s32]: SADDWT
  public static unsafe Vector<long> AddWideTop(Vector<long> op1, int op2);  // svaddwt[_n_s64]: SADDWT
  public static unsafe Vector<ushort> AddWideTop(Vector<ushort> op1, byte op2);  // svaddwt[_n_u16]: UADDWT
  public static unsafe Vector<uint> AddWideTop(Vector<uint> op1, ushort op2);  // svaddwt[_n_u32]: UADDWT
  public static unsafe Vector<ulong> AddWideTop(Vector<ulong> op1, uint op2);  // svaddwt[_n_u64]: UADDWT

  public static unsafe Vector<uint> ComputeVectorAddressesFor8BitData(Vector<uint> bases, Vector<int> offsets);  // svadrb[_u32base]_[s32]offset: ADR
  public static unsafe Vector<uint> ComputeVectorAddressesFor8BitData(Vector<uint> bases, Vector<uint> offsets);  // svadrb[_u32base]_[u32]offset: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor8BitData(Vector<ulong> bases, Vector<long> offsets);  // svadrb[_u64base]_[s64]offset: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor8BitData(Vector<ulong> bases, Vector<ulong> offsets);  // svadrb[_u64base]_[u64]offset: ADR

  public static unsafe Vector<uint> ComputeVectorAddressesFor64BitData(Vector<uint> bases, Vector<int> indices);  // svadrd[_u32base]_[s32]index: ADR
  public static unsafe Vector<uint> ComputeVectorAddressesFor64BitData(Vector<uint> bases, Vector<uint> indices);  // svadrd[_u32base]_[u32]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor64BitData(Vector<ulong> bases, Vector<long> indices);  // svadrd[_u64base]_[s64]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor64BitData(Vector<ulong> bases, Vector<ulong> indices);  // svadrd[_u64base]_[u64]index: ADR

  public static unsafe Vector<uint> ComputeVectorAddressesFor16BitData(Vector<uint> bases, Vector<int> indices);  // svadrh[_u32base]_[s32]index: ADR
  public static unsafe Vector<uint> ComputeVectorAddressesFor16BitData(Vector<uint> bases, Vector<uint> indices);  // svadrh[_u32base]_[u32]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor16BitData(Vector<ulong> bases, Vector<long> indices);  // svadrh[_u64base]_[s64]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor16BitData(Vector<ulong> bases, Vector<ulong> indices);  // svadrh[_u64base]_[u64]index: ADR

  public static unsafe Vector<uint> ComputeVectorAddressesFor32BitData(Vector<uint> bases, Vector<int> indices);  // svadrw[_u32base]_[s32]index: ADR
  public static unsafe Vector<uint> ComputeVectorAddressesFor32BitData(Vector<uint> bases, Vector<uint> indices);  // svadrw[_u32base]_[u32]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor32BitData(Vector<ulong> bases, Vector<long> indices);  // svadrw[_u64base]_[s64]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor32BitData(Vector<ulong> bases, Vector<ulong> indices);  // svadrw[_u64base]_[u64]index: ADR

  public static unsafe Vector<byte> AesSingleRoundDecryption(Vector<byte> op1, Vector<byte> op2);  // svaesd[_u8]: AESD

  public static unsafe Vector<byte> AesSingleRoundEncryption(Vector<byte> op1, Vector<byte> op2);  // svaese[_u8]: AESE

  public static unsafe Vector<byte> AesInverseMixColumns(Vector<byte> op);  // svaesimc[_u8]: AESIMC

  public static unsafe Vector<byte> AesMixColumns(Vector<byte> op);  // svaesmc[_u8]: AESMC

  public static unsafe Vector<sbyte> BitwiseClearAndExclusiveOr(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svbcax[_s8]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<short> BitwiseClearAndExclusiveOr(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svbcax[_s16]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<int> BitwiseClearAndExclusiveOr(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svbcax[_s32]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<long> BitwiseClearAndExclusiveOr(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svbcax[_s64]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<byte> BitwiseClearAndExclusiveOr(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svbcax[_u8]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<ushort> BitwiseClearAndExclusiveOr(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svbcax[_u16]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<uint> BitwiseClearAndExclusiveOr(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svbcax[_u32]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<ulong> BitwiseClearAndExclusiveOr(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svbcax[_u64]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<sbyte> BitwiseClearAndExclusiveOr(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svbcax[_n_s8]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<short> BitwiseClearAndExclusiveOr(Vector<short> op1, Vector<short> op2, short op3);  // svbcax[_n_s16]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<int> BitwiseClearAndExclusiveOr(Vector<int> op1, Vector<int> op2, int op3);  // svbcax[_n_s32]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<long> BitwiseClearAndExclusiveOr(Vector<long> op1, Vector<long> op2, long op3);  // svbcax[_n_s64]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<byte> BitwiseClearAndExclusiveOr(Vector<byte> op1, Vector<byte> op2, byte op3);  // svbcax[_n_u8]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<ushort> BitwiseClearAndExclusiveOr(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svbcax[_n_u16]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<uint> BitwiseClearAndExclusiveOr(Vector<uint> op1, Vector<uint> op2, uint op3);  // svbcax[_n_u32]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<ulong> BitwiseClearAndExclusiveOr(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svbcax[_n_u64]: BCAX or MOVPRFX+BCAX

  public static unsafe Vector<byte> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<byte> op1, Vector<byte> op2);  // svbdep[_u8]: BDEP
  public static unsafe Vector<ushort> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<ushort> op1, Vector<ushort> op2);  // svbdep[_u16]: BDEP
  public static unsafe Vector<uint> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<uint> op1, Vector<uint> op2);  // svbdep[_u32]: BDEP
  public static unsafe Vector<ulong> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<ulong> op1, Vector<ulong> op2);  // svbdep[_u64]: BDEP
  public static unsafe Vector<byte> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<byte> op1, byte op2);  // svbdep[_n_u8]: BDEP
  public static unsafe Vector<ushort> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<ushort> op1, ushort op2);  // svbdep[_n_u16]: BDEP
  public static unsafe Vector<uint> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<uint> op1, uint op2);  // svbdep[_n_u32]: BDEP
  public static unsafe Vector<ulong> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<ulong> op1, ulong op2);  // svbdep[_n_u64]: BDEP

  public static unsafe Vector<byte> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<byte> op1, Vector<byte> op2);  // svbext[_u8]: BEXT
  public static unsafe Vector<ushort> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<ushort> op1, Vector<ushort> op2);  // svbext[_u16]: BEXT
  public static unsafe Vector<uint> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<uint> op1, Vector<uint> op2);  // svbext[_u32]: BEXT
  public static unsafe Vector<ulong> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<ulong> op1, Vector<ulong> op2);  // svbext[_u64]: BEXT
  public static unsafe Vector<byte> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<byte> op1, byte op2);  // svbext[_n_u8]: BEXT
  public static unsafe Vector<ushort> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<ushort> op1, ushort op2);  // svbext[_n_u16]: BEXT
  public static unsafe Vector<uint> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<uint> op1, uint op2);  // svbext[_n_u32]: BEXT
  public static unsafe Vector<ulong> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<ulong> op1, ulong op2);  // svbext[_n_u64]: BEXT

  public static unsafe Vector<byte> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<byte> op1, Vector<byte> op2);  // svbgrp[_u8]: BGRP
  public static unsafe Vector<ushort> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<ushort> op1, Vector<ushort> op2);  // svbgrp[_u16]: BGRP
  public static unsafe Vector<uint> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<uint> op1, Vector<uint> op2);  // svbgrp[_u32]: BGRP
  public static unsafe Vector<ulong> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<ulong> op1, Vector<ulong> op2);  // svbgrp[_u64]: BGRP
  public static unsafe Vector<byte> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<byte> op1, byte op2);  // svbgrp[_n_u8]: BGRP
  public static unsafe Vector<ushort> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<ushort> op1, ushort op2);  // svbgrp[_n_u16]: BGRP
  public static unsafe Vector<uint> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<uint> op1, uint op2);  // svbgrp[_n_u32]: BGRP
  public static unsafe Vector<ulong> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<ulong> op1, ulong op2);  // svbgrp[_n_u64]: BGRP

  public static unsafe Vector<sbyte> BitwiseSelect(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svbsl[_s8]: BSL or MOVPRFX+BSL
  public static unsafe Vector<short> BitwiseSelect(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svbsl[_s16]: BSL or MOVPRFX+BSL
  public static unsafe Vector<int> BitwiseSelect(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svbsl[_s32]: BSL or MOVPRFX+BSL
  public static unsafe Vector<long> BitwiseSelect(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svbsl[_s64]: BSL or MOVPRFX+BSL
  public static unsafe Vector<byte> BitwiseSelect(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svbsl[_u8]: BSL or MOVPRFX+BSL
  public static unsafe Vector<ushort> BitwiseSelect(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svbsl[_u16]: BSL or MOVPRFX+BSL
  public static unsafe Vector<uint> BitwiseSelect(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svbsl[_u32]: BSL or MOVPRFX+BSL
  public static unsafe Vector<ulong> BitwiseSelect(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svbsl[_u64]: BSL or MOVPRFX+BSL
  public static unsafe Vector<sbyte> BitwiseSelect(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svbsl[_n_s8]: BSL or MOVPRFX+BSL
  public static unsafe Vector<short> BitwiseSelect(Vector<short> op1, Vector<short> op2, short op3);  // svbsl[_n_s16]: BSL or MOVPRFX+BSL
  public static unsafe Vector<int> BitwiseSelect(Vector<int> op1, Vector<int> op2, int op3);  // svbsl[_n_s32]: BSL or MOVPRFX+BSL
  public static unsafe Vector<long> BitwiseSelect(Vector<long> op1, Vector<long> op2, long op3);  // svbsl[_n_s64]: BSL or MOVPRFX+BSL
  public static unsafe Vector<byte> BitwiseSelect(Vector<byte> op1, Vector<byte> op2, byte op3);  // svbsl[_n_u8]: BSL or MOVPRFX+BSL
  public static unsafe Vector<ushort> BitwiseSelect(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svbsl[_n_u16]: BSL or MOVPRFX+BSL
  public static unsafe Vector<uint> BitwiseSelect(Vector<uint> op1, Vector<uint> op2, uint op3);  // svbsl[_n_u32]: BSL or MOVPRFX+BSL
  public static unsafe Vector<ulong> BitwiseSelect(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svbsl[_n_u64]: BSL or MOVPRFX+BSL
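BSL computes `(op1 AND op3) OR (op2 AND NOT op3)`. In other words, `op3` is the selector: set bits pick from `op1` and clear bits pick from `op2`. The following is a one-lane sketch. `BslReference` is a hypothetical name, and the `_n` overloads would apply the scalar `op3` to every lane.

```csharp
public static class BslReference
{
    // One 64-bit lane of svbsl: bits come from op1 where op3 is 1, from op2 where op3 is 0.
    public static ulong BitwiseSelect(ulong op1, ulong op2, ulong op3)
        => (op1 & op3) | (op2 & ~op3);
}
```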

  public static unsafe Vector<sbyte> BitwiseSelectWithFirstInputInverted(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svbsl1n[_s8]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<short> BitwiseSelectWithFirstInputInverted(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svbsl1n[_s16]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<int> BitwiseSelectWithFirstInputInverted(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svbsl1n[_s32]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<long> BitwiseSelectWithFirstInputInverted(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svbsl1n[_s64]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<byte> BitwiseSelectWithFirstInputInverted(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svbsl1n[_u8]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<ushort> BitwiseSelectWithFirstInputInverted(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svbsl1n[_u16]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<uint> BitwiseSelectWithFirstInputInverted(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svbsl1n[_u32]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<ulong> BitwiseSelectWithFirstInputInverted(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svbsl1n[_u64]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<sbyte> BitwiseSelectWithFirstInputInverted(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svbsl1n[_n_s8]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<short> BitwiseSelectWithFirstInputInverted(Vector<short> op1, Vector<short> op2, short op3);  // svbsl1n[_n_s16]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<int> BitwiseSelectWithFirstInputInverted(Vector<int> op1, Vector<int> op2, int op3);  // svbsl1n[_n_s32]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<long> BitwiseSelectWithFirstInputInverted(Vector<long> op1, Vector<long> op2, long op3);  // svbsl1n[_n_s64]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<byte> BitwiseSelectWithFirstInputInverted(Vector<byte> op1, Vector<byte> op2, byte op3);  // svbsl1n[_n_u8]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<ushort> BitwiseSelectWithFirstInputInverted(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svbsl1n[_n_u16]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<uint> BitwiseSelectWithFirstInputInverted(Vector<uint> op1, Vector<uint> op2, uint op3);  // svbsl1n[_n_u32]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<ulong> BitwiseSelectWithFirstInputInverted(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svbsl1n[_n_u64]: BSL1N or MOVPRFX+BSL1N

  public static unsafe Vector<sbyte> BitwiseSelectWithSecondInputInverted(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svbsl2n[_s8]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<short> BitwiseSelectWithSecondInputInverted(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svbsl2n[_s16]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<int> BitwiseSelectWithSecondInputInverted(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svbsl2n[_s32]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<long> BitwiseSelectWithSecondInputInverted(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svbsl2n[_s64]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<byte> BitwiseSelectWithSecondInputInverted(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svbsl2n[_u8]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<ushort> BitwiseSelectWithSecondInputInverted(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svbsl2n[_u16]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<uint> BitwiseSelectWithSecondInputInverted(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svbsl2n[_u32]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<ulong> BitwiseSelectWithSecondInputInverted(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svbsl2n[_u64]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<sbyte> BitwiseSelectWithSecondInputInverted(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svbsl2n[_n_s8]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<short> BitwiseSelectWithSecondInputInverted(Vector<short> op1, Vector<short> op2, short op3);  // svbsl2n[_n_s16]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<int> BitwiseSelectWithSecondInputInverted(Vector<int> op1, Vector<int> op2, int op3);  // svbsl2n[_n_s32]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<long> BitwiseSelectWithSecondInputInverted(Vector<long> op1, Vector<long> op2, long op3);  // svbsl2n[_n_s64]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<byte> BitwiseSelectWithSecondInputInverted(Vector<byte> op1, Vector<byte> op2, byte op3);  // svbsl2n[_n_u8]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<ushort> BitwiseSelectWithSecondInputInverted(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svbsl2n[_n_u16]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<uint> BitwiseSelectWithSecondInputInverted(Vector<uint> op1, Vector<uint> op2, uint op3);  // svbsl2n[_n_u32]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<ulong> BitwiseSelectWithSecondInputInverted(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svbsl2n[_n_u64]: BSL2N or MOVPRFX+BSL2N
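BSL1N and BSL2N have the same shape as BSL, with `op1` or `op2` respectively inverted before the selection. The sketch below shows one 64-bit lane of each, and `BslInvertedReference` is a hypothetical name.

```csharp
public static class BslInvertedReference
{
    // svbsl1n / BSL1N: (NOT op1 AND op3) OR (op2 AND NOT op3).
    public static ulong SelectFirstInverted(ulong op1, ulong op2, ulong op3)
        => (~op1 & op3) | (op2 & ~op3);

    // svbsl2n / BSL2N: (op1 AND op3) OR (NOT op2 AND NOT op3).
    public static ulong SelectSecondInverted(ulong op1, ulong op2, ulong op3)
        => (op1 & op3) | (~op2 & ~op3);
}
```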

  public static unsafe Vector<sbyte> ComplexAddWithRotate(Vector<sbyte> op1, Vector<sbyte> op2, ulong imm_rotation);  // svcadd[_s8]: CADD or MOVPRFX+CADD
  public static unsafe Vector<short> ComplexAddWithRotate(Vector<short> op1, Vector<short> op2, ulong imm_rotation);  // svcadd[_s16]: CADD or MOVPRFX+CADD
  public static unsafe Vector<int> ComplexAddWithRotate(Vector<int> op1, Vector<int> op2, ulong imm_rotation);  // svcadd[_s32]: CADD or MOVPRFX+CADD
  public static unsafe Vector<long> ComplexAddWithRotate(Vector<long> op1, Vector<long> op2, ulong imm_rotation);  // svcadd[_s64]: CADD or MOVPRFX+CADD
  public static unsafe Vector<byte> ComplexAddWithRotate(Vector<byte> op1, Vector<byte> op2, ulong imm_rotation);  // svcadd[_u8]: CADD or MOVPRFX+CADD
  public static unsafe Vector<ushort> ComplexAddWithRotate(Vector<ushort> op1, Vector<ushort> op2, ulong imm_rotation);  // svcadd[_u16]: CADD or MOVPRFX+CADD
  public static unsafe Vector<uint> ComplexAddWithRotate(Vector<uint> op1, Vector<uint> op2, ulong imm_rotation);  // svcadd[_u32]: CADD or MOVPRFX+CADD
  public static unsafe Vector<ulong> ComplexAddWithRotate(Vector<ulong> op1, Vector<ulong> op2, ulong imm_rotation);  // svcadd[_u64]: CADD or MOVPRFX+CADD
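CADD treats each vector as interleaved complex numbers, with real parts in even elements and imaginary parts in odd elements. It rotates `op2` by 90 or 270 degrees (multiplies it by i or -i) before adding it to `op1`. A scalar sketch over an `int[]` standing in for `Vector<int>` follows. `CaddReference` is hypothetical, and integer overflow wraps as in the non-saturating instruction.

```csharp
using System;

public static class CaddReference
{
    public static int[] ComplexAddWithRotate(int[] op1, int[] op2, int rotation)
    {
        if (rotation != 90 && rotation != 270)
            throw new ArgumentOutOfRangeException(nameof(rotation), "Rotation must be 90 or 270.");

        var result = new int[op1.Length];
        for (int i = 0; i < op1.Length; i += 2)
        {
            int re2 = op2[i], im2 = op2[i + 1];
            if (rotation == 90)
            {
                // (re2 + i*im2) * i = -im2 + i*re2
                result[i] = unchecked(op1[i] - im2);
                result[i + 1] = unchecked(op1[i + 1] + re2);
            }
            else
            {
                // (re2 + i*im2) * -i = im2 - i*re2
                result[i] = unchecked(op1[i] + im2);
                result[i + 1] = unchecked(op1[i + 1] - re2);
            }
        }
        return result;
    }
}
```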

  public static unsafe Vector<int> ComplexDotProduct(Vector<int> op1, Vector<sbyte> op2, Vector<sbyte> op3, ulong imm_rotation);  // svcdot[_s32]: CDOT or MOVPRFX+CDOT
  public static unsafe Vector<long> ComplexDotProduct(Vector<long> op1, Vector<short> op2, Vector<short> op3, ulong imm_rotation);  // svcdot[_s64]: CDOT or MOVPRFX+CDOT
  public static unsafe Vector<int> ComplexDotProduct(Vector<int> op1, Vector<sbyte> op2, Vector<sbyte> op3, ulong imm_index, ulong imm_rotation);  // svcdot_lane[_s32]: CDOT or MOVPRFX+CDOT
  public static unsafe Vector<long> ComplexDotProduct(Vector<long> op1, Vector<short> op2, Vector<short> op3, ulong imm_index, ulong imm_rotation);  // svcdot_lane[_s64]: CDOT or MOVPRFX+CDOT

  public static unsafe Vector<sbyte> ComplexMultiplyAddWithRotate(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3, ulong imm_rotation);  // svcmla[_s8]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<short> ComplexMultiplyAddWithRotate(Vector<short> op1, Vector<short> op2, Vector<short> op3, ulong imm_rotation);  // svcmla[_s16]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<int> ComplexMultiplyAddWithRotate(Vector<int> op1, Vector<int> op2, Vector<int> op3, ulong imm_rotation);  // svcmla[_s32]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<long> ComplexMultiplyAddWithRotate(Vector<long> op1, Vector<long> op2, Vector<long> op3, ulong imm_rotation);  // svcmla[_s64]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<byte> ComplexMultiplyAddWithRotate(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3, ulong imm_rotation);  // svcmla[_u8]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<ushort> ComplexMultiplyAddWithRotate(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3, ulong imm_rotation);  // svcmla[_u16]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<uint> ComplexMultiplyAddWithRotate(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3, ulong imm_rotation);  // svcmla[_u32]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<ulong> ComplexMultiplyAddWithRotate(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3, ulong imm_rotation);  // svcmla[_u64]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<float> ComplexMultiplyAddWithRotate(Vector<float> op1, Vector<float> op2, Vector<float> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_f32]: FCMLA or MOVPRFX+FCMLA
  public static unsafe Vector<short> ComplexMultiplyAddWithRotate(Vector<short> op1, Vector<short> op2, Vector<short> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_s16]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<int> ComplexMultiplyAddWithRotate(Vector<int> op1, Vector<int> op2, Vector<int> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_s32]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<ushort> ComplexMultiplyAddWithRotate(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_u16]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<uint> ComplexMultiplyAddWithRotate(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_u32]: CMLA or MOVPRFX+CMLA
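Each CMLA rotation accumulates only half of a complex product. A full multiply-accumulate `op1 += op2 * op3` therefore takes two calls, one with rotation 0 and one with rotation 90. The sketch below models the non-lane form on interleaved `int[]` data, following the FCMLA/CMLA rotation table as I understand it. `CmlaReference` is a hypothetical name.

```csharp
using System;

public static class CmlaReference
{
    public static int[] ComplexMultiplyAddWithRotate(int[] op1, int[] op2, int[] op3, int rotation)
    {
        var result = (int[])op1.Clone();
        for (int i = 0; i < op1.Length; i += 2)
        {
            int aRe = op2[i], aIm = op2[i + 1];
            int bRe = op3[i], bIm = op3[i + 1];
            switch (rotation)
            {
                case 0:   result[i] += aRe * bRe; result[i + 1] += aRe * bIm; break;
                case 90:  result[i] -= aIm * bIm; result[i + 1] += aIm * bRe; break;
                case 180: result[i] -= aRe * bRe; result[i + 1] -= aRe * bIm; break;
                case 270: result[i] += aIm * bIm; result[i + 1] -= aIm * bRe; break;
                default:
                    throw new ArgumentOutOfRangeException(nameof(rotation), "Rotation must be 0, 90, 180 or 270.");
            }
        }
        return result;
    }
}
```

For example, accumulating `(1 + 2i) * (3 + 4i)` into zero with rotations 0 then 90 yields `-5 + 10i`.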

  public static unsafe ulong CountTheNumberOf8BitElementsInAVector();  // svcntb: CNTB
  public static unsafe ulong CountTheNumberOf8BitElementsInAVector(enum svpattern pattern);  // svcntb_pat: CNTB

  public static unsafe ulong CountTheNumberOf64BitElementsInAVector();  // svcntd: CNTD
  public static unsafe ulong CountTheNumberOf64BitElementsInAVector(enum svpattern pattern);  // svcntd_pat: CNTD

  public static unsafe ulong CountTheNumberOf16BitElementsInAVector();  // svcnth: CNTH
  public static unsafe ulong CountTheNumberOf16BitElementsInAVector(enum svpattern pattern);  // svcnth_pat: CNTH

  public static unsafe ulong CountTheNumberOf32BitElementsInAVector();  // svcntw: CNTW
  public static unsafe ulong CountTheNumberOf32BitElementsInAVector(enum svpattern pattern);  // svcntw_pat: CNTW
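The unpatterned counts depend only on the hardware vector length (VL). SVE allows VL to be a multiple of 128 bits between 128 and 2048. The sketch below has no hardware to query, so it takes VL as a parameter. `SveCountReference` is hypothetical, and the `svpattern` overloads are not modelled.

```csharp
using System;

public static class SveCountReference
{
    // Elements of width elementBits that fit in one vector of vectorLengthBits.
    public static ulong CountElements(int vectorLengthBits, int elementBits)
    {
        if (vectorLengthBits < 128 || vectorLengthBits > 2048 || vectorLengthBits % 128 != 0)
            throw new ArgumentOutOfRangeException(nameof(vectorLengthBits));
        return (ulong)(vectorLengthBits / elementBits);
    }
}
```

On a 256-bit implementation, for example, CNTB would return 32 and CNTD would return 4.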

  public static unsafe Vector<int> DotProduct(Vector<int> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svdot[_s32]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<long> DotProduct(Vector<long> op1, Vector<short> op2, Vector<short> op3);  // svdot[_s64]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<uint> DotProduct(Vector<uint> op1, Vector<byte> op2, Vector<byte> op3);  // svdot[_u32]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<ulong> DotProduct(Vector<ulong> op1, Vector<ushort> op2, Vector<ushort> op3);  // svdot[_u64]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<int> DotProduct(Vector<int> op1, Vector<sbyte> op2, sbyte op3);  // svdot[_n_s32]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<long> DotProduct(Vector<long> op1, Vector<short> op2, short op3);  // svdot[_n_s64]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<uint> DotProduct(Vector<uint> op1, Vector<byte> op2, byte op3);  // svdot[_n_u32]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<ulong> DotProduct(Vector<ulong> op1, Vector<ushort> op2, ushort op3);  // svdot[_n_u64]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<int> DotProduct(Vector<int> op1, Vector<sbyte> op2, Vector<sbyte> op3, ulong imm_index);  // svdot_lane[_s32]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<long> DotProduct(Vector<long> op1, Vector<short> op2, Vector<short> op3, ulong imm_index);  // svdot_lane[_s64]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<uint> DotProduct(Vector<uint> op1, Vector<byte> op2, Vector<byte> op3, ulong imm_index);  // svdot_lane[_u32]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<ulong> DotProduct(Vector<ulong> op1, Vector<ushort> op2, Vector<ushort> op3, ulong imm_index);  // svdot_lane[_u64]: UDOT or MOVPRFX+UDOT
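SDOT/UDOT widen as they accumulate. Each 32-bit lane of `op1` gains the sum of four 8-bit products taken from the matching four lanes of `op2` and `op3`. The following sketches the non-lane `svdot[_s32]` form, and `DotReference` is a hypothetical name.

```csharp
public static class DotReference
{
    public static int[] DotProduct(int[] op1, sbyte[] op2, sbyte[] op3)
    {
        var result = (int[])op1.Clone();
        for (int i = 0; i < op1.Length; i++)
        {
            // Lane i of the accumulator covers byte lanes 4*i .. 4*i+3.
            for (int j = 0; j < 4; j++)
                result[i] += op2[4 * i + j] * op3[4 * i + j];
        }
        return result;
    }
}
```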

  public static unsafe Vector<float> BroadcastAScalarValue(float op);  // svdup[_n]_f32: DUP or FDUP
  public static unsafe Vector<double> BroadcastAScalarValue(double op);  // svdup[_n]_f64: DUP or FDUP
  public static unsafe Vector<sbyte> BroadcastAScalarValue(sbyte op);  // svdup[_n]_s8: DUP or FDUP or DUPM
  public static unsafe Vector<short> BroadcastAScalarValue(short op);  // svdup[_n]_s16: DUP or FDUP or DUPM
  public static unsafe Vector<int> BroadcastAScalarValue(int op);  // svdup[_n]_s32: DUP or FDUP or DUPM
  public static unsafe Vector<long> BroadcastAScalarValue(long op);  // svdup[_n]_s64: DUP or FDUP or DUPM
  public static unsafe Vector<byte> BroadcastAScalarValue(byte op);  // svdup[_n]_u8: DUP or FDUP or DUPM
  public static unsafe Vector<ushort> BroadcastAScalarValue(ushort op);  // svdup[_n]_u16: DUP or FDUP or DUPM
  public static unsafe Vector<uint> BroadcastAScalarValue(uint op);  // svdup[_n]_u32: DUP or FDUP or DUPM
  public static unsafe Vector<ulong> BroadcastAScalarValue(ulong op);  // svdup[_n]_u64: DUP or FDUP or DUPM
  public static unsafe Vector<float> BroadcastAScalarValue(Vector<float> data, uint index);  // svdup_lane[_f32]: DUP or TBL
  public static unsafe Vector<double> BroadcastAScalarValue(Vector<double> data, ulong index);  // svdup_lane[_f64]: DUP or TBL
  public static unsafe Vector<sbyte> BroadcastAScalarValue(Vector<sbyte> data, byte index);  // svdup_lane[_s8]: DUP or TBL
  public static unsafe Vector<short> BroadcastAScalarValue(Vector<short> data, ushort index);  // svdup_lane[_s16]: DUP or TBL
  public static unsafe Vector<int> BroadcastAScalarValue(Vector<int> data, uint index);  // svdup_lane[_s32]: DUP or TBL
  public static unsafe Vector<long> BroadcastAScalarValue(Vector<long> data, ulong index);  // svdup_lane[_s64]: DUP or TBL
  public static unsafe Vector<byte> BroadcastAScalarValue(Vector<byte> data, byte index);  // svdup_lane[_u8]: DUP or TBL
  public static unsafe Vector<ushort> BroadcastAScalarValue(Vector<ushort> data, ushort index);  // svdup_lane[_u16]: DUP or TBL
  public static unsafe Vector<uint> BroadcastAScalarValue(Vector<uint> data, uint index);  // svdup_lane[_u32]: DUP or TBL
  public static unsafe Vector<ulong> BroadcastAScalarValue(Vector<ulong> data, ulong index);  // svdup_lane[_u64]: DUP or TBL
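The `svdup_lane` overloads copy one element of `data` to every lane. The listing maps them to DUP or TBL, and I assume they follow TBL's rule that an index past the last element produces zeros. The sketch below models that assumption, and `DupLaneReference` is a hypothetical name.

```csharp
using System;

public static class DupLaneReference
{
    public static int[] BroadcastLane(int[] data, uint index)
    {
        // Assumption: an out-of-range index yields an all-zero vector (TBL behaviour).
        var result = new int[data.Length];
        if (index < (uint)data.Length)
            Array.Fill(result, data[index]);
        return result;
    }
}
```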

  public static unsafe Vector<sbyte> BitwiseExclusiveOrOfThreeVectors(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // sveor3[_s8]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<short> BitwiseExclusiveOrOfThreeVectors(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // sveor3[_s16]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<int> BitwiseExclusiveOrOfThreeVectors(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // sveor3[_s32]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<long> BitwiseExclusiveOrOfThreeVectors(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // sveor3[_s64]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<byte> BitwiseExclusiveOrOfThreeVectors(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // sveor3[_u8]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<ushort> BitwiseExclusiveOrOfThreeVectors(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // sveor3[_u16]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<uint> BitwiseExclusiveOrOfThreeVectors(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // sveor3[_u32]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<ulong> BitwiseExclusiveOrOfThreeVectors(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // sveor3[_u64]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<sbyte> BitwiseExclusiveOrOfThreeVectors(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // sveor3[_n_s8]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<short> BitwiseExclusiveOrOfThreeVectors(Vector<short> op1, Vector<short> op2, short op3);  // sveor3[_n_s16]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<int> BitwiseExclusiveOrOfThreeVectors(Vector<int> op1, Vector<int> op2, int op3);  // sveor3[_n_s32]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<long> BitwiseExclusiveOrOfThreeVectors(Vector<long> op1, Vector<long> op2, long op3);  // sveor3[_n_s64]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<byte> BitwiseExclusiveOrOfThreeVectors(Vector<byte> op1, Vector<byte> op2, byte op3);  // sveor3[_n_u8]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<ushort> BitwiseExclusiveOrOfThreeVectors(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // sveor3[_n_u16]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<uint> BitwiseExclusiveOrOfThreeVectors(Vector<uint> op1, Vector<uint> op2, uint op3);  // sveor3[_n_u32]: EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<ulong> BitwiseExclusiveOrOfThreeVectors(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // sveor3[_n_u64]: EOR3 or MOVPRFX+EOR3
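EOR3 is simply `op1 ^ op2 ^ op3` per lane. In the `_n` overloads, the scalar `op3` is applied to every lane. `Eor3Reference` in the sketch below is a hypothetical name.

```csharp
public static class Eor3Reference
{
    // One lane of sveor3.
    public static ulong Xor3(ulong op1, ulong op2, ulong op3) => op1 ^ op2 ^ op3;

    // Model of the _n form: the same scalar op3 is used for every lane.
    public static ulong[] Xor3(ulong[] op1, ulong[] op2, ulong op3)
    {
        var result = new ulong[op1.Length];
        for (int i = 0; i < op1.Length; i++)
            result[i] = Xor3(op1[i], op2[i], op3);
        return result;
    }
}
```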

  public static unsafe Vector<sbyte> InterleavingExclusiveOrBottomTop(Vector<sbyte> odd, Vector<sbyte> op1, Vector<sbyte> op2);  // sveorbt[_s8]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<short> InterleavingExclusiveOrBottomTop(Vector<short> odd, Vector<short> op1, Vector<short> op2);  // sveorbt[_s16]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<int> InterleavingExclusiveOrBottomTop(Vector<int> odd, Vector<int> op1, Vector<int> op2);  // sveorbt[_s32]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<long> InterleavingExclusiveOrBottomTop(Vector<long> odd, Vector<long> op1, Vector<long> op2);  // sveorbt[_s64]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<byte> InterleavingExclusiveOrBottomTop(Vector<byte> odd, Vector<byte> op1, Vector<byte> op2);  // sveorbt[_u8]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<ushort> InterleavingExclusiveOrBottomTop(Vector<ushort> odd, Vector<ushort> op1, Vector<ushort> op2);  // sveorbt[_u16]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<uint> InterleavingExclusiveOrBottomTop(Vector<uint> odd, Vector<uint> op1, Vector<uint> op2);  // sveorbt[_u32]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<ulong> InterleavingExclusiveOrBottomTop(Vector<ulong> odd, Vector<ulong> op1, Vector<ulong> op2);  // sveorbt[_u64]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<sbyte> InterleavingExclusiveOrBottomTop(Vector<sbyte> odd, Vector<sbyte> op1, sbyte op2);  // sveorbt[_n_s8]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<short> InterleavingExclusiveOrBottomTop(Vector<short> odd, Vector<short> op1, short op2);  // sveorbt[_n_s16]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<int> InterleavingExclusiveOrBottomTop(Vector<int> odd, Vector<int> op1, int op2);  // sveorbt[_n_s32]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<long> InterleavingExclusiveOrBottomTop(Vector<long> odd, Vector<long> op1, long op2);  // sveorbt[_n_s64]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<byte> InterleavingExclusiveOrBottomTop(Vector<byte> odd, Vector<byte> op1, byte op2);  // sveorbt[_n_u8]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<ushort> InterleavingExclusiveOrBottomTop(Vector<ushort> odd, Vector<ushort> op1, ushort op2);  // sveorbt[_n_u16]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<uint> InterleavingExclusiveOrBottomTop(Vector<uint> odd, Vector<uint> op1, uint op2);  // sveorbt[_n_u32]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<ulong> InterleavingExclusiveOrBottomTop(Vector<ulong> odd, Vector<ulong> op1, ulong op2);  // sveorbt[_n_u64]: EORBT or MOVPRFX+EORBT

  public static unsafe Vector<sbyte> InterleavingExclusiveOrTopBottom(Vector<sbyte> even, Vector<sbyte> op1, Vector<sbyte> op2);  // sveortb[_s8]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<short> InterleavingExclusiveOrTopBottom(Vector<short> even, Vector<short> op1, Vector<short> op2);  // sveortb[_s16]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<int> InterleavingExclusiveOrTopBottom(Vector<int> even, Vector<int> op1, Vector<int> op2);  // sveortb[_s32]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<long> InterleavingExclusiveOrTopBottom(Vector<long> even, Vector<long> op1, Vector<long> op2);  // sveortb[_s64]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<byte> InterleavingExclusiveOrTopBottom(Vector<byte> even, Vector<byte> op1, Vector<byte> op2);  // sveortb[_u8]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<ushort> InterleavingExclusiveOrTopBottom(Vector<ushort> even, Vector<ushort> op1, Vector<ushort> op2);  // sveortb[_u16]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<uint> InterleavingExclusiveOrTopBottom(Vector<uint> even, Vector<uint> op1, Vector<uint> op2);  // sveortb[_u32]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<ulong> InterleavingExclusiveOrTopBottom(Vector<ulong> even, Vector<ulong> op1, Vector<ulong> op2);  // sveortb[_u64]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<sbyte> InterleavingExclusiveOrTopBottom(Vector<sbyte> even, Vector<sbyte> op1, sbyte op2);  // sveortb[_n_s8]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<short> InterleavingExclusiveOrTopBottom(Vector<short> even, Vector<short> op1, short op2);  // sveortb[_n_s16]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<int> InterleavingExclusiveOrTopBottom(Vector<int> even, Vector<int> op1, int op2);  // sveortb[_n_s32]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<long> InterleavingExclusiveOrTopBottom(Vector<long> even, Vector<long> op1, long op2);  // sveortb[_n_s64]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<byte> InterleavingExclusiveOrTopBottom(Vector<byte> even, Vector<byte> op1, byte op2);  // sveortb[_n_u8]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<ushort> InterleavingExclusiveOrTopBottom(Vector<ushort> even, Vector<ushort> op1, ushort op2);  // sveortb[_n_u16]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<uint> InterleavingExclusiveOrTopBottom(Vector<uint> even, Vector<uint> op1, uint op2);  // sveortb[_n_u32]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<ulong> InterleavingExclusiveOrTopBottom(Vector<ulong> even, Vector<ulong> op1, ulong op2);  // sveortb[_n_u64]: EORTB or MOVPRFX+EORTB
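The first parameter of EORBT/EORTB (`odd` / `even`) supplies the lanes that the instruction leaves unchanged. EORBT writes `op1[2i] ^ op2[2i+1]` into the even lanes. EORTB writes `op1[2i+1] ^ op2[2i]` into the odd lanes. A scalar sketch of both follows, and `EorInterleaveReference` is a hypothetical name.

```csharp
public static class EorInterleaveReference
{
    // sveorbt: even lanes = op1[even] ^ op2[even + 1]; odd lanes copied from 'odd'.
    public static int[] ExclusiveOrBottomTop(int[] odd, int[] op1, int[] op2)
    {
        var result = (int[])odd.Clone();
        for (int i = 0; i < result.Length; i += 2)
            result[i] = op1[i] ^ op2[i + 1];
        return result;
    }

    // sveortb: odd lanes = op1[odd] ^ op2[odd - 1]; even lanes copied from 'even'.
    public static int[] ExclusiveOrTopBottom(int[] even, int[] op1, int[] op2)
    {
        var result = (int[])even.Clone();
        for (int i = 1; i < result.Length; i += 2)
            result[i] = op1[i] ^ op2[i - 1];
        return result;
    }
}
```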

  public static unsafe Vector<float> FloatingPointExponentialAccelerator(Vector<uint> op);  // svexpa[_f32]: FEXPA
  public static unsafe Vector<double> FloatingPointExponentialAccelerator(Vector<ulong> op);  // svexpa[_f64]: FEXPA

  public static unsafe Vector<float> ExtractVectorFromPairOfVectors(Vector<float> op1, Vector<float> op2, ulong imm3);  // svext[_f32]: EXT or MOVPRFX+EXT
  public static unsafe Vector<double> ExtractVectorFromPairOfVectors(Vector<double> op1, Vector<double> op2, ulong imm3);  // svext[_f64]: EXT or MOVPRFX+EXT
  public static unsafe Vector<sbyte> ExtractVectorFromPairOfVectors(Vector<sbyte> op1, Vector<sbyte> op2, ulong imm3);  // svext[_s8]: EXT or MOVPRFX+EXT
  public static unsafe Vector<short> ExtractVectorFromPairOfVectors(Vector<short> op1, Vector<short> op2, ulong imm3);  // svext[_s16]: EXT or MOVPRFX+EXT
  public static unsafe Vector<int> ExtractVectorFromPairOfVectors(Vector<int> op1, Vector<int> op2, ulong imm3);  // svext[_s32]: EXT or MOVPRFX+EXT
  public static unsafe Vector<long> ExtractVectorFromPairOfVectors(Vector<long> op1, Vector<long> op2, ulong imm3);  // svext[_s64]: EXT or MOVPRFX+EXT
  public static unsafe Vector<byte> ExtractVectorFromPairOfVectors(Vector<byte> op1, Vector<byte> op2, ulong imm3);  // svext[_u8]: EXT or MOVPRFX+EXT
  public static unsafe Vector<ushort> ExtractVectorFromPairOfVectors(Vector<ushort> op1, Vector<ushort> op2, ulong imm3);  // svext[_u16]: EXT or MOVPRFX+EXT
  public static unsafe Vector<uint> ExtractVectorFromPairOfVectors(Vector<uint> op1, Vector<uint> op2, ulong imm3);  // svext[_u32]: EXT or MOVPRFX+EXT
  public static unsafe Vector<ulong> ExtractVectorFromPairOfVectors(Vector<ulong> op1, Vector<ulong> op2, ulong imm3);  // svext[_u64]: EXT or MOVPRFX+EXT
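In `svext`, `imm3` is an element offset. The result is the vector that starts at element `imm3` of `op1` and runs on into `op2`, as if the two were concatenated. The sketch below uses the hypothetical name `ExtReference` and deliberately rejects offsets at or beyond the element count rather than guessing the hardware behaviour there.

```csharp
using System;

public static class ExtReference
{
    public static int[] ExtractVector(int[] op1, int[] op2, ulong imm3)
    {
        int n = op1.Length;
        if (imm3 >= (ulong)n)
            throw new ArgumentOutOfRangeException(nameof(imm3), "This sketch only models offsets inside the vector.");

        int start = (int)imm3;
        var result = new int[n];
        for (int i = 0; i < n; i++)
            result[i] = start + i < n ? op1[start + i] : op2[start + i - n];
        return result;
    }
}
```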

  public static unsafe Vector<byte> CountMatchingElementsIn128BitSegments(Vector<sbyte> op1, Vector<sbyte> op2);  // svhistseg[_s8]: HISTSEG
  public static unsafe Vector<byte> CountMatchingElementsIn128BitSegments(Vector<byte> op1, Vector<byte> op2);  // svhistseg[_u8]: HISTSEG
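For each byte of `op1`, HISTSEG counts how many bytes in the same 128-bit (16-byte) segment of `op2` are equal to it. The following is a scalar sketch of the unsigned form. `HistsegReference` is hypothetical, and the inputs are assumed to be a whole number of 16-byte segments.

```csharp
public static class HistsegReference
{
    public static byte[] CountMatchesPerSegment(byte[] op1, byte[] op2)
    {
        const int SegmentBytes = 16;
        var result = new byte[op1.Length];
        for (int i = 0; i < op1.Length; i++)
        {
            // Only compare against op2 bytes in the segment that contains lane i.
            int segmentStart = i / SegmentBytes * SegmentBytes;
            int count = 0;
            for (int j = segmentStart; j < segmentStart + SegmentBytes; j++)
                if (op2[j] == op1[i]) count++;
            result[i] = (byte)count;
        }
        return result;
    }
}
```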

  public static unsafe Vector<sbyte> CreateLinearSeries(sbyte @base, sbyte step);  // svindex_s8: INDEX
  public static unsafe Vector<short> CreateLinearSeries(short @base, short step);  // svindex_s16: INDEX
  public static unsafe Vector<int> CreateLinearSeries(int @base, int step);  // svindex_s32: INDEX
  public static unsafe Vector<long> CreateLinearSeries(long @base, long step);  // svindex_s64: INDEX
  public static unsafe Vector<byte> CreateLinearSeries(byte @base, byte step);  // svindex_u8: INDEX
  public static unsafe Vector<ushort> CreateLinearSeries(ushort @base, ushort step);  // svindex_u16: INDEX
  public static unsafe Vector<uint> CreateLinearSeries(uint @base, uint step);  // svindex_u32: INDEX
  public static unsafe Vector<ulong> CreateLinearSeries(ulong @base, ulong step);  // svindex_u64: INDEX
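INDEX fills element `i` with `base + i * step`, wrapping on overflow. The sketch below takes the element count explicitly, since VL is not known here. `base` is a C# keyword, so the sketch spells the parameter `@base`, and `IndexReference` is a hypothetical name.

```csharp
public static class IndexReference
{
    public static int[] CreateLinearSeries(int @base, int step, int elementCount)
    {
        var result = new int[elementCount];
        for (int i = 0; i < elementCount; i++)
            result[i] = unchecked(@base + i * step);
        return result;
    }
}
```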

  public static unsafe Vector<float> InsertScalarIntoShiftedVector(Vector<float> op1, float op2);  // svinsr[_n_f32]: INSR
  public static unsafe Vector<double> InsertScalarIntoShiftedVector(Vector<double> op1, double op2);  // svinsr[_n_f64]: INSR
  public static unsafe Vector<sbyte> InsertScalarIntoShiftedVector(Vector<sbyte> op1, sbyte op2);  // svinsr[_n_s8]: INSR
  public static unsafe Vector<short> InsertScalarIntoShiftedVector(Vector<short> op1, short op2);  // svinsr[_n_s16]: INSR
  public static unsafe Vector<int> InsertScalarIntoShiftedVector(Vector<int> op1, int op2);  // svinsr[_n_s32]: INSR
  public static unsafe Vector<long> InsertScalarIntoShiftedVector(Vector<long> op1, long op2);  // svinsr[_n_s64]: INSR
  public static unsafe Vector<byte> InsertScalarIntoShiftedVector(Vector<byte> op1, byte op2);  // svinsr[_n_u8]: INSR
  public static unsafe Vector<ushort> InsertScalarIntoShiftedVector(Vector<ushort> op1, ushort op2);  // svinsr[_n_u16]: INSR
  public static unsafe Vector<uint> InsertScalarIntoShiftedVector(Vector<uint> op1, uint op2);  // svinsr[_n_u32]: INSR
  public static unsafe Vector<ulong> InsertScalarIntoShiftedVector(Vector<ulong> op1, ulong op2);  // svinsr[_n_u64]: INSR
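INSR shifts every element of `op1` up by one position, discards the last element, and places `op2` in element 0. A scalar sketch follows, and `InsrReference` is a hypothetical name.

```csharp
using System;

public static class InsrReference
{
    public static int[] InsertScalarIntoShiftedVector(int[] op1, int op2)
    {
        var result = new int[op1.Length];
        result[0] = op2;
        // Elements 0 .. n-2 of op1 move to positions 1 .. n-1; op1[n-1] is dropped.
        Array.Copy(op1, 0, result, 1, op1.Length - 1);
        return result;
    }
}
```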

  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<float> op);  // svlen[_f32]: CNTW
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<double> op);  // svlen[_f64]: CNTD
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<sbyte> op);  // svlen[_s8]: CNTB
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<short> op);  // svlen[_s16]: CNTH
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<int> op);  // svlen[_s32]: CNTW
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<long> op);  // svlen[_s64]: CNTD
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<byte> op);  // svlen[_u8]: CNTB
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<ushort> op);  // svlen[_u16]: CNTH
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<uint> op);  // svlen[_u32]: CNTW
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<ulong> op);  // svlen[_u64]: CNTD

API Usage

var x = ArmSVE.ReverseAllElements(y);

Vector<uint> a = ArmSVE.ComplexAddWithRotate(b, c, 270);

Vector<int> MyFunc(Vector<int> i, Vector<int> j, Vector<int> k) {
  return ArmSVE.AbsoluteDifferenceAndAccumulate(i, j, k);
}

Alternative Designs

Some of the methods take a rotation argument, e.g.:

  public static unsafe Vector<sbyte> ComplexAddWithRotate(Vector<sbyte> op1, Vector<sbyte> op2, ulong imm_rotation);  // svcadd[_s8]: CADD or MOVPRFX+CADD

For this method the rotation must be either 90 or 270, so we may want to replace the parameter with:

  public static unsafe Vector<sbyte> ComplexAddWithRotate90(Vector<sbyte> op1, Vector<sbyte> op2);  // svcadd[_s8]: CADD or MOVPRFX+CADD
  public static unsafe Vector<sbyte> ComplexAddWithRotate270(Vector<sbyte> op1, Vector<sbyte> op2);  // svcadd[_s8]: CADD or MOVPRFX+CADD

or

  public static unsafe Vector<sbyte> ComplexAddWithRotate(Vector<sbyte> op1, Vector<sbyte> op2, bool rotate270);  // svcadd[_s8]: CADD or MOVPRFX+CADD

Risks

The C intrinsics are well specified. Starting from this existing list will help to reduce errors. However, I expect the C# API to deviate during design review.

It is possible that there are methods for instructions that are not yet in existing hardware. We should prune those from the list as they are found.

@a74nh a74nh added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Oct 13, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 13, 2023
@ghost

ghost commented Oct 13, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

413 methods (of 1229)

This API covers all the SVE/SVE2 instructions that don't require predication. It is split into multiple parts to fit within the GitHub issue size limit.

This list only includes SVE instructions currently available in existing hardware, i.e., FEAT_SVE and FEAT_SVE2. No other SVE extensions are covered.
List of SVE instructions

This list was auto-generated from the C ACLE intrinsics for SVE.
SVE intrinsics doc
Interactive list of SVE intrinsics
It was generated in the following way:

  • Generate a complete list of methods. All method names are created by camel-casing the description.
  • Remove all methods that use float16 or bfloat16. C# does not yet support 16-bit floats.
  • Remove all methods that use QuadWord. These require FEAT_F64MM.
  • Remove all methods that use predicates. These will be covered in a future API suggestion.
  • Remove all methods without an underlying SVE instruction. These will be covered in a future API suggestion.

For each method I've included the original C intrinsic (e.g. svaba[_s8]) and the SVE instruction generated (e.g. SABA). For many methods, two instructions may be generated due to register usage (e.g. MOVPRFX+SABA when the result is in a different register from the inputs).

Contributes towards #93095

API Proposal

namespace System.Runtime.Intrinsics.Arm

public class ArmSVE<T>
{

  public static unsafe Vector<sbyte> AbsoluteDifferenceAndAccumulate(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svaba[_s8]: SABA or MOVPRFX+SABA
  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulate(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svaba[_s16]: SABA or MOVPRFX+SABA
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulate(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svaba[_s32]: SABA or MOVPRFX+SABA
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulate(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svaba[_s64]: SABA or MOVPRFX+SABA
  public static unsafe Vector<byte> AbsoluteDifferenceAndAccumulate(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svaba[_u8]: UABA or MOVPRFX+UABA
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulate(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svaba[_u16]: UABA or MOVPRFX+UABA
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulate(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svaba[_u32]: UABA or MOVPRFX+UABA
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulate(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svaba[_u64]: UABA or MOVPRFX+UABA
  public static unsafe Vector<sbyte> AbsoluteDifferenceAndAccumulate(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svaba[_n_s8]: SABA or MOVPRFX+SABA
  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulate(Vector<short> op1, Vector<short> op2, short op3);  // svaba[_n_s16]: SABA or MOVPRFX+SABA
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulate(Vector<int> op1, Vector<int> op2, int op3);  // svaba[_n_s32]: SABA or MOVPRFX+SABA
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulate(Vector<long> op1, Vector<long> op2, long op3);  // svaba[_n_s64]: SABA or MOVPRFX+SABA
  public static unsafe Vector<byte> AbsoluteDifferenceAndAccumulate(Vector<byte> op1, Vector<byte> op2, byte op3);  // svaba[_n_u8]: UABA or MOVPRFX+UABA
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulate(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svaba[_n_u16]: UABA or MOVPRFX+UABA
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulate(Vector<uint> op1, Vector<uint> op2, uint op3);  // svaba[_n_u32]: UABA or MOVPRFX+UABA
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulate(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svaba[_n_u64]: UABA or MOVPRFX+UABA

  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulateLongBottom(Vector<short> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svabalb[_s16]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulateLongBottom(Vector<int> op1, Vector<short> op2, Vector<short> op3);  // svabalb[_s32]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulateLongBottom(Vector<long> op1, Vector<int> op2, Vector<int> op3);  // svabalb[_s64]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulateLongBottom(Vector<ushort> op1, Vector<byte> op2, Vector<byte> op3);  // svabalb[_u16]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulateLongBottom(Vector<uint> op1, Vector<ushort> op2, Vector<ushort> op3);  // svabalb[_u32]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulateLongBottom(Vector<ulong> op1, Vector<uint> op2, Vector<uint> op3);  // svabalb[_u64]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulateLongBottom(Vector<short> op1, Vector<sbyte> op2, sbyte op3);  // svabalb[_n_s16]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulateLongBottom(Vector<int> op1, Vector<short> op2, short op3);  // svabalb[_n_s32]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulateLongBottom(Vector<long> op1, Vector<int> op2, int op3);  // svabalb[_n_s64]: SABALB or MOVPRFX+SABALB
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulateLongBottom(Vector<ushort> op1, Vector<byte> op2, byte op3);  // svabalb[_n_u16]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulateLongBottom(Vector<uint> op1, Vector<ushort> op2, ushort op3);  // svabalb[_n_u32]: UABALB or MOVPRFX+UABALB
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulateLongBottom(Vector<ulong> op1, Vector<uint> op2, uint op3);  // svabalb[_n_u64]: UABALB or MOVPRFX+UABALB

  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulateLongTop(Vector<short> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svabalt[_s16]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulateLongTop(Vector<int> op1, Vector<short> op2, Vector<short> op3);  // svabalt[_s32]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulateLongTop(Vector<long> op1, Vector<int> op2, Vector<int> op3);  // svabalt[_s64]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulateLongTop(Vector<ushort> op1, Vector<byte> op2, Vector<byte> op3);  // svabalt[_u16]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulateLongTop(Vector<uint> op1, Vector<ushort> op2, Vector<ushort> op3);  // svabalt[_u32]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulateLongTop(Vector<ulong> op1, Vector<uint> op2, Vector<uint> op3);  // svabalt[_u64]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<short> AbsoluteDifferenceAndAccumulateLongTop(Vector<short> op1, Vector<sbyte> op2, sbyte op3);  // svabalt[_n_s16]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<int> AbsoluteDifferenceAndAccumulateLongTop(Vector<int> op1, Vector<short> op2, short op3);  // svabalt[_n_s32]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<long> AbsoluteDifferenceAndAccumulateLongTop(Vector<long> op1, Vector<int> op2, int op3);  // svabalt[_n_s64]: SABALT or MOVPRFX+SABALT
  public static unsafe Vector<ushort> AbsoluteDifferenceAndAccumulateLongTop(Vector<ushort> op1, Vector<byte> op2, byte op3);  // svabalt[_n_u16]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<uint> AbsoluteDifferenceAndAccumulateLongTop(Vector<uint> op1, Vector<ushort> op2, ushort op3);  // svabalt[_n_u32]: UABALT or MOVPRFX+UABALT
  public static unsafe Vector<ulong> AbsoluteDifferenceAndAccumulateLongTop(Vector<ulong> op1, Vector<uint> op2, uint op3);  // svabalt[_n_u64]: UABALT or MOVPRFX+UABALT

  public static unsafe Vector<short> AbsoluteDifferenceLongBottom(Vector<sbyte> op1, Vector<sbyte> op2);  // svabdlb[_s16]: SABDLB
  public static unsafe Vector<int> AbsoluteDifferenceLongBottom(Vector<short> op1, Vector<short> op2);  // svabdlb[_s32]: SABDLB
  public static unsafe Vector<long> AbsoluteDifferenceLongBottom(Vector<int> op1, Vector<int> op2);  // svabdlb[_s64]: SABDLB
  public static unsafe Vector<ushort> AbsoluteDifferenceLongBottom(Vector<byte> op1, Vector<byte> op2);  // svabdlb[_u16]: UABDLB
  public static unsafe Vector<uint> AbsoluteDifferenceLongBottom(Vector<ushort> op1, Vector<ushort> op2);  // svabdlb[_u32]: UABDLB
  public static unsafe Vector<ulong> AbsoluteDifferenceLongBottom(Vector<uint> op1, Vector<uint> op2);  // svabdlb[_u64]: UABDLB
  public static unsafe Vector<short> AbsoluteDifferenceLongBottom(Vector<sbyte> op1, sbyte op2);  // svabdlb[_n_s16]: SABDLB
  public static unsafe Vector<int> AbsoluteDifferenceLongBottom(Vector<short> op1, short op2);  // svabdlb[_n_s32]: SABDLB
  public static unsafe Vector<long> AbsoluteDifferenceLongBottom(Vector<int> op1, int op2);  // svabdlb[_n_s64]: SABDLB
  public static unsafe Vector<ushort> AbsoluteDifferenceLongBottom(Vector<byte> op1, byte op2);  // svabdlb[_n_u16]: UABDLB
  public static unsafe Vector<uint> AbsoluteDifferenceLongBottom(Vector<ushort> op1, ushort op2);  // svabdlb[_n_u32]: UABDLB
  public static unsafe Vector<ulong> AbsoluteDifferenceLongBottom(Vector<uint> op1, uint op2);  // svabdlb[_n_u64]: UABDLB

  public static unsafe Vector<short> AbsoluteDifferenceLongTop(Vector<sbyte> op1, Vector<sbyte> op2);  // svabdlt[_s16]: SABDLT
  public static unsafe Vector<int> AbsoluteDifferenceLongTop(Vector<short> op1, Vector<short> op2);  // svabdlt[_s32]: SABDLT
  public static unsafe Vector<long> AbsoluteDifferenceLongTop(Vector<int> op1, Vector<int> op2);  // svabdlt[_s64]: SABDLT
  public static unsafe Vector<ushort> AbsoluteDifferenceLongTop(Vector<byte> op1, Vector<byte> op2);  // svabdlt[_u16]: UABDLT
  public static unsafe Vector<uint> AbsoluteDifferenceLongTop(Vector<ushort> op1, Vector<ushort> op2);  // svabdlt[_u32]: UABDLT
  public static unsafe Vector<ulong> AbsoluteDifferenceLongTop(Vector<uint> op1, Vector<uint> op2);  // svabdlt[_u64]: UABDLT
  public static unsafe Vector<short> AbsoluteDifferenceLongTop(Vector<sbyte> op1, sbyte op2);  // svabdlt[_n_s16]: SABDLT
  public static unsafe Vector<int> AbsoluteDifferenceLongTop(Vector<short> op1, short op2);  // svabdlt[_n_s32]: SABDLT
  public static unsafe Vector<long> AbsoluteDifferenceLongTop(Vector<int> op1, int op2);  // svabdlt[_n_s64]: SABDLT
  public static unsafe Vector<ushort> AbsoluteDifferenceLongTop(Vector<byte> op1, byte op2);  // svabdlt[_n_u16]: UABDLT
  public static unsafe Vector<uint> AbsoluteDifferenceLongTop(Vector<ushort> op1, ushort op2);  // svabdlt[_n_u32]: UABDLT
  public static unsafe Vector<ulong> AbsoluteDifferenceLongTop(Vector<uint> op1, uint op2);  // svabdlt[_n_u64]: UABDLT

  public static unsafe Vector<uint> AddWithCarryLongBottom(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svadclb[_u32]: ADCLB or MOVPRFX+ADCLB
  public static unsafe Vector<ulong> AddWithCarryLongBottom(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svadclb[_u64]: ADCLB or MOVPRFX+ADCLB
  public static unsafe Vector<uint> AddWithCarryLongBottom(Vector<uint> op1, Vector<uint> op2, uint op3);  // svadclb[_n_u32]: ADCLB or MOVPRFX+ADCLB
  public static unsafe Vector<ulong> AddWithCarryLongBottom(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svadclb[_n_u64]: ADCLB or MOVPRFX+ADCLB

  public static unsafe Vector<uint> AddWithCarryLongTop(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svadclt[_u32]: ADCLT or MOVPRFX+ADCLT
  public static unsafe Vector<ulong> AddWithCarryLongTop(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svadclt[_u64]: ADCLT or MOVPRFX+ADCLT
  public static unsafe Vector<uint> AddWithCarryLongTop(Vector<uint> op1, Vector<uint> op2, uint op3);  // svadclt[_n_u32]: ADCLT or MOVPRFX+ADCLT
  public static unsafe Vector<ulong> AddWithCarryLongTop(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svadclt[_n_u64]: ADCLT or MOVPRFX+ADCLT

  public static unsafe Vector<sbyte> AddNarrowHighPartBottom(Vector<short> op1, Vector<short> op2);  // svaddhnb[_s16]: ADDHNB
  public static unsafe Vector<short> AddNarrowHighPartBottom(Vector<int> op1, Vector<int> op2);  // svaddhnb[_s32]: ADDHNB
  public static unsafe Vector<int> AddNarrowHighPartBottom(Vector<long> op1, Vector<long> op2);  // svaddhnb[_s64]: ADDHNB
  public static unsafe Vector<byte> AddNarrowHighPartBottom(Vector<ushort> op1, Vector<ushort> op2);  // svaddhnb[_u16]: ADDHNB
  public static unsafe Vector<ushort> AddNarrowHighPartBottom(Vector<uint> op1, Vector<uint> op2);  // svaddhnb[_u32]: ADDHNB
  public static unsafe Vector<uint> AddNarrowHighPartBottom(Vector<ulong> op1, Vector<ulong> op2);  // svaddhnb[_u64]: ADDHNB
  public static unsafe Vector<sbyte> AddNarrowHighPartBottom(Vector<short> op1, short op2);  // svaddhnb[_n_s16]: ADDHNB
  public static unsafe Vector<short> AddNarrowHighPartBottom(Vector<int> op1, int op2);  // svaddhnb[_n_s32]: ADDHNB
  public static unsafe Vector<int> AddNarrowHighPartBottom(Vector<long> op1, long op2);  // svaddhnb[_n_s64]: ADDHNB
  public static unsafe Vector<byte> AddNarrowHighPartBottom(Vector<ushort> op1, ushort op2);  // svaddhnb[_n_u16]: ADDHNB
  public static unsafe Vector<ushort> AddNarrowHighPartBottom(Vector<uint> op1, uint op2);  // svaddhnb[_n_u32]: ADDHNB
  public static unsafe Vector<uint> AddNarrowHighPartBottom(Vector<ulong> op1, ulong op2);  // svaddhnb[_n_u64]: ADDHNB

  public static unsafe Vector<sbyte> AddNarrowHighPartTop(Vector<sbyte> even, Vector<short> op1, Vector<short> op2);  // svaddhnt[_s16]: ADDHNT
  public static unsafe Vector<short> AddNarrowHighPartTop(Vector<short> even, Vector<int> op1, Vector<int> op2);  // svaddhnt[_s32]: ADDHNT
  public static unsafe Vector<int> AddNarrowHighPartTop(Vector<int> even, Vector<long> op1, Vector<long> op2);  // svaddhnt[_s64]: ADDHNT
  public static unsafe Vector<byte> AddNarrowHighPartTop(Vector<byte> even, Vector<ushort> op1, Vector<ushort> op2);  // svaddhnt[_u16]: ADDHNT
  public static unsafe Vector<ushort> AddNarrowHighPartTop(Vector<ushort> even, Vector<uint> op1, Vector<uint> op2);  // svaddhnt[_u32]: ADDHNT
  public static unsafe Vector<uint> AddNarrowHighPartTop(Vector<uint> even, Vector<ulong> op1, Vector<ulong> op2);  // svaddhnt[_u64]: ADDHNT
  public static unsafe Vector<sbyte> AddNarrowHighPartTop(Vector<sbyte> even, Vector<short> op1, short op2);  // svaddhnt[_n_s16]: ADDHNT
  public static unsafe Vector<short> AddNarrowHighPartTop(Vector<short> even, Vector<int> op1, int op2);  // svaddhnt[_n_s32]: ADDHNT
  public static unsafe Vector<int> AddNarrowHighPartTop(Vector<int> even, Vector<long> op1, long op2);  // svaddhnt[_n_s64]: ADDHNT
  public static unsafe Vector<byte> AddNarrowHighPartTop(Vector<byte> even, Vector<ushort> op1, ushort op2);  // svaddhnt[_n_u16]: ADDHNT
  public static unsafe Vector<ushort> AddNarrowHighPartTop(Vector<ushort> even, Vector<uint> op1, uint op2);  // svaddhnt[_n_u32]: ADDHNT
  public static unsafe Vector<uint> AddNarrowHighPartTop(Vector<uint> even, Vector<ulong> op1, ulong op2);  // svaddhnt[_n_u64]: ADDHNT

  public static unsafe Vector<short> AddLongBottom(Vector<sbyte> op1, Vector<sbyte> op2);  // svaddlb[_s16]: SADDLB
  public static unsafe Vector<int> AddLongBottom(Vector<short> op1, Vector<short> op2);  // svaddlb[_s32]: SADDLB
  public static unsafe Vector<long> AddLongBottom(Vector<int> op1, Vector<int> op2);  // svaddlb[_s64]: SADDLB
  public static unsafe Vector<ushort> AddLongBottom(Vector<byte> op1, Vector<byte> op2);  // svaddlb[_u16]: UADDLB
  public static unsafe Vector<uint> AddLongBottom(Vector<ushort> op1, Vector<ushort> op2);  // svaddlb[_u32]: UADDLB
  public static unsafe Vector<ulong> AddLongBottom(Vector<uint> op1, Vector<uint> op2);  // svaddlb[_u64]: UADDLB
  public static unsafe Vector<short> AddLongBottom(Vector<sbyte> op1, sbyte op2);  // svaddlb[_n_s16]: SADDLB
  public static unsafe Vector<int> AddLongBottom(Vector<short> op1, short op2);  // svaddlb[_n_s32]: SADDLB
  public static unsafe Vector<long> AddLongBottom(Vector<int> op1, int op2);  // svaddlb[_n_s64]: SADDLB
  public static unsafe Vector<ushort> AddLongBottom(Vector<byte> op1, byte op2);  // svaddlb[_n_u16]: UADDLB
  public static unsafe Vector<uint> AddLongBottom(Vector<ushort> op1, ushort op2);  // svaddlb[_n_u32]: UADDLB
  public static unsafe Vector<ulong> AddLongBottom(Vector<uint> op1, uint op2);  // svaddlb[_n_u64]: UADDLB

  public static unsafe Vector<short> AddLongBottomTop(Vector<sbyte> op1, Vector<sbyte> op2);  // svaddlbt[_s16]: SADDLBT
  public static unsafe Vector<int> AddLongBottomTop(Vector<short> op1, Vector<short> op2);  // svaddlbt[_s32]: SADDLBT
  public static unsafe Vector<long> AddLongBottomTop(Vector<int> op1, Vector<int> op2);  // svaddlbt[_s64]: SADDLBT
  public static unsafe Vector<short> AddLongBottomTop(Vector<sbyte> op1, sbyte op2);  // svaddlbt[_n_s16]: SADDLBT
  public static unsafe Vector<int> AddLongBottomTop(Vector<short> op1, short op2);  // svaddlbt[_n_s32]: SADDLBT
  public static unsafe Vector<long> AddLongBottomTop(Vector<int> op1, int op2);  // svaddlbt[_n_s64]: SADDLBT

  public static unsafe Vector<short> AddLongTop(Vector<sbyte> op1, Vector<sbyte> op2);  // svaddlt[_s16]: SADDLT
  public static unsafe Vector<int> AddLongTop(Vector<short> op1, Vector<short> op2);  // svaddlt[_s32]: SADDLT
  public static unsafe Vector<long> AddLongTop(Vector<int> op1, Vector<int> op2);  // svaddlt[_s64]: SADDLT
  public static unsafe Vector<ushort> AddLongTop(Vector<byte> op1, Vector<byte> op2);  // svaddlt[_u16]: UADDLT
  public static unsafe Vector<uint> AddLongTop(Vector<ushort> op1, Vector<ushort> op2);  // svaddlt[_u32]: UADDLT
  public static unsafe Vector<ulong> AddLongTop(Vector<uint> op1, Vector<uint> op2);  // svaddlt[_u64]: UADDLT
  public static unsafe Vector<short> AddLongTop(Vector<sbyte> op1, sbyte op2);  // svaddlt[_n_s16]: SADDLT
  public static unsafe Vector<int> AddLongTop(Vector<short> op1, short op2);  // svaddlt[_n_s32]: SADDLT
  public static unsafe Vector<long> AddLongTop(Vector<int> op1, int op2);  // svaddlt[_n_s64]: SADDLT
  public static unsafe Vector<ushort> AddLongTop(Vector<byte> op1, byte op2);  // svaddlt[_n_u16]: UADDLT
  public static unsafe Vector<uint> AddLongTop(Vector<ushort> op1, ushort op2);  // svaddlt[_n_u32]: UADDLT
  public static unsafe Vector<ulong> AddLongTop(Vector<uint> op1, uint op2);  // svaddlt[_n_u64]: UADDLT

  public static unsafe Vector<short> AddWideBottom(Vector<short> op1, Vector<sbyte> op2);  // svaddwb[_s16]: SADDWB
  public static unsafe Vector<int> AddWideBottom(Vector<int> op1, Vector<short> op2);  // svaddwb[_s32]: SADDWB
  public static unsafe Vector<long> AddWideBottom(Vector<long> op1, Vector<int> op2);  // svaddwb[_s64]: SADDWB
  public static unsafe Vector<ushort> AddWideBottom(Vector<ushort> op1, Vector<byte> op2);  // svaddwb[_u16]: UADDWB
  public static unsafe Vector<uint> AddWideBottom(Vector<uint> op1, Vector<ushort> op2);  // svaddwb[_u32]: UADDWB
  public static unsafe Vector<ulong> AddWideBottom(Vector<ulong> op1, Vector<uint> op2);  // svaddwb[_u64]: UADDWB
  public static unsafe Vector<short> AddWideBottom(Vector<short> op1, sbyte op2);  // svaddwb[_n_s16]: SADDWB
  public static unsafe Vector<int> AddWideBottom(Vector<int> op1, short op2);  // svaddwb[_n_s32]: SADDWB
  public static unsafe Vector<long> AddWideBottom(Vector<long> op1, int op2);  // svaddwb[_n_s64]: SADDWB
  public static unsafe Vector<ushort> AddWideBottom(Vector<ushort> op1, byte op2);  // svaddwb[_n_u16]: UADDWB
  public static unsafe Vector<uint> AddWideBottom(Vector<uint> op1, ushort op2);  // svaddwb[_n_u32]: UADDWB
  public static unsafe Vector<ulong> AddWideBottom(Vector<ulong> op1, uint op2);  // svaddwb[_n_u64]: UADDWB

  public static unsafe Vector<short> AddWideTop(Vector<short> op1, Vector<sbyte> op2);  // svaddwt[_s16]: SADDWT
  public static unsafe Vector<int> AddWideTop(Vector<int> op1, Vector<short> op2);  // svaddwt[_s32]: SADDWT
  public static unsafe Vector<long> AddWideTop(Vector<long> op1, Vector<int> op2);  // svaddwt[_s64]: SADDWT
  public static unsafe Vector<ushort> AddWideTop(Vector<ushort> op1, Vector<byte> op2);  // svaddwt[_u16]: UADDWT
  public static unsafe Vector<uint> AddWideTop(Vector<uint> op1, Vector<ushort> op2);  // svaddwt[_u32]: UADDWT
  public static unsafe Vector<ulong> AddWideTop(Vector<ulong> op1, Vector<uint> op2);  // svaddwt[_u64]: UADDWT
  public static unsafe Vector<short> AddWideTop(Vector<short> op1, sbyte op2);  // svaddwt[_n_s16]: SADDWT
  public static unsafe Vector<int> AddWideTop(Vector<int> op1, short op2);  // svaddwt[_n_s32]: SADDWT
  public static unsafe Vector<long> AddWideTop(Vector<long> op1, int op2);  // svaddwt[_n_s64]: SADDWT
  public static unsafe Vector<ushort> AddWideTop(Vector<ushort> op1, byte op2);  // svaddwt[_n_u16]: UADDWT
  public static unsafe Vector<uint> AddWideTop(Vector<uint> op1, ushort op2);  // svaddwt[_n_u32]: UADDWT
  public static unsafe Vector<ulong> AddWideTop(Vector<ulong> op1, uint op2);  // svaddwt[_n_u64]: UADDWT

  public static unsafe Vector<uint> ComputeVectorAddressesFor8BitData(Vector<uint> bases, Vector<int> offsets);  // svadrb[_u32base]_[s32]offset: ADR
  public static unsafe Vector<uint> ComputeVectorAddressesFor8BitData(Vector<uint> bases, Vector<uint> offsets);  // svadrb[_u32base]_[u32]offset: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor8BitData(Vector<ulong> bases, Vector<long> offsets);  // svadrb[_u64base]_[s64]offset: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor8BitData(Vector<ulong> bases, Vector<ulong> offsets);  // svadrb[_u64base]_[u64]offset: ADR

  public static unsafe Vector<uint> ComputeVectorAddressesFor64BitData(Vector<uint> bases, Vector<int> indices);  // svadrd[_u32base]_[s32]index: ADR
  public static unsafe Vector<uint> ComputeVectorAddressesFor64BitData(Vector<uint> bases, Vector<uint> indices);  // svadrd[_u32base]_[u32]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor64BitData(Vector<ulong> bases, Vector<long> indices);  // svadrd[_u64base]_[s64]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor64BitData(Vector<ulong> bases, Vector<ulong> indices);  // svadrd[_u64base]_[u64]index: ADR

  public static unsafe Vector<uint> ComputeVectorAddressesFor16BitData(Vector<uint> bases, Vector<int> indices);  // svadrh[_u32base]_[s32]index: ADR
  public static unsafe Vector<uint> ComputeVectorAddressesFor16BitData(Vector<uint> bases, Vector<uint> indices);  // svadrh[_u32base]_[u32]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor16BitData(Vector<ulong> bases, Vector<long> indices);  // svadrh[_u64base]_[s64]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor16BitData(Vector<ulong> bases, Vector<ulong> indices);  // svadrh[_u64base]_[u64]index: ADR

  public static unsafe Vector<uint> ComputeVectorAddressesFor32BitData(Vector<uint> bases, Vector<int> indices);  // svadrw[_u32base]_[s32]index: ADR
  public static unsafe Vector<uint> ComputeVectorAddressesFor32BitData(Vector<uint> bases, Vector<uint> indices);  // svadrw[_u32base]_[u32]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor32BitData(Vector<ulong> bases, Vector<long> indices);  // svadrw[_u64base]_[s64]index: ADR
  public static unsafe Vector<ulong> ComputeVectorAddressesFor32BitData(Vector<ulong> bases, Vector<ulong> indices);  // svadrw[_u64base]_[u64]index: ADR

  public static unsafe Vector<byte> AesSingleRoundDecryption(Vector<byte> op1, Vector<byte> op2);  // svaesd[_u8]: AESD

  public static unsafe Vector<byte> AesSingleRoundEncryption(Vector<byte> op1, Vector<byte> op2);  // svaese[_u8]: AESE

  public static unsafe Vector<byte> AesInverseMixColumns(Vector<byte> op);  // svaesimc[_u8]: AESIMC

  public static unsafe Vector<byte> AesMixColumns(Vector<byte> op);  // svaesmc[_u8]: AESMC

  public static unsafe Vector<sbyte> BitwiseClearAndExclusiveOr(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svbcax[_s8]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<short> BitwiseClearAndExclusiveOr(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svbcax[_s16]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<int> BitwiseClearAndExclusiveOr(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svbcax[_s32]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<long> BitwiseClearAndExclusiveOr(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svbcax[_s64]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<byte> BitwiseClearAndExclusiveOr(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svbcax[_u8]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<ushort> BitwiseClearAndExclusiveOr(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svbcax[_u16]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<uint> BitwiseClearAndExclusiveOr(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svbcax[_u32]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<ulong> BitwiseClearAndExclusiveOr(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svbcax[_u64]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<sbyte> BitwiseClearAndExclusiveOr(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svbcax[_n_s8]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<short> BitwiseClearAndExclusiveOr(Vector<short> op1, Vector<short> op2, short op3);  // svbcax[_n_s16]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<int> BitwiseClearAndExclusiveOr(Vector<int> op1, Vector<int> op2, int op3);  // svbcax[_n_s32]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<long> BitwiseClearAndExclusiveOr(Vector<long> op1, Vector<long> op2, long op3);  // svbcax[_n_s64]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<byte> BitwiseClearAndExclusiveOr(Vector<byte> op1, Vector<byte> op2, byte op3);  // svbcax[_n_u8]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<ushort> BitwiseClearAndExclusiveOr(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svbcax[_n_u16]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<uint> BitwiseClearAndExclusiveOr(Vector<uint> op1, Vector<uint> op2, uint op3);  // svbcax[_n_u32]: BCAX or MOVPRFX+BCAX
  public static unsafe Vector<ulong> BitwiseClearAndExclusiveOr(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svbcax[_n_u64]: BCAX or MOVPRFX+BCAX

  public static unsafe Vector<byte> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<byte> op1, Vector<byte> op2);  // svbdep[_u8]: BDEP
  public static unsafe Vector<ushort> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<ushort> op1, Vector<ushort> op2);  // svbdep[_u16]: BDEP
  public static unsafe Vector<uint> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<uint> op1, Vector<uint> op2);  // svbdep[_u32]: BDEP
  public static unsafe Vector<ulong> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<ulong> op1, Vector<ulong> op2);  // svbdep[_u64]: BDEP
  public static unsafe Vector<byte> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<byte> op1, byte op2);  // svbdep[_n_u8]: BDEP
  public static unsafe Vector<ushort> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<ushort> op1, ushort op2);  // svbdep[_n_u16]: BDEP
  public static unsafe Vector<uint> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<uint> op1, uint op2);  // svbdep[_n_u32]: BDEP
  public static unsafe Vector<ulong> ScatterLowerBitsIntoPositionsSelectedByBitmask(Vector<ulong> op1, ulong op2);  // svbdep[_n_u64]: BDEP

  public static unsafe Vector<byte> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<byte> op1, Vector<byte> op2);  // svbext[_u8]: BEXT
  public static unsafe Vector<ushort> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<ushort> op1, Vector<ushort> op2);  // svbext[_u16]: BEXT
  public static unsafe Vector<uint> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<uint> op1, Vector<uint> op2);  // svbext[_u32]: BEXT
  public static unsafe Vector<ulong> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<ulong> op1, Vector<ulong> op2);  // svbext[_u64]: BEXT
  public static unsafe Vector<byte> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<byte> op1, byte op2);  // svbext[_n_u8]: BEXT
  public static unsafe Vector<ushort> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<ushort> op1, ushort op2);  // svbext[_n_u16]: BEXT
  public static unsafe Vector<uint> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<uint> op1, uint op2);  // svbext[_n_u32]: BEXT
  public static unsafe Vector<ulong> GatherLowerBitsFromPositionsSelectedByBitmask(Vector<ulong> op1, ulong op2);  // svbext[_n_u64]: BEXT

  public static unsafe Vector<byte> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<byte> op1, Vector<byte> op2);  // svbgrp[_u8]: BGRP
  public static unsafe Vector<ushort> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<ushort> op1, Vector<ushort> op2);  // svbgrp[_u16]: BGRP
  public static unsafe Vector<uint> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<uint> op1, Vector<uint> op2);  // svbgrp[_u32]: BGRP
  public static unsafe Vector<ulong> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<ulong> op1, Vector<ulong> op2);  // svbgrp[_u64]: BGRP
  public static unsafe Vector<byte> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<byte> op1, byte op2);  // svbgrp[_n_u8]: BGRP
  public static unsafe Vector<ushort> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<ushort> op1, ushort op2);  // svbgrp[_n_u16]: BGRP
  public static unsafe Vector<uint> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<uint> op1, uint op2);  // svbgrp[_n_u32]: BGRP
  public static unsafe Vector<ulong> GroupBitsToRightOrLeftAsSelectedByBitmask(Vector<ulong> op1, ulong op2);  // svbgrp[_n_u64]: BGRP

  public static unsafe Vector<sbyte> BitwiseSelect(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svbsl[_s8]: BSL or MOVPRFX+BSL
  public static unsafe Vector<short> BitwiseSelect(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svbsl[_s16]: BSL or MOVPRFX+BSL
  public static unsafe Vector<int> BitwiseSelect(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svbsl[_s32]: BSL or MOVPRFX+BSL
  public static unsafe Vector<long> BitwiseSelect(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svbsl[_s64]: BSL or MOVPRFX+BSL
  public static unsafe Vector<byte> BitwiseSelect(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svbsl[_u8]: BSL or MOVPRFX+BSL
  public static unsafe Vector<ushort> BitwiseSelect(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svbsl[_u16]: BSL or MOVPRFX+BSL
  public static unsafe Vector<uint> BitwiseSelect(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svbsl[_u32]: BSL or MOVPRFX+BSL
  public static unsafe Vector<ulong> BitwiseSelect(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svbsl[_u64]: BSL or MOVPRFX+BSL
  public static unsafe Vector<sbyte> BitwiseSelect(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svbsl[_n_s8]: BSL or MOVPRFX+BSL
  public static unsafe Vector<short> BitwiseSelect(Vector<short> op1, Vector<short> op2, short op3);  // svbsl[_n_s16]: BSL or MOVPRFX+BSL
  public static unsafe Vector<int> BitwiseSelect(Vector<int> op1, Vector<int> op2, int op3);  // svbsl[_n_s32]: BSL or MOVPRFX+BSL
  public static unsafe Vector<long> BitwiseSelect(Vector<long> op1, Vector<long> op2, long op3);  // svbsl[_n_s64]: BSL or MOVPRFX+BSL
  public static unsafe Vector<byte> BitwiseSelect(Vector<byte> op1, Vector<byte> op2, byte op3);  // svbsl[_n_u8]: BSL or MOVPRFX+BSL
  public static unsafe Vector<ushort> BitwiseSelect(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svbsl[_n_u16]: BSL or MOVPRFX+BSL
  public static unsafe Vector<uint> BitwiseSelect(Vector<uint> op1, Vector<uint> op2, uint op3);  // svbsl[_n_u32]: BSL or MOVPRFX+BSL
  public static unsafe Vector<ulong> BitwiseSelect(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svbsl[_n_u64]: BSL or MOVPRFX+BSL

  public static unsafe Vector<sbyte> BitwiseSelectWithFirstInputInverted(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svbsl1n[_s8]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<short> BitwiseSelectWithFirstInputInverted(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svbsl1n[_s16]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<int> BitwiseSelectWithFirstInputInverted(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svbsl1n[_s32]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<long> BitwiseSelectWithFirstInputInverted(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svbsl1n[_s64]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<byte> BitwiseSelectWithFirstInputInverted(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svbsl1n[_u8]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<ushort> BitwiseSelectWithFirstInputInverted(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svbsl1n[_u16]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<uint> BitwiseSelectWithFirstInputInverted(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svbsl1n[_u32]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<ulong> BitwiseSelectWithFirstInputInverted(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svbsl1n[_u64]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<sbyte> BitwiseSelectWithFirstInputInverted(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svbsl1n[_n_s8]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<short> BitwiseSelectWithFirstInputInverted(Vector<short> op1, Vector<short> op2, short op3);  // svbsl1n[_n_s16]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<int> BitwiseSelectWithFirstInputInverted(Vector<int> op1, Vector<int> op2, int op3);  // svbsl1n[_n_s32]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<long> BitwiseSelectWithFirstInputInverted(Vector<long> op1, Vector<long> op2, long op3);  // svbsl1n[_n_s64]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<byte> BitwiseSelectWithFirstInputInverted(Vector<byte> op1, Vector<byte> op2, byte op3);  // svbsl1n[_n_u8]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<ushort> BitwiseSelectWithFirstInputInverted(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svbsl1n[_n_u16]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<uint> BitwiseSelectWithFirstInputInverted(Vector<uint> op1, Vector<uint> op2, uint op3);  // svbsl1n[_n_u32]: BSL1N or MOVPRFX+BSL1N
  public static unsafe Vector<ulong> BitwiseSelectWithFirstInputInverted(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svbsl1n[_n_u64]: BSL1N or MOVPRFX+BSL1N

  public static unsafe Vector<sbyte> BitwiseSelectWithSecondInputInverted(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svbsl2n[_s8]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<short> BitwiseSelectWithSecondInputInverted(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // svbsl2n[_s16]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<int> BitwiseSelectWithSecondInputInverted(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // svbsl2n[_s32]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<long> BitwiseSelectWithSecondInputInverted(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // svbsl2n[_s64]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<byte> BitwiseSelectWithSecondInputInverted(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // svbsl2n[_u8]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<ushort> BitwiseSelectWithSecondInputInverted(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // svbsl2n[_u16]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<uint> BitwiseSelectWithSecondInputInverted(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // svbsl2n[_u32]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<ulong> BitwiseSelectWithSecondInputInverted(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // svbsl2n[_u64]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<sbyte> BitwiseSelectWithSecondInputInverted(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // svbsl2n[_n_s8]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<short> BitwiseSelectWithSecondInputInverted(Vector<short> op1, Vector<short> op2, short op3);  // svbsl2n[_n_s16]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<int> BitwiseSelectWithSecondInputInverted(Vector<int> op1, Vector<int> op2, int op3);  // svbsl2n[_n_s32]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<long> BitwiseSelectWithSecondInputInverted(Vector<long> op1, Vector<long> op2, long op3);  // svbsl2n[_n_s64]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<byte> BitwiseSelectWithSecondInputInverted(Vector<byte> op1, Vector<byte> op2, byte op3);  // svbsl2n[_n_u8]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<ushort> BitwiseSelectWithSecondInputInverted(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // svbsl2n[_n_u16]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<uint> BitwiseSelectWithSecondInputInverted(Vector<uint> op1, Vector<uint> op2, uint op3);  // svbsl2n[_n_u32]: BSL2N or MOVPRFX+BSL2N
  public static unsafe Vector<ulong> BitwiseSelectWithSecondInputInverted(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // svbsl2n[_n_u64]: BSL2N or MOVPRFX+BSL2N

  public static unsafe Vector<sbyte> ComplexAddWithRotate(Vector<sbyte> op1, Vector<sbyte> op2, ulong imm_rotation);  // svcadd[_s8]: CADD or MOVPRFX+CADD
  public static unsafe Vector<short> ComplexAddWithRotate(Vector<short> op1, Vector<short> op2, ulong imm_rotation);  // svcadd[_s16]: CADD or MOVPRFX+CADD
  public static unsafe Vector<int> ComplexAddWithRotate(Vector<int> op1, Vector<int> op2, ulong imm_rotation);  // svcadd[_s32]: CADD or MOVPRFX+CADD
  public static unsafe Vector<long> ComplexAddWithRotate(Vector<long> op1, Vector<long> op2, ulong imm_rotation);  // svcadd[_s64]: CADD or MOVPRFX+CADD
  public static unsafe Vector<byte> ComplexAddWithRotate(Vector<byte> op1, Vector<byte> op2, ulong imm_rotation);  // svcadd[_u8]: CADD or MOVPRFX+CADD
  public static unsafe Vector<ushort> ComplexAddWithRotate(Vector<ushort> op1, Vector<ushort> op2, ulong imm_rotation);  // svcadd[_u16]: CADD or MOVPRFX+CADD
  public static unsafe Vector<uint> ComplexAddWithRotate(Vector<uint> op1, Vector<uint> op2, ulong imm_rotation);  // svcadd[_u32]: CADD or MOVPRFX+CADD
  public static unsafe Vector<ulong> ComplexAddWithRotate(Vector<ulong> op1, Vector<ulong> op2, ulong imm_rotation);  // svcadd[_u64]: CADD or MOVPRFX+CADD

  public static unsafe Vector<int> ComplexDotProduct(Vector<int> op1, Vector<sbyte> op2, Vector<sbyte> op3, ulong imm_rotation);  // svcdot[_s32]: CDOT or MOVPRFX+CDOT
  public static unsafe Vector<long> ComplexDotProduct(Vector<long> op1, Vector<short> op2, Vector<short> op3, ulong imm_rotation);  // svcdot[_s64]: CDOT or MOVPRFX+CDOT
  public static unsafe Vector<int> ComplexDotProduct(Vector<int> op1, Vector<sbyte> op2, Vector<sbyte> op3, ulong imm_index, ulong imm_rotation);  // svcdot_lane[_s32]: CDOT or MOVPRFX+CDOT
  public static unsafe Vector<long> ComplexDotProduct(Vector<long> op1, Vector<short> op2, Vector<short> op3, ulong imm_index, ulong imm_rotation);  // svcdot_lane[_s64]: CDOT or MOVPRFX+CDOT

  public static unsafe Vector<sbyte> ComplexMultiplyAddWithRotate(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3, ulong imm_rotation);  // svcmla[_s8]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<short> ComplexMultiplyAddWithRotate(Vector<short> op1, Vector<short> op2, Vector<short> op3, ulong imm_rotation);  // svcmla[_s16]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<int> ComplexMultiplyAddWithRotate(Vector<int> op1, Vector<int> op2, Vector<int> op3, ulong imm_rotation);  // svcmla[_s32]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<long> ComplexMultiplyAddWithRotate(Vector<long> op1, Vector<long> op2, Vector<long> op3, ulong imm_rotation);  // svcmla[_s64]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<byte> ComplexMultiplyAddWithRotate(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3, ulong imm_rotation);  // svcmla[_u8]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<ushort> ComplexMultiplyAddWithRotate(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3, ulong imm_rotation);  // svcmla[_u16]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<uint> ComplexMultiplyAddWithRotate(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3, ulong imm_rotation);  // svcmla[_u32]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<ulong> ComplexMultiplyAddWithRotate(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3, ulong imm_rotation);  // svcmla[_u64]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<float> ComplexMultiplyAddWithRotate(Vector<float> op1, Vector<float> op2, Vector<float> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_f32]: FCMLA or MOVPRFX+FCMLA
  public static unsafe Vector<short> ComplexMultiplyAddWithRotate(Vector<short> op1, Vector<short> op2, Vector<short> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_s16]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<int> ComplexMultiplyAddWithRotate(Vector<int> op1, Vector<int> op2, Vector<int> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_s32]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<ushort> ComplexMultiplyAddWithRotate(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_u16]: CMLA or MOVPRFX+CMLA
  public static unsafe Vector<uint> ComplexMultiplyAddWithRotate(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3, ulong imm_index, ulong imm_rotation);  // svcmla_lane[_u32]: CMLA or MOVPRFX+CMLA

  public static unsafe ulong CountTheNumberOf8BitElementsInAVector();  // svcntb: CNTB
  public static unsafe ulong CountTheNumberOf8BitElementsInAVector(enum svpattern pattern);  // svcntb_pat: CNTB

  public static unsafe ulong CountTheNumberOf64BitElementsInAVector();  // svcntd: CNTD
  public static unsafe ulong CountTheNumberOf64BitElementsInAVector(enum svpattern pattern);  // svcntd_pat: CNTD

  public static unsafe ulong CountTheNumberOf16BitElementsInAVector();  // svcnth: CNTH
  public static unsafe ulong CountTheNumberOf16BitElementsInAVector(enum svpattern pattern);  // svcnth_pat: CNTH

  public static unsafe ulong CountTheNumberOf32BitElementsInAVector();  // svcntw: CNTW
  public static unsafe ulong CountTheNumberOf32BitElementsInAVector(enum svpattern pattern);  // svcntw_pat: CNTW

  public static unsafe Vector<int> DotProduct(Vector<int> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // svdot[_s32]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<long> DotProduct(Vector<long> op1, Vector<short> op2, Vector<short> op3);  // svdot[_s64]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<uint> DotProduct(Vector<uint> op1, Vector<byte> op2, Vector<byte> op3);  // svdot[_u32]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<ulong> DotProduct(Vector<ulong> op1, Vector<ushort> op2, Vector<ushort> op3);  // svdot[_u64]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<int> DotProduct(Vector<int> op1, Vector<sbyte> op2, sbyte op3);  // svdot[_n_s32]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<long> DotProduct(Vector<long> op1, Vector<short> op2, short op3);  // svdot[_n_s64]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<uint> DotProduct(Vector<uint> op1, Vector<byte> op2, byte op3);  // svdot[_n_u32]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<ulong> DotProduct(Vector<ulong> op1, Vector<ushort> op2, ushort op3);  // svdot[_n_u64]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<int> DotProduct(Vector<int> op1, Vector<sbyte> op2, Vector<sbyte> op3, ulong imm_index);  // svdot_lane[_s32]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<long> DotProduct(Vector<long> op1, Vector<short> op2, Vector<short> op3, ulong imm_index);  // svdot_lane[_s64]: SDOT or MOVPRFX+SDOT
  public static unsafe Vector<uint> DotProduct(Vector<uint> op1, Vector<byte> op2, Vector<byte> op3, ulong imm_index);  // svdot_lane[_u32]: UDOT or MOVPRFX+UDOT
  public static unsafe Vector<ulong> DotProduct(Vector<ulong> op1, Vector<ushort> op2, Vector<ushort> op3, ulong imm_index);  // svdot_lane[_u64]: UDOT or MOVPRFX+UDOT

  public static unsafe Vector<float> BroadcastAScalarValue(float op);  // svdup[_n]_f32: DUP or FDUP or DUP or DUP
  public static unsafe Vector<double> BroadcastAScalarValue(double op);  // svdup[_n]_f64: DUP or FDUP or DUP or DUP
  public static unsafe Vector<sbyte> BroadcastAScalarValue(sbyte op);  // svdup[_n]_s8: DUP or FDUP or DUPM or DUP or DUP
  public static unsafe Vector<short> BroadcastAScalarValue(short op);  // svdup[_n]_s16: DUP or FDUP or DUPM or DUP or DUP
  public static unsafe Vector<int> BroadcastAScalarValue(int op);  // svdup[_n]_s32: DUP or FDUP or DUPM or DUP or DUP
  public static unsafe Vector<long> BroadcastAScalarValue(long op);  // svdup[_n]_s64: DUP or FDUP or DUPM or DUP or DUP
  public static unsafe Vector<byte> BroadcastAScalarValue(byte op);  // svdup[_n]_u8: DUP or FDUP or DUPM or DUP or DUP
  public static unsafe Vector<ushort> BroadcastAScalarValue(ushort op);  // svdup[_n]_u16: DUP or FDUP or DUPM or DUP or DUP
  public static unsafe Vector<uint> BroadcastAScalarValue(uint op);  // svdup[_n]_u32: DUP or FDUP or DUPM or DUP or DUP
  public static unsafe Vector<ulong> BroadcastAScalarValue(ulong op);  // svdup[_n]_u64: DUP or FDUP or DUPM or DUP or DUP
  public static unsafe Vector<float> BroadcastAScalarValue(Vector<float> data, uint index);  // svdup_lane[_f32]: DUP or TBL
  public static unsafe Vector<double> BroadcastAScalarValue(Vector<double> data, ulong index);  // svdup_lane[_f64]: DUP or TBL
  public static unsafe Vector<sbyte> BroadcastAScalarValue(Vector<sbyte> data, byte index);  // svdup_lane[_s8]: DUP or TBL
  public static unsafe Vector<short> BroadcastAScalarValue(Vector<short> data, ushort index);  // svdup_lane[_s16]: DUP or TBL
  public static unsafe Vector<int> BroadcastAScalarValue(Vector<int> data, uint index);  // svdup_lane[_s32]: DUP or TBL
  public static unsafe Vector<long> BroadcastAScalarValue(Vector<long> data, ulong index);  // svdup_lane[_s64]: DUP or TBL
  public static unsafe Vector<byte> BroadcastAScalarValue(Vector<byte> data, byte index);  // svdup_lane[_u8]: DUP or TBL
  public static unsafe Vector<ushort> BroadcastAScalarValue(Vector<ushort> data, ushort index);  // svdup_lane[_u16]: DUP or TBL
  public static unsafe Vector<uint> BroadcastAScalarValue(Vector<uint> data, uint index);  // svdup_lane[_u32]: DUP or TBL
  public static unsafe Vector<ulong> BroadcastAScalarValue(Vector<ulong> data, ulong index);  // svdup_lane[_u64]: DUP or TBL

  public static unsafe Vector<sbyte> BitwiseExclusiveOrOfThreeVectors(Vector<sbyte> op1, Vector<sbyte> op2, Vector<sbyte> op3);  // sveor3[_s8]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<short> BitwiseExclusiveOrOfThreeVectors(Vector<short> op1, Vector<short> op2, Vector<short> op3);  // sveor3[_s16]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<int> BitwiseExclusiveOrOfThreeVectors(Vector<int> op1, Vector<int> op2, Vector<int> op3);  // sveor3[_s32]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<long> BitwiseExclusiveOrOfThreeVectors(Vector<long> op1, Vector<long> op2, Vector<long> op3);  // sveor3[_s64]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<byte> BitwiseExclusiveOrOfThreeVectors(Vector<byte> op1, Vector<byte> op2, Vector<byte> op3);  // sveor3[_u8]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<ushort> BitwiseExclusiveOrOfThreeVectors(Vector<ushort> op1, Vector<ushort> op2, Vector<ushort> op3);  // sveor3[_u16]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<uint> BitwiseExclusiveOrOfThreeVectors(Vector<uint> op1, Vector<uint> op2, Vector<uint> op3);  // sveor3[_u32]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<ulong> BitwiseExclusiveOrOfThreeVectors(Vector<ulong> op1, Vector<ulong> op2, Vector<ulong> op3);  // sveor3[_u64]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<sbyte> BitwiseExclusiveOrOfThreeVectors(Vector<sbyte> op1, Vector<sbyte> op2, sbyte op3);  // sveor3[_n_s8]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<short> BitwiseExclusiveOrOfThreeVectors(Vector<short> op1, Vector<short> op2, short op3);  // sveor3[_n_s16]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<int> BitwiseExclusiveOrOfThreeVectors(Vector<int> op1, Vector<int> op2, int op3);  // sveor3[_n_s32]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<long> BitwiseExclusiveOrOfThreeVectors(Vector<long> op1, Vector<long> op2, long op3);  // sveor3[_n_s64]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<byte> BitwiseExclusiveOrOfThreeVectors(Vector<byte> op1, Vector<byte> op2, byte op3);  // sveor3[_n_u8]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<ushort> BitwiseExclusiveOrOfThreeVectors(Vector<ushort> op1, Vector<ushort> op2, ushort op3);  // sveor3[_n_u16]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<uint> BitwiseExclusiveOrOfThreeVectors(Vector<uint> op1, Vector<uint> op2, uint op3);  // sveor3[_n_u32]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3
  public static unsafe Vector<ulong> BitwiseExclusiveOrOfThreeVectors(Vector<ulong> op1, Vector<ulong> op2, ulong op3);  // sveor3[_n_u64]: EOR3 or EOR3 or EOR3 or MOVPRFX+EOR3

  public static unsafe Vector<sbyte> InterleavingExclusiveOrBottomTop(Vector<sbyte> odd, Vector<sbyte> op1, Vector<sbyte> op2);  // sveorbt[_s8]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<short> InterleavingExclusiveOrBottomTop(Vector<short> odd, Vector<short> op1, Vector<short> op2);  // sveorbt[_s16]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<int> InterleavingExclusiveOrBottomTop(Vector<int> odd, Vector<int> op1, Vector<int> op2);  // sveorbt[_s32]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<long> InterleavingExclusiveOrBottomTop(Vector<long> odd, Vector<long> op1, Vector<long> op2);  // sveorbt[_s64]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<byte> InterleavingExclusiveOrBottomTop(Vector<byte> odd, Vector<byte> op1, Vector<byte> op2);  // sveorbt[_u8]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<ushort> InterleavingExclusiveOrBottomTop(Vector<ushort> odd, Vector<ushort> op1, Vector<ushort> op2);  // sveorbt[_u16]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<uint> InterleavingExclusiveOrBottomTop(Vector<uint> odd, Vector<uint> op1, Vector<uint> op2);  // sveorbt[_u32]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<ulong> InterleavingExclusiveOrBottomTop(Vector<ulong> odd, Vector<ulong> op1, Vector<ulong> op2);  // sveorbt[_u64]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<sbyte> InterleavingExclusiveOrBottomTop(Vector<sbyte> odd, Vector<sbyte> op1, sbyte op2);  // sveorbt[_n_s8]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<short> InterleavingExclusiveOrBottomTop(Vector<short> odd, Vector<short> op1, short op2);  // sveorbt[_n_s16]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<int> InterleavingExclusiveOrBottomTop(Vector<int> odd, Vector<int> op1, int op2);  // sveorbt[_n_s32]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<long> InterleavingExclusiveOrBottomTop(Vector<long> odd, Vector<long> op1, long op2);  // sveorbt[_n_s64]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<byte> InterleavingExclusiveOrBottomTop(Vector<byte> odd, Vector<byte> op1, byte op2);  // sveorbt[_n_u8]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<ushort> InterleavingExclusiveOrBottomTop(Vector<ushort> odd, Vector<ushort> op1, ushort op2);  // sveorbt[_n_u16]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<uint> InterleavingExclusiveOrBottomTop(Vector<uint> odd, Vector<uint> op1, uint op2);  // sveorbt[_n_u32]: EORBT or MOVPRFX+EORBT
  public static unsafe Vector<ulong> InterleavingExclusiveOrBottomTop(Vector<ulong> odd, Vector<ulong> op1, ulong op2);  // sveorbt[_n_u64]: EORBT or MOVPRFX+EORBT

  public static unsafe Vector<sbyte> InterleavingExclusiveOrTopBottom(Vector<sbyte> even, Vector<sbyte> op1, Vector<sbyte> op2);  // sveortb[_s8]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<short> InterleavingExclusiveOrTopBottom(Vector<short> even, Vector<short> op1, Vector<short> op2);  // sveortb[_s16]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<int> InterleavingExclusiveOrTopBottom(Vector<int> even, Vector<int> op1, Vector<int> op2);  // sveortb[_s32]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<long> InterleavingExclusiveOrTopBottom(Vector<long> even, Vector<long> op1, Vector<long> op2);  // sveortb[_s64]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<byte> InterleavingExclusiveOrTopBottom(Vector<byte> even, Vector<byte> op1, Vector<byte> op2);  // sveortb[_u8]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<ushort> InterleavingExclusiveOrTopBottom(Vector<ushort> even, Vector<ushort> op1, Vector<ushort> op2);  // sveortb[_u16]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<uint> InterleavingExclusiveOrTopBottom(Vector<uint> even, Vector<uint> op1, Vector<uint> op2);  // sveortb[_u32]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<ulong> InterleavingExclusiveOrTopBottom(Vector<ulong> even, Vector<ulong> op1, Vector<ulong> op2);  // sveortb[_u64]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<sbyte> InterleavingExclusiveOrTopBottom(Vector<sbyte> even, Vector<sbyte> op1, sbyte op2);  // sveortb[_n_s8]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<short> InterleavingExclusiveOrTopBottom(Vector<short> even, Vector<short> op1, short op2);  // sveortb[_n_s16]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<int> InterleavingExclusiveOrTopBottom(Vector<int> even, Vector<int> op1, int op2);  // sveortb[_n_s32]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<long> InterleavingExclusiveOrTopBottom(Vector<long> even, Vector<long> op1, long op2);  // sveortb[_n_s64]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<byte> InterleavingExclusiveOrTopBottom(Vector<byte> even, Vector<byte> op1, byte op2);  // sveortb[_n_u8]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<ushort> InterleavingExclusiveOrTopBottom(Vector<ushort> even, Vector<ushort> op1, ushort op2);  // sveortb[_n_u16]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<uint> InterleavingExclusiveOrTopBottom(Vector<uint> even, Vector<uint> op1, uint op2);  // sveortb[_n_u32]: EORTB or MOVPRFX+EORTB
  public static unsafe Vector<ulong> InterleavingExclusiveOrTopBottom(Vector<ulong> even, Vector<ulong> op1, ulong op2);  // sveortb[_n_u64]: EORTB or MOVPRFX+EORTB

  public static unsafe Vector<float> FloatingPointExponentialAccelerator(Vector<uint> op);  // svexpa[_f32]: FEXPA
  public static unsafe Vector<double> FloatingPointExponentialAccelerator(Vector<ulong> op);  // svexpa[_f64]: FEXPA

  public static unsafe Vector<float> ExtractVectorFromPairOfVectors(Vector<float> op1, Vector<float> op2, ulong imm3);  // svext[_f32]: EXT or MOVPRFX+EXT
  public static unsafe Vector<double> ExtractVectorFromPairOfVectors(Vector<double> op1, Vector<double> op2, ulong imm3);  // svext[_f64]: EXT or MOVPRFX+EXT
  public static unsafe Vector<sbyte> ExtractVectorFromPairOfVectors(Vector<sbyte> op1, Vector<sbyte> op2, ulong imm3);  // svext[_s8]: EXT or MOVPRFX+EXT
  public static unsafe Vector<short> ExtractVectorFromPairOfVectors(Vector<short> op1, Vector<short> op2, ulong imm3);  // svext[_s16]: EXT or MOVPRFX+EXT
  public static unsafe Vector<int> ExtractVectorFromPairOfVectors(Vector<int> op1, Vector<int> op2, ulong imm3);  // svext[_s32]: EXT or MOVPRFX+EXT
  public static unsafe Vector<long> ExtractVectorFromPairOfVectors(Vector<long> op1, Vector<long> op2, ulong imm3);  // svext[_s64]: EXT or MOVPRFX+EXT
  public static unsafe Vector<byte> ExtractVectorFromPairOfVectors(Vector<byte> op1, Vector<byte> op2, ulong imm3);  // svext[_u8]: EXT or MOVPRFX+EXT
  public static unsafe Vector<ushort> ExtractVectorFromPairOfVectors(Vector<ushort> op1, Vector<ushort> op2, ulong imm3);  // svext[_u16]: EXT or MOVPRFX+EXT
  public static unsafe Vector<uint> ExtractVectorFromPairOfVectors(Vector<uint> op1, Vector<uint> op2, ulong imm3);  // svext[_u32]: EXT or MOVPRFX+EXT
  public static unsafe Vector<ulong> ExtractVectorFromPairOfVectors(Vector<ulong> op1, Vector<ulong> op2, ulong imm3);  // svext[_u64]: EXT or MOVPRFX+EXT

  public static unsafe Vector<byte> CountMatchingElementsIn128BitSegments(Vector<sbyte> op1, Vector<sbyte> op2);  // svhistseg[_s8]: HISTSEG
  public static unsafe Vector<byte> CountMatchingElementsIn128BitSegments(Vector<byte> op1, Vector<byte> op2);  // svhistseg[_u8]: HISTSEG

  public static unsafe Vector<sbyte> CreateLinearSeries(sbyte @base, sbyte step);  // svindex_s8: INDEX or INDEX or INDEX or INDEX
  public static unsafe Vector<short> CreateLinearSeries(short @base, short step);  // svindex_s16: INDEX or INDEX or INDEX or INDEX
  public static unsafe Vector<int> CreateLinearSeries(int @base, int step);  // svindex_s32: INDEX or INDEX or INDEX or INDEX
  public static unsafe Vector<long> CreateLinearSeries(long @base, long step);  // svindex_s64: INDEX or INDEX or INDEX or INDEX
  public static unsafe Vector<byte> CreateLinearSeries(byte @base, byte step);  // svindex_u8: INDEX or INDEX or INDEX or INDEX
  public static unsafe Vector<ushort> CreateLinearSeries(ushort @base, ushort step);  // svindex_u16: INDEX or INDEX or INDEX or INDEX
  public static unsafe Vector<uint> CreateLinearSeries(uint @base, uint step);  // svindex_u32: INDEX or INDEX or INDEX or INDEX
  public static unsafe Vector<ulong> CreateLinearSeries(ulong @base, ulong step);  // svindex_u64: INDEX or INDEX or INDEX or INDEX

  public static unsafe Vector<float> InsertScalarIntoShiftedVector(Vector<float> op1, float op2);  // svinsr[_n_f32]: INSR or INSR
  public static unsafe Vector<double> InsertScalarIntoShiftedVector(Vector<double> op1, double op2);  // svinsr[_n_f64]: INSR or INSR
  public static unsafe Vector<sbyte> InsertScalarIntoShiftedVector(Vector<sbyte> op1, sbyte op2);  // svinsr[_n_s8]: INSR or INSR
  public static unsafe Vector<short> InsertScalarIntoShiftedVector(Vector<short> op1, short op2);  // svinsr[_n_s16]: INSR or INSR
  public static unsafe Vector<int> InsertScalarIntoShiftedVector(Vector<int> op1, int op2);  // svinsr[_n_s32]: INSR or INSR
  public static unsafe Vector<long> InsertScalarIntoShiftedVector(Vector<long> op1, long op2);  // svinsr[_n_s64]: INSR or INSR
  public static unsafe Vector<byte> InsertScalarIntoShiftedVector(Vector<byte> op1, byte op2);  // svinsr[_n_u8]: INSR or INSR
  public static unsafe Vector<ushort> InsertScalarIntoShiftedVector(Vector<ushort> op1, ushort op2);  // svinsr[_n_u16]: INSR or INSR
  public static unsafe Vector<uint> InsertScalarIntoShiftedVector(Vector<uint> op1, uint op2);  // svinsr[_n_u32]: INSR or INSR
  public static unsafe Vector<ulong> InsertScalarIntoShiftedVector(Vector<ulong> op1, ulong op2);  // svinsr[_n_u64]: INSR or INSR

  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<float> op);  // svlen[_f32]: CNTW
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<double> op);  // svlen[_f64]: CNTD
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<sbyte> op);  // svlen[_s8]: CNTB
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<short> op);  // svlen[_s16]: CNTH
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<int> op);  // svlen[_s32]: CNTW
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<long> op);  // svlen[_s64]: CNTD
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<byte> op);  // svlen[_u8]: CNTB
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<ushort> op);  // svlen[_u16]: CNTH
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<uint> op);  // svlen[_u32]: CNTW
  public static unsafe ulong CountTheNumberOfElementsInAFullVector(Vector<ulong> op);  // svlen[_u64]: CNTD
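
The lane rules of several of the less obvious methods above can be pinned down with a scalar reference model. The sketch below is illustrative only: `SveScalarModel` and its array-based signatures are hypothetical and not part of the proposal, vectors are modelled as arrays whose length stands in for the hardware vector length, `int`/`ulong` stand in for every element type, and the extra `lanes` parameter of `CreateLinearSeries` replaces the hardware vector length. The rules follow the Arm architecture descriptions of BSL/BSL1N/BSL2N, EOR3, EORBT/EORTB, SDOT, INDEX, EXT and INSR as we understand them, so they should be checked against the Arm ARM before being relied on as test oracles.

```csharp
using System;

// Hypothetical scalar model of some proposed unpredicated SVE methods.
// Arrays model vectors; array length models the hardware vector length.
public static class SveScalarModel
{
    // Bitwise ops are lane-independent, so one 64-bit value models any element type.
    // BSL: each 1 bit of op3 selects the bit from op1, each 0 bit selects it from op2.
    public static ulong BitwiseSelect(ulong op1, ulong op2, ulong op3) =>
        (op1 & op3) | (op2 & ~op3);

    // BSL1N: as BSL, but op1 is inverted before selection.
    public static ulong BitwiseSelectWithFirstInputInverted(ulong op1, ulong op2, ulong op3) =>
        (~op1 & op3) | (op2 & ~op3);

    // BSL2N: as BSL, but op2 is inverted before selection.
    public static ulong BitwiseSelectWithSecondInputInverted(ulong op1, ulong op2, ulong op3) =>
        (op1 & op3) | (~op2 & ~op3);

    // EOR3: three-way exclusive OR.
    public static ulong BitwiseExclusiveOrOfThreeVectors(ulong op1, ulong op2, ulong op3) =>
        op1 ^ op2 ^ op3;

    // EORBT: even (bottom) lanes = op1[even] ^ op2[following odd]; odd lanes are taken from `odd`.
    public static int[] InterleavingExclusiveOrBottomTop(int[] odd, int[] op1, int[] op2)
    {
        int[] result = (int[])odd.Clone();
        for (int i = 0; i + 1 < result.Length; i += 2)
            result[i] = op1[i] ^ op2[i + 1];
        return result;
    }

    // EORTB: odd (top) lanes = op1[odd] ^ op2[preceding even]; even lanes are taken from `even`.
    public static int[] InterleavingExclusiveOrTopBottom(int[] even, int[] op1, int[] op2)
    {
        int[] result = (int[])even.Clone();
        for (int i = 0; i + 1 < result.Length; i += 2)
            result[i + 1] = op1[i + 1] ^ op2[i];
        return result;
    }

    // SDOT: each 32-bit lane accumulates the products of its four corresponding sbyte lanes.
    public static int[] DotProduct(int[] op1, sbyte[] op2, sbyte[] op3)
    {
        int[] result = (int[])op1.Clone();
        for (int i = 0; i < result.Length; i++)
            for (int k = 0; k < 4; k++)
                result[i] += op2[4 * i + k] * op3[4 * i + k];
        return result;
    }

    // INDEX: lane i holds base + i * step, with wrapping arithmetic.
    public static int[] CreateLinearSeries(int @base, int step, int lanes)
    {
        int[] result = new int[lanes];
        for (int i = 0; i < lanes; i++)
            result[i] = unchecked(@base + i * step);
        return result;
    }

    // EXT: lanes imm .. imm+N-1 of the concatenation op1:op2 (op1 supplies the low lanes).
    // imm counts elements, as in the ACLE form; assumes 0 <= imm < N.
    public static int[] ExtractVectorFromPairOfVectors(int[] op1, int[] op2, int imm)
    {
        int n = op1.Length;
        int[] result = new int[n];
        for (int i = 0; i < n; i++)
            result[i] = i + imm < n ? op1[i + imm] : op2[i + imm - n];
        return result;
    }

    // INSR: shift every lane up by one (dropping the top lane) and write the scalar into lane 0.
    public static int[] InsertScalarIntoShiftedVector(int[] op1, int op2)
    {
        int[] result = new int[op1.Length];
        result[0] = op2;
        Array.Copy(op1, 0, result, 1, op1.Length - 1);
        return result;
    }
}
```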


### API Usage

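```csharp
var x = ArmSVE.ReverseAllElements(y);

// Integer complex add; the rotation must be 90 or 270.
Vector<uint> a = ArmSVE.ComplexAddWithRotate(b, c, 270);

// Overloads are per element type, so a concrete Vector<uint> is used here.
Vector<uint> MyFunc(Vector<uint> i, Vector<uint> j, Vector<uint> k)
{
  return ArmSVE.AbsoluteDifferenceAndAccumulate(i, j, k);
}
```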

### Alternative Designs

Some of the methods take a rotation, e.g.:

```csharp
  public static unsafe Vector<sbyte> ComplexAddWithRotate(Vector<sbyte> op1, Vector<sbyte> op2, ulong imm_rotation);  // svcadd[_s8]: CADD or MOVPRFX+CADD
```

This rotation must be either 90 or 270, so we may want to replace the parameter with two methods:

```csharp
  public static unsafe Vector<sbyte> ComplexAddWithRotate90(Vector<sbyte> op1, Vector<sbyte> op2);  // svcadd[_s8]: CADD or MOVPRFX+CADD
  public static unsafe Vector<sbyte> ComplexAddWithRotate270(Vector<sbyte> op1, Vector<sbyte> op2);  // svcadd[_s8]: CADD or MOVPRFX+CADD
```

or

```csharp
  public static unsafe Vector<sbyte> ComplexAddWithRotate(Vector<sbyte> op1, Vector<sbyte> op2, bool rotate270);  // svcadd[_s8]: CADD or MOVPRFX+CADD
```
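
For reference, integer CADD treats even lanes as real parts and odd lanes as imaginary parts, and adds `op2` rotated by 90 degrees (multiplied by i) or 270 degrees (multiplied by -i) to `op1`. The hypothetical `ComplexAddModel` below is a scalar sketch of that rule under stated assumptions: arrays stand in for hardware vectors, `int` stands in for every element type, and the rotation is an `int` rather than the proposed `ulong imm_rotation`.

```csharp
using System;

// Hypothetical scalar model of integer CADD on interleaved (real, imaginary) lane pairs.
public static class ComplexAddModel
{
    public static int[] ComplexAddWithRotate(int[] op1, int[] op2, int rotation)
    {
        // Only two rotations are valid; anything else must be rejected at run time.
        if (rotation != 90 && rotation != 270)
            throw new ArgumentOutOfRangeException(nameof(rotation), "CADD only supports 90 or 270.");

        int[] result = new int[op1.Length];
        for (int i = 0; i + 1 < op1.Length; i += 2)
        {
            int re2 = op2[i], im2 = op2[i + 1];
            // 90:  op1 + i*op2  = (re1 - im2) + (im1 + re2)i
            // 270: op1 - i*op2  = (re1 + im2) + (im1 - re2)i
            result[i] = rotation == 90 ? op1[i] - im2 : op1[i] + im2;
            result[i + 1] = rotation == 90 ? op1[i + 1] + re2 : op1[i + 1] - re2;
        }
        return result;
    }
}
```

Because only two values are legal, the model has to validate the argument at run time; the two-method or `bool` alternatives above would make an invalid rotation unrepresentable instead.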

### Risks

The C intrinsics are well specified. Starting from this existing list will help to reduce errors. However, I expect the C# API to deviate during design review.

It is possible that there are methods for instructions that are not yet in existing hardware. We should prune those from the list as they are found.

Author: a74nh
Assignees: -
Labels: api-suggestion, area-System.Runtime.Intrinsics

Milestone: -

@a74nh
Contributor Author

a74nh commented Oct 13, 2023

@a74nh a74nh changed the title [API Proposal]: ARM64 SVE/SVE2 : Unpredicated instructions (part 1 of 3) [API Proposal]: ARM64 SVE/SVE2 : Unpredicated instructions (part 1 of 4) Oct 13, 2023
@stephentoub
Member

> Remove all methods that use float16 or bfloat16. C# does not yet support 16bit float.

System.Half isn't applicable?

@a74nh
Contributor Author

a74nh commented Oct 13, 2023

> System.Half isn't applicable?

Yes, that should be usable. If no one objects before Monday, I'll add the 39 methods with float16 to the list.

AIUI, there is no bfloat16 in C#, so I'll keep all 69 of those out of the list.
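
To illustrate why `System.Half` can cover the float16 methods but not the bfloat16 ones: both are 16-bit formats, but `Half` is IEEE 754 binary16 (5 exponent bits, 10 fraction bits), whereas bfloat16 is the upper half of a binary32 `float` (8 exponent bits, 7 fraction bits). The sketch below compares the bit patterns; `HalfPrecisionDemo` is a hypothetical helper, it assumes .NET 6 or later for `BitConverter.HalfToInt16Bits`, and its bfloat16 conversion simply truncates, so it only approximates a rounding hardware conversion.

```csharp
using System;

public static class HalfPrecisionDemo
{
    // IEEE 754 binary16 bit pattern, as stored by System.Half.
    public static short HalfBits(float value) => BitConverter.HalfToInt16Bits((Half)value);

    // bfloat16 bit pattern obtained by keeping the top 16 bits of the binary32 value.
    // Truncating, not rounding: illustrative only.
    public static ushort BFloat16BitsTruncated(float value) =>
        (ushort)((uint)BitConverter.SingleToInt32Bits(value) >> 16);
}
```

For example, 1.0f is `0x3C00` as a `Half` but `0x3F80` as bfloat16, and 1e10f overflows `Half` to infinity while remaining finite in bfloat16, so the two formats are not interchangeable.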

@kunalspathak
Member

> Remove all methods without an underlying SVE instruction. These will be covered in a future API suggestion.

What are the examples of these?

@kunalspathak
Member

It is not required, but since there are so many APIs, I was wondering whether we could arrange them in alphabetical order, so that it is easier to find the issue in which each one is proposed. A step further would be to organize them in logical groups such as arithmetic, data flow, load/store, etc.

@kunalspathak
Member

> > Remove all methods without an underlying SVE instruction. These will be covered in a future API suggestion.
>
> What are the examples of these?

Never mind, I just saw #93464.

@tannergooding
Member

tannergooding commented Oct 13, 2023

> public class ArmSVE

This should just be called Sve and it shouldn't be generic. I believe it requires AdvSimd to also exist, so it should likely inherit from the class as well, for consistency with how we model the hierarchy. The signature should then be public abstract class Sve : AdvSimd

> This API covers all the SVE/SVE2 instructions that don't require predication.

The APIs you've covered here are the ones in SVE and any that are in SVE2 would be in a separate class named Sve2 which inherits from Sve, correct?

> AbsoluteDifferenceAndAccumulate

There are various names here that don't match the existing conventions. For example, the above is simply AbsoluteDifferenceAdd in AdvSimd (the UABA and SABA instructions).

There are also other terminology differences that may cause confusion and which we avoid. For example, we use Widen or Widening rather than Long (and Narrow/Narrowing for the inverse) and we use Lower rather than Bottom (and Upper for the inverse).

AbsoluteDifferenceAndAccumulateLongBottom is therefore AbsoluteDifferenceWideningLower in AdvSimd. AddNarrowHighPartTop is then AddHighNarrowingUpper.

We do try to keep some platform-specific terminology where relevant, so DUP is DuplicateSelectedScalarToVector128, as an example, whereas the equivalent is called Broadcast on x64.

There are then also some APIs where it may not be worth exposing a platform specific alternative, namely because a cross platform API is already guaranteed to expose exactly the prescribed behavior. For example, CountTheNumberOfElementsInAFullVector could probably just be Vector<T>.Count, which is significantly simpler and more readable.

Finally, we want to use more descriptive names than op1/op2 where possible. We often use left, right, value, mask, and control. Where disambiguation is required, we sometimes explicitly name the additional operand, such as left, right, addend or left, right, multiplier.

@a74nh
Contributor Author

a74nh commented Oct 16, 2023

public class ArmSVE

This should just be called Sve and it shouldn't be generic. I believe it requires AdvSimd to also exist, so it should likely inherit from the class as well, for consistency with how we model the hierarchy. The signature should then be public abstract class Sve : AdvSimd

This API covers all the SVE/SVE2 instructions that don't require predication.

The APIs you've covered here are the ones in SVE and any that are in SVE2 would be in a separate class named Sve2 which inherits from Sve, correct?

This has SVE and SVE2. I'll split it up. There are also features such as FEAT_I8MM and FEAT_F64MM, which I'll need to split out too.

AbsoluteDifferenceAndAccumulate

There are various names here that don't match the existing conventions. For example, the above is simply AbsoluteDifferenceAdd in AdvSimd (the UABA and SABA instructions).

I'll fix these up (and the others) with some pattern matching. If possible I want to keep all of this generated from scripts to avoid mistakes.
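A minimal sketch of what such pattern matching could look like, assuming the generator already produces PascalCase names from the ACLE descriptions. The word substitutions (Long to Widening, Narrow to Narrowing, Bottom to Lower, Top to Upper) and the two overrides come from the review comments above. `to_dotnet_name`, `WORD_MAP`, `OVERRIDES`, and the `ShiftLeftLongBottom` input are hypothetical, not the actual script.

```python
import re

# Whole-name overrides for cases where a plain word swap isn't enough
# (e.g. the word order changes). Taken from the examples in the review.
OVERRIDES = {
    "AbsoluteDifferenceAndAccumulate": "AbsoluteDifferenceAdd",
    "AddNarrowHighPartTop": "AddHighNarrowingUpper",
}

# Per-word terminology substitutions: ACLE wording -> existing .NET wording.
WORD_MAP = {
    "Long": "Widening",
    "Narrow": "Narrowing",
    "Bottom": "Lower",
    "Top": "Upper",
}


def to_dotnet_name(acle_name: str) -> str:
    """Map a camel-cased ACLE description to a .NET-style method name."""
    if acle_name in OVERRIDES:
        return OVERRIDES[acle_name]
    # Split PascalCase into words; assumes the name starts with an uppercase letter.
    words = re.findall(r"[A-Z][a-z0-9]*", acle_name)
    return "".join(WORD_MAP.get(word, word) for word in words)


if __name__ == "__main__":
    for name in ["AbsoluteDifferenceAndAccumulate", "ShiftLeftLongBottom"]:
        print(name, "->", to_dotnet_name(name))
```

Mapping whole words rather than substrings avoids turning an already-converted "Narrowing" into "Narrowinging". The override table keeps the hand-reviewed exceptions in the script, so the list can still be regenerated from scratch.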

It is not required, but since there are so many APIs, I was wondering whether we could arrange them in alphabetical order, so that it is easier to find which issue a given API is proposed in. A step further would be to organize them into logical groups such as arithmetic, data flow, load/store, etc.

It's already in alphabetical order. Once it's split into features, I'll see if it makes sense to group them.

When updating the list, is it OK to update the original post, or should I always post the update as a new comment?

For the next update, it's going to be more than 4 parts (due to the feature splitting), so I guess it'll all be in new issues anyway, and this one would be closed.

@tannergooding
Member

When updating the list, is it OK to update the original post, or should I always post the update as a new comment?

Updating the original post is preferred. API review likes being able to reference the top post as the most up-to-date version, to avoid potential confusion.

@kunalspathak
Member

For the next update, it's going to be more than 4 parts (due to the feature splitting), so I guess it'll all be in new issues anyway, and this one would be closed.

Sounds reasonable to me.

@a74nh
Contributor Author

a74nh commented Oct 17, 2023

These are all the SVE features currently covered by the C ACLE:

FEAT_SVE
FEAT_SVE2
FEAT_BF16
FEAT_F32MM
FEAT_F64MM
FEAT_FP16
FEAT_I8MM
FEAT_SVE_AES
FEAT_SVE_BitPerm
FEAT_SVE_SHA3
FEAT_SVE_SM4

Each group will need to go into its own section in the C# API.
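A minimal sketch of the split, assuming each generated entry has already been tagged with the feature that provides it. `split_by_feature` is a hypothetical helper. The sample entries are illustrative, and their feature tags should be checked against the ACLE before relying on them.

```python
from collections import defaultdict


def split_by_feature(entries):
    """Group (intrinsic, feature) pairs so that each feature gets its own section.

    Names within each section are sorted, matching the alphabetical order
    the generated list already uses.
    """
    groups = defaultdict(list)
    for intrinsic, feature in entries:
        groups[feature].append(intrinsic)
    return {feature: sorted(names) for feature, names in groups.items()}


if __name__ == "__main__":
    # Illustrative entries only; the real input would come from the generator.
    sample = [
        ("svdot[_s32]", "FEAT_SVE"),
        ("svaba[_s8]", "FEAT_SVE2"),
        ("svmmla[_s32]", "FEAT_I8MM"),
        ("svaesd[_u8]", "FEAT_SVE_AES"),
        ("svbdep[_u32]", "FEAT_SVE_BitPerm"),
        ("svaba[_u8]", "FEAT_SVE2"),
    ]
    for feature, names in sorted(split_by_feature(sample).items()):
        print(feature, names)
```

Keeping the feature as data on each entry, rather than in separate hand-maintained lists, lets the per-feature sections (and any further split into issues) be regenerated with the rest of the list.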

@a74nh
Contributor Author

a74nh commented Oct 17, 2023

Replaced with #93614

a74nh closed this as completed Oct 17, 2023
ghost removed the untriaged label Oct 17, 2023
ghost locked as resolved and limited conversation to collaborators Nov 16, 2023