Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gather with i32 indexes #329

Open
benjamin-lieser opened this issue Feb 4, 2023 · 11 comments
Open

gather with i32 indexes #329

benjamin-lieser opened this issue Feb 4, 2023 · 11 comments
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@benjamin-lieser
Copy link

As far as I can see we can only use gather with usize indexes.

The x86 intrinsic _mm256_i32gather_ps however takes i32 indexes and a float* base pointer.
Is it possible to add more flexible options to the index type of the gather instructions?

@benjamin-lieser benjamin-lieser added the C-feature-request Category: a feature request, i.e. not implemented / a PR label Feb 4, 2023
@calebzulawski
Copy link
Member

This is related to #323.

I wonder--for forward compatibility with new instruction sets that could have some other arbitrary gather index, should gather accept any T: SimdUsize? Also, does it matter if the type is signed or unsigned (are there instruction sets that don't support negative offsets)?

@programmerjake
Copy link
Member

SimpleV gathers with unsigned offsets are more efficient than signed offsets, because the gather operation can be fused with a few minor operations done on the output, such as saturation, but the fusing can only happen when the offsets are unsigned. If no fusing occurs, signed/unsigned is equally efficient (at least in terms of instruction count, some hardware may be faster for one or the other).

see the discussion of the SEA bit for load/store indexed in https://libre-soc.org/openpower/sv/ldst/

@programmerjake
Copy link
Member

RISC-V V v1.0 doesn't support signed offsets (except by separately converting the offsets to a usize vector first):

If the vector offset elements are narrower than XLEN, they are zero-extended to XLEN before adding to the base effective address.

(XLEN is basically the number of bits in usize)

@programmerjake
Copy link
Member

so, imho, if we support any non-usize indexes, we should just support all of them and LLVM can cast the input to a usize vector if needed.

@workingjubilee
Copy link
Member

It should be noted that it is fairly doubtful to offset a base pointer by a negative value in terms of Rust's system of reasoning about pointers, regardless of whether the ISA allows it, because going from &[T] to *const T and then gathering using that only gives you provenance for positive indices.

@programmerjake
Copy link
Member

well, sometimes you want negative offsets on a raw pointer, I agree that it is very rare, but imho it's still worth supporting in some fashion (could be by saying do-it-yourself by using Simd<*const T, _>::add then call non-indexing gather on that vector of pointers), though we might want a lint on indexing using signed types if we support them in indexing-gather.

@programmerjake
Copy link
Member

otoh, if the semantics of indexing-gather on slices is to select-in some fallback value on out-of-range indexes, using -1 as a guaranteed out-of-range value could be nice to avoid needing multiple selects.

@workingjubilee
Copy link
Member

My remarks were not about what is permissible, as I am aware of that caveat and obviously ptr.offset(i) is legal, as counting down from the end of a slice is fine. The implication was more that it would make sense if trying to use signed integers would be... less ergonomic.

@programmerjake
Copy link
Member

Simd::gather_yes_i_know_i_want_signed_indexes 😛

@calebzulawski
Copy link
Member

As far as I can tell, LLVM's vp.gather intrinsic supports negative offsets. I'm curious how that works on targets that don't support it natively--is there always conversion overhead, even when using unsigned offsets?

I'll implement this after #322 lands, since that touches gather as well.

@programmerjake
Copy link
Member

As far as I can tell, LLVM's vp.gather intrinsic supports negative offsets.

that's not true, llvm.vp.gather and llvm.masked.gather don't support offsets of any sort, their input is a vector of pointers. if you need offsets (negative or positive), you have to add them into the vector of pointers (probably by using getelementptr as appropriate) and supplying the result to the gather op.
what sometimes supports offsets is the target ISA's instructions, e.g. vpgatherdd is equivalent to fusing a getelementptr with a splatted base pointer and with 32-bit offsets with a llvm.vp.gather.

I'm curious how that works on targets that don't support it natively--is there always conversion overhead, even when using unsigned offsets?

if you mean targets that don't support gather at all, it would be equivalent to a sequence of scalar load ops, with getelementptrs as needed to calculate the pointers to be loaded by applying the desired offsets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR
Projects
None yet
Development

No branches or pull requests

4 participants