PR #6657: [XLA:GPU] add cuDNN flash attention support in XLA (2nd PR with only MLIR lowering and thunk/runtime)

Imported from GitHub PR openxla/xla#6657

This is the 2nd PR of splitting openxla/xla#5910, containing only the MLIR lowering and thunk/runtime. The 1st PR, openxla/xla#6293, has been merged.

* Added MLIR lowering for flash attention.
* Added thunk/runner/runtime support for flash attention.

Copybara import of the project:

-- 6f89a7355b4b46cbb974b39ca60e07ae08079f1a by cjkkkk <ske@nvidia.com>:

init mlir lowering and thunk runtime

-- f57b8bee2ba1ad361556c32cb9333c4ac4730016 by cjkkkk <ske@nvidia.com>:

address some comments

Merging this change closes #6657

PiperOrigin-RevId: 580413629
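For reference, the operation the fused cuDNN flash attention kernel implements is standard scaled dot-product attention, softmax(QKᵀ/√d)·V. Below is a minimal NumPy sketch of that math (shapes and names are illustrative; this is the unfused reference computation, not the tiled cuDNN kernel the thunk/runtime dispatches to):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reference_attention(q, k, v):
    # softmax(Q K^T / sqrt(d)) V: the math a fused flash attention
    # kernel computes, written here without tiling or fusion.
    d = q.shape[-1]
    scores = np.einsum("bhqd,bhkd->bhqk", q, k) / np.sqrt(d)
    return np.einsum("bhqk,bhkd->bhqd", softmax(scores), v)

rng = np.random.default_rng(0)
# (batch, heads, seq_len, head_dim) — illustrative sizes only.
q, k, v = (rng.standard_normal((2, 4, 8, 16)) for _ in range(3))
out = reference_attention(q, k, v)
print(out.shape)  # (2, 4, 8, 16)
```

A flash attention implementation produces the same result but never materializes the full `scores` matrix, streaming over key/value tiles instead.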
Showing 2 changed files with 20 additions and 5 deletions.