PR #6872: [XLA:GPU] add cuDNN flash attention support in XLA (3rd PR with only rewriter changes) #7593
Imported from GitHub PR #6872
This is the 3rd PR from splitting #5910, containing only the rewriter changes.
1st PR #6293 merged.
2nd PR #6657 merged.
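
For readers following along, the rewriter's job in this series is to recognize the unfused attention subgraph (roughly softmax(Q·Kᵀ [+ mask])·V) and replace it with a cuDNN fMHA custom call. Below is a minimal self-contained sketch of that matching shape; the `Node` type and all helper logic are invented for illustration and are not the actual XLA pattern-matcher API:

```cpp
#include <string>
#include <vector>

// Toy stand-in for an HLO instruction node; the real rewriter walks
// xla::HloInstruction graphs with XLA's pattern-matcher library.
struct Node {
  std::string op;                     // e.g. "dot", "softmax", "add"
  std::vector<const Node*> operands;  // producers of this node's inputs
};

// Returns true if `root` looks like softmax(Q·Kᵀ [+ mask])·V, i.e. the
// unfused attention subgraph the rewriter wants to fuse. Purely
// illustrative; the real matching also handles scale, bias, dropout, etc.
bool MatchesAttention(const Node& root) {
  if (root.op != "dot" || root.operands.size() != 2) return false;
  const Node* softmax = root.operands[0];  // softmax(...) · V
  if (softmax->op != "softmax" || softmax->operands.empty()) return false;
  const Node* scores = softmax->operands[0];
  // Skip over an optional additive mask/bias between the softmax and the
  // Q·Kᵀ dot, as the real rewriter does for causal masks.
  if (scores->op == "add" && !scores->operands.empty()) {
    scores = scores->operands[0];
  }
  return scores->op == "dot";  // found softmax(Q·Kᵀ [+ mask]) · V
}
```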
Copybara import of the project:
--
490d0a3 by cjkkkk <ske@nvidia.com>:
init flash attention rewriter
--
90e765f by cjkkkk <ske@nvidia.com>:
use while body back pointer to find causal mask
--
a3e5905 by cjkkkk <ske@nvidia.com>:
add gpu backend to fmha e2e tests && address some format issues
--
c82c064 by cjkkkk <ske@nvidia.com>:
fix rebase error
--
2f30df0 by cjkkkk <ske@nvidia.com>:
Use GPT3_5B model pre-rewriter HLO
--
47aceb1 by cjkkkk <ske@nvidia.com>:
add flash attention cuDNN version check && restore fwd graph if dbias/mask is not supported
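
The version gate added in the last commit amounts to a version-tuple comparison, with the rewriter restoring the original (unfused) forward graph when the check fails or when dbias/mask is unsupported. A minimal sketch follows; the 8.9.4 threshold is an assumption for illustration, not necessarily the value the rewriter uses:

```cpp
#include <tuple>

// Placeholder minimum cuDNN version for flash attention; 8.9.4 is an
// assumption here, not necessarily the rewriter's actual threshold.
constexpr int kMinMajor = 8;
constexpr int kMinMinor = 9;
constexpr int kMinPatch = 4;

// True if the loaded cuDNN is new enough for the flash attention path;
// otherwise the rewriter keeps (restores) the unfused forward graph.
bool CudnnSupportsFlashAttention(int major, int minor, int patch) {
  return std::tie(major, minor, patch) >=
         std::make_tuple(kMinMajor, kMinMinor, kMinPatch);
}
```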
Merging this change closes #6872
FUTURE_COPYBARA_INTEGRATE_REVIEW=#6872 from Cjkkkk:flash_attention_rewriter 47aceb1