[Inference] Refactor modeling attention layer by abstracting attention backends #5771

char-1ee · 2024-06-03T05:52:58Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs
I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Refactor inference modeling files:
The current modeling code under colossalai/inference uses nested if-else checks to choose between CUDA and Triton kernels for attention-layer computation. This PR abstracts the positional encoding and KV cache operations into PreAttentionBackend, and the flash attention and flash decoding ops into AttentionBackend. Inference modeling now initializes the attention backend according to the config the user provides.
Fix typos and naming, and reorganize the code.
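
As a rough illustration of the idea described above (not the code in this PR), the sketch below selects a backend once at module init instead of branching inside every forward call. Only the name AttentionBackend comes from this PR; SketchConfig, use_cuda_kernel, the prefill/decode method names, the two concrete backend classes, and get_attention_backend are hypothetical names chosen for the example, and the kernels are replaced by placeholder strings.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SketchConfig:
    # Hypothetical flag for this example; the real config fields may differ.
    use_cuda_kernel: bool = False


class AttentionBackend(ABC):
    """Interface that hides which kernel implementation runs attention."""

    @abstractmethod
    def prefill(self, q, k, v):
        """Attention over the full prompt (flash-attention style)."""

    @abstractmethod
    def decode(self, q, k_cache, v_cache):
        """Attention for one new token against the KV cache (flash-decoding style)."""


class CudaAttentionBackend(AttentionBackend):
    # Placeholder bodies: a real backend would launch CUDA kernels here.
    def prefill(self, q, k, v):
        return "cuda:prefill"

    def decode(self, q, k_cache, v_cache):
        return "cuda:decode"


class TritonAttentionBackend(AttentionBackend):
    # Placeholder bodies: a real backend would launch Triton kernels here.
    def prefill(self, q, k, v):
        return "triton:prefill"

    def decode(self, q, k_cache, v_cache):
        return "triton:decode"


def get_attention_backend(config: SketchConfig) -> AttentionBackend:
    """The only place where the config is inspected to pick a backend."""
    if config.use_cuda_kernel:
        return CudaAttentionBackend()
    return TritonAttentionBackend()


class SketchAttentionLayer:
    def __init__(self, config: SketchConfig):
        # The backend is chosen once, at init time.
        self.backend = get_attention_backend(config)

    def forward(self, q, k, v, is_prefill: bool):
        # No kernel-selection branches remain here, only the prefill/decode split.
        if is_prefill:
            return self.backend.prefill(q, k, v)
        return self.backend.decode(q, k, v)
```

With this shape, supporting another kernel family would mean adding one backend class and one branch in the factory, leaving the layer code untouched. A pre-attention backend for positional encoding and KV cache writes could presumably follow the same pattern.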

Tests:

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

colossalai/inference/config.py

colossalai/inference/core/engine.py

colossalai/inference/modeling/backends/attention_backend.py

colossalai/inference/modeling/backends/pre_attention_backend.py

colossalai/inference/modeling/models/nopadding_llama.py

Signed-off-by: char-1ee <xingjianli59@gmail.com>

colossalai/_C/.nfs0000000013155a3b0000021b

Signed-off-by: char-1ee <xingjianli59@gmail.com>

char-1ee requested a review from a team as a code owner June 3, 2024 05:52

char-1ee added the colossal-inference label Jun 3, 2024

yuanheng-zhao reviewed Jun 3, 2024

char-1ee mentioned this pull request Jun 3, 2024

[PROPOSAL]: Refactor inference engine by selecting backend during init of modules #5773

Closed

1 task

char-1ee added 3 commits June 7, 2024 08:33

Refactor modeling by adding attention backend

04386d9

Signed-off-by: char-1ee <xingjianli59@gmail.com>

Fix tests and naming

eec77e5

Signed-off-by: char-1ee <xingjianli59@gmail.com>

Pass inference model shard configs for module init

5f398fc

Signed-off-by: char-1ee <xingjianli59@gmail.com>

char-1ee force-pushed the refactor/modeling branch from 1ed7f7f to 5f398fc June 7, 2024 08:34

char-1ee commented Jun 7, 2024

colossalai/_C/.nfs0000000013155a3b0000021b (outdated)

char-1ee added 2 commits June 7, 2024 09:09

Clean up

ceba662

Signed-off-by: char-1ee <xingjianli59@gmail.com>

Remove flash attention backend

f5981e8

Signed-off-by: char-1ee <xingjianli59@gmail.com>

char-1ee changed the title from "[Inference] Refactor inference modeling" to "[Inference] Refactor modeling attention layer by abstracting attention backends" Jun 8, 2024

Fix test import

b303976

Signed-off-by: char-1ee <xingjianli59@gmail.com>

yuanheng-zhao approved these changes Jun 10, 2024

char-1ee merged commit 77a219a into hpcaitech:main Jun 10, 2024
4 checks passed
