Commit 887d2d5
[misc] Bypass the huggingface bug to solve the mask mismatch problem (h…
Hz188 authored Aug 15, 2024
1 parent 4dd0399 commit 887d2d5
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions colossalai/shardformer/modeling/deepseek.py
@@ -666,6 +666,9 @@ def forward(
         if inputs_embeds is None:
             inputs_embeds = self.embed_tokens(input_ids)
 
+        # TODO: upgrade transformers to 4.44.0 to fix the bug, remove the hard code.
+        self._use_flash_attention_2 = shard_config.enable_flash_attention
+        self._use_sdpa = False if shard_config.enable_flash_attention else self._use_sdpa
         if self._use_flash_attention_2:
             # 2d mask is passed through the layers
             attention_mask = attention_mask if (attention_mask is not None and 0 in attention_mask) else None
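For context: the forward pass in transformers' DeepSeek modeling code picks the attention-mask format from the model's `_use_flash_attention_2` and `_use_sdpa` flags, and before transformers 4.44.0 those flags could disagree with the attention implementation that shardformer actually patches in, so the layers received a mask in the wrong format (2D padding mask vs. 4D causal mask). The commit forces the flags from the shard config before the mask branch runs. Below is a minimal, self-contained sketch of that pattern; `ShardConfig`, `build_attention_mask`, and `Dummy` are hypothetical stand-ins for illustration, not the real ColossalAI or transformers APIs.

# A sketch only: shows why forcing the backend flags fixes the mask mismatch.
# ShardConfig, build_attention_mask, and Dummy are hypothetical stand-ins,
# not the actual ColossalAI/transformers APIs.
import torch

class ShardConfig:
    def __init__(self, enable_flash_attention: bool):
        self.enable_flash_attention = enable_flash_attention

def build_attention_mask(model, attention_mask, shard_config):
    # Force the HF backend flags to match the attention implementation that
    # shardformer actually installed, so the mask format agrees with what
    # the downstream layers expect.
    model._use_flash_attention_2 = shard_config.enable_flash_attention
    model._use_sdpa = False if shard_config.enable_flash_attention else model._use_sdpa

    if model._use_flash_attention_2:
        # FlashAttention consumes the raw 2D (batch, seq_len) padding mask,
        # and only needs it when at least one position is actually padded.
        return attention_mask if (attention_mask is not None and 0 in attention_mask) else None
    # Other backends expand the mask to a 4D causal mask elsewhere;
    # that path is out of scope for this sketch.
    return attention_mask

if __name__ == "__main__":
    class Dummy:
        # Stand-in for the HF model: only the two backend flags matter here.
        _use_flash_attention_2 = False
        _use_sdpa = True

    mask = torch.tensor([[1, 1, 0]])  # batch of 1, one padded position
    out = build_attention_mask(Dummy(), mask, ShardConfig(enable_flash_attention=True))
    print(out)  # 2D mask passes through unchanged because padding is present

As the TODO in the diff notes, once transformers is upgraded to 4.44.0 the library sets these flags consistently itself, and the hard-coded override can be removed.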
