[Core] Support for Scheduling-defined Prefill-Decode Disaggregation feature #15
I'd like to call this feature scheduling-defined PDD :)
    # the expected steps, blocking_migration is True.
    @property
    def blocking_migration(self) -> bool:
        return self.output_len >= self.expected_steps
Do you use this to tag the requests that have finished the expected steps?
"blocking" is ambiguous; consider changing the name.
Yes. In our paper, blocking migration means one-stage migration.
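To make the semantics of the property concrete, here is a minimal sketch. `RequestSketch` is a hypothetical stand-in for the real request class, and the reading of `output_len` as "tokens generated so far" and `expected_steps` as "steps to run before migrating" is an assumption based on this thread, not confirmed by the diff.

```python
from dataclasses import dataclass


@dataclass
class RequestSketch:
    """Hypothetical stand-in for the request class under review."""
    output_len: int      # assumed: number of tokens generated so far
    expected_steps: int  # assumed: steps to run on this instance before migrating

    @property
    def blocking_migration(self) -> bool:
        # True once the request has run at least the expected number of steps,
        # which (per the reply above) marks it for one-stage migration.
        return self.output_len >= self.expected_steps


# Assumed PD-style case: one step on the prefill instance, then migrate.
prefill_done = RequestSketch(output_len=1, expected_steps=1)
# A request that has not yet reached its expected steps.
still_running = RequestSketch(output_len=3, expected_steps=8)
```

Under these assumptions, `prefill_done.blocking_migration` is true and `still_running.blocking_migration` is false.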
@@ -98,29 +98,36 @@ def from_args(cls,
         llumlet = engine_class.remote(instance_id, backend_type, migration_config, *args, **kwargs)
         return llumlet

-    def migrate_out(self, dst_instance_name: str) -> List[str]:
+    def migrate_out(self, dst_instance_name: str, num_requests: int) -> List[str]:
In this function, num_requests is used like a boolean (num_requests == 1).
And for PDD, I think you need a function named migrate_out_singlestage.
Consider refactoring this function.
Removed the logic that treated num_requests as a boolean. We now reuse migrate_out_multistage to send the blocks in one stage for PDD, so no additional function is needed. Please check.
@@ -61,6 +62,8 @@ class EngineManagerArgs:
     last_stage_max_blocks: int = None
     max_stages: int = None

+    enable_pd_disagg: bool = False
In this file, set the default value to None, and set the actual default in config/default.py.
We want to get default values from only one place.
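A minimal sketch of the pattern suggested here, assuming a plain dictionary as a stand-in for config/default.py. `DEFAULTS`, `ArgsSketch`, and `resolve` are hypothetical names, and the value given for `max_stages` is illustrative only.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical stand-in for config/default.py: the single source of defaults.
# The max_stages value is illustrative, not taken from the project.
DEFAULTS = {"enable_pd_disagg": False, "max_stages": 3}


@dataclass
class ArgsSketch:
    """Hypothetical args class: every field defaults to None."""
    enable_pd_disagg: Optional[bool] = None
    max_stages: Optional[int] = None

    def resolve(self) -> "ArgsSketch":
        # Fill any field left at None from the central defaults table.
        for f in fields(self):
            if getattr(self, f.name) is None:
                setattr(self, f.name, DEFAULTS[f.name])
        return self


# Only max_stages is set explicitly; enable_pd_disagg comes from DEFAULTS.
resolved = ArgsSketch(max_stages=5).resolve()
```

With this layout, changing a default means editing one table, and an explicit value passed by the caller always wins over the central default.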
@@ -47,9 +56,15 @@ def __init__(self,
         self.instance_info: Dict[str, InstanceInfo] = None
         self.sorted_instance_infos: List[InstanceInfo] = None

-    def pair_migration(self) -> List[Tuple[str, str]]:
+    def pair_migration(self, pair_migration_type:str) -> List[Tuple[str, str]]:
pair_migration_type:str -> pair_migration_type: str
add a space
The same applies in other places.
This PR introduces a design for cluster-level prefill-decode (PD) disaggregation in Llumnix. Building on Llumnix's dynamic rescheduling of requests, the design lets Llumnix manage prefill and decode instances and schedule requests across them. Specifically, it defines broader scheduling semantics so that the rules for PD disaggregation can be expressed as customized policies within Llumnix.
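As a rough illustration of expressing PD disaggregation as a scheduling policy, the sketch below pairs source and destination instances according to a migration type. It is not the PR's implementation: `InstanceSketch`, the `role` and `load` fields, and the type strings "prefill_to_decode" and "decode_to_decode" are all hypothetical. Only the idea of a `pair_migration` function that takes a `pair_migration_type` and returns (source, destination) pairs comes from the diff.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InstanceSketch:
    """Hypothetical view of an instance as seen by the scheduler."""
    instance_id: str
    role: str    # assumed values: "prefill" or "decode"
    load: float  # assumed: higher means busier


def pair_migration(instances: List[InstanceSketch],
                   pair_migration_type: str) -> List[Tuple[str, str]]:
    """Return (source_id, destination_id) pairs for the given migration type."""
    if pair_migration_type == "prefill_to_decode":
        # Busiest prefill instances hand requests to the least-loaded decode instances.
        srcs = sorted((i for i in instances if i.role == "prefill"),
                      key=lambda i: i.load, reverse=True)
        dsts = sorted((i for i in instances if i.role == "decode"),
                      key=lambda i: i.load)
    elif pair_migration_type == "decode_to_decode":
        # Load balancing among decode instances: busiest half to idlest half.
        decode = sorted((i for i in instances if i.role == "decode"),
                        key=lambda i: i.load)
        half = len(decode) // 2
        dsts = decode[:half]
        srcs = decode[::-1][:half]
    else:
        raise ValueError(f"unknown pair_migration_type: {pair_migration_type}")
    return [(s.instance_id, d.instance_id) for s, d in zip(srcs, dsts)]


cluster = [
    InstanceSketch("p0", "prefill", 0.9),
    InstanceSketch("p1", "prefill", 0.4),
    InstanceSketch("d0", "decode", 0.7),
    InstanceSketch("d1", "decode", 0.2),
]
pd_pairs = pair_migration(cluster, "prefill_to_decode")
dd_pairs = pair_migration(cluster, "decode_to_decode")
```

The point of the sketch is that the disaggregation rule lives entirely in how pairs are chosen, so the migration mechanism itself does not need to know about prefill or decode roles.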