[Core] Support for Scheduling-defined Prefill-Decode Disaggregation feature #15

Xinyi-ECNU · 2024-08-23T08:54:44Z

This PR introduces a cluster-level prefill-decode (PD) disaggregation design to Llumnix. Building on Llumnix's dynamic rescheduling of requests, the design lets Llumnix manage prefill and decode instances and schedule requests across them. Specifically, this PR defines broader scheduling semantics so that the rules for PD disaggregation can be expressed as customized policies within Llumnix.
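
To illustrate what "PD disaggregation expressed as a scheduling policy" could look like, here is a minimal sketch. It is not the code in this PR: `InstanceView`, its `role` and `load` fields, and `pair_prefill_to_decode` are hypothetical names chosen for illustration. The only thing taken from the PR is the idea that a migration policy returns (source, destination) instance pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InstanceView:
    """Hypothetical, simplified view of an instance as seen by a scheduler."""
    instance_id: str
    role: str    # assumed labels: "prefill" or "decode"
    load: float  # any load metric; lower means less loaded


def pair_prefill_to_decode(instances: List[InstanceView]) -> List[Tuple[str, str]]:
    """Pair prefill instances (busiest first) with decode instances (idlest first).

    Returns (source_id, destination_id) pairs, i.e. where requests that
    finished prefill would be migrated for decoding.
    """
    prefill = sorted((i for i in instances if i.role == "prefill"),
                     key=lambda i: i.load, reverse=True)
    decode = sorted((i for i in instances if i.role == "decode"),
                    key=lambda i: i.load)
    # zip stops at the shorter list, so surplus instances stay unpaired.
    return [(src.instance_id, dst.instance_id) for src, dst in zip(prefill, decode)]


instances = [
    InstanceView("p0", "prefill", 0.9),
    InstanceView("p1", "prefill", 0.2),
    InstanceView("d0", "decode", 0.5),
    InstanceView("d1", "decode", 0.1),
]
pairs = pair_prefill_to_decode(instances)
```

Under these assumptions the busiest prefill instance is paired with the least-loaded decode instance; a real policy would plug in whatever load metric the scheduler already tracks.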

llumnix/arg_utils.py

llumnix/backends/backend_interface.py

llumnix/backends/vllm/llm_engine.py

llumnix/backends/vllm/scheduler.py

llumnix/backends/vllm/llm_engine.py

llumnix/backends/vllm/scheduler.py

llumnix/global_scheduler/global_scheduler.py

llumnix/global_scheduler/migration_scheduler.py

llumnix/llumlet/local_migration_scheduler.py

zhypku

I'd like to call this feature scheduling-defined PDD :)

llumnix/arg_utils.py

llumnix/backends/backend_interface.py

llumnix/backends/vllm/llm_engine.py

llumnix/global_scheduler/global_scheduler.py

llumnix/backends/vllm/scheduler.py

llumnix/config.py

llumnix/global_scheduler/migration_scheduler.py

llumnix/llm_engine_manager.py

llumnix/global_scheduler/migration_scheduler.py

CLAassistant · 2024-08-28T03:20:16Z

All committers have signed the CLA.

llumnix/backends/backend_interface.py

llumnix/config.py

llumnix/global_scheduler/global_scheduler.py

llumnix/global_scheduler/migration_scheduler.py

llumnix/llm_engine_manager.py

llumnix/llumlet/llumlet.py

llumnix/llumlet/migration_coordinator.py

llumnix/backends/vllm/scheduler.py

llumnix/llumlet/request.py

llumnix/llumlet/llumlet.py

llumnix/llumlet/local_migration_scheduler.py

llumnix/global_scheduler/migration_scheduler.py

llumnix/backends/backend_interface.py

llumnix/llumlet/request.py

KuilongCui · 2024-09-11T06:52:17Z

llumnix/llumlet/request.py

+    # the expected steps, blocking_migration is True.
+    @property
+    def blocking_migration(self) -> bool:
+        return self.output_len >= self.expected_steps


Do you use this to tag the requests that have finished the expected steps?
"blocking" is ambiguous; consider changing the name.

Yes. In our paper, blocking migration means one-stage migration.
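
For readers following the thread, a minimal sketch of how such a property behaves is shown below. `SketchRequest` is a hypothetical stand-in, not the class in llumnix/llumlet/request.py; the choice of `expected_steps=1` for a prefill-only request and `float("inf")` for an unrestricted one is an assumption made for illustration. Only the comparison itself is taken from the diff above.

```python
from dataclasses import dataclass


@dataclass
class SketchRequest:
    """Hypothetical stand-in for a request tracked by a llumlet."""
    request_id: str
    expected_steps: float  # steps the request should run on this instance
    output_len: int = 0    # tokens generated so far

    @property
    def blocking_migration(self) -> bool:
        # Same comparison as in the diff: once the request has produced
        # at least `expected_steps` tokens, it is flagged for one-stage
        # (blocking) migration.
        return self.output_len >= self.expected_steps


# Assumed usage: a request that should only run prefill on this instance.
prefill_only = SketchRequest("r0", expected_steps=1)
before = prefill_only.blocking_migration
prefill_only.output_len = 1  # first token produced by the prefill step
after = prefill_only.blocking_migration

# Assumed usage: a request with no step limit is never flagged.
unrestricted = SketchRequest("r1", expected_steps=float("inf"), output_len=1000)
```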

KuilongCui · 2024-09-11T07:10:15Z

llumnix/llumlet/llumlet.py

@@ -98,29 +98,36 @@ def from_args(cls,
        llumlet = engine_class.remote(instance_id, backend_type, migration_config, *args, **kwargs)
        return llumlet

-    def migrate_out(self, dst_instance_name: str) -> List[str]:
+    def migrate_out(self, dst_instance_name: str, num_requests: int) -> List[str]:


In this function, num_requests is used like a boolean (num_requests == 1).

And for PDD, I think you need a function named migrate_out_singlestage.

Consider refactoring this function.

Removed the logic that treated num_requests as a boolean. We have reused migrate_out_multistage to send blocks in one stage for PDD, so no additional function is needed. Please check.
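
The idea in the reply above, one code path that covers both multi-stage and one-stage transfers, can be sketched as follows. This is a deliberately simplified illustration and not Llumnix's migration algorithm: `plan_stages`, `stage_size`, and `single_stage` are hypothetical names, and the real multi-stage migration also has to account for blocks produced while earlier stages are in flight.

```python
from typing import List


def plan_stages(num_blocks: int, stage_size: int, single_stage: bool) -> List[range]:
    """Split block indices [0, num_blocks) into the ranges sent per stage.

    With single_stage=True the whole range is sent at once, so the
    one-stage case is just the multi-stage loop with a single iteration.
    """
    if num_blocks <= 0:
        return []
    if single_stage:
        return [range(0, num_blocks)]
    if stage_size <= 0:
        raise ValueError("stage_size must be positive")
    return [range(start, min(start + stage_size, num_blocks))
            for start in range(0, num_blocks, stage_size)]


multi = plan_stages(10, 4, single_stage=False)
single = plan_stages(10, 4, single_stage=True)
```

Treating one-stage migration as a special case of the staged loop avoids a separate `migrate_out_singlestage` entry point, which matches the direction the author describes.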

llumnix/global_scheduler/migration_scheduler.py

github-actions · 2024-09-24T13:31:39Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	224.00 MB	232.00 MB	280.00 MB	288.00 MB	312.00 MB	344.00 MB	368.00 MB	416.00 MB	424.00 MB	432.00 MB	440.00 MB	496.00 MB	720.00 MB	912.00 MB
rpc_speed(GB/s)	1.05	1.53	1.79	1.95	2.04	2.18	2.18	2.22	2.24	2.34	2.34	2.41	2.43	2.40	2.49	2.51	2.62	2.34	2.34	2.59	2.52	2.50	2.52	2.46	2.64	2.78	2.18	2.49	2.80	2.59	2.76	2.92	3.18	3.05	3.13	3.17	2.93	3.45	3.32	3.06

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	192.00 MB	200.00 MB	240.00 MB	280.00 MB	312.00 MB	384.00 MB	416.00 MB	464.00 MB	480.00 MB	488.00 MB	536.00 MB	544.00 MB	696.00 MB
gloo_speed(GB/s)	0.92	1.60	1.98	2.28	2.44	2.78	2.72	2.99	2.64	2.91	2.90	2.57	2.62	2.68	3.41	2.62	2.36	2.25	2.19	2.13	2.18	2.50	2.92	2.90	3.40	2.56	2.38	2.97	2.51	2.68	2.84	2.63	2.73	2.64	2.81

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	208.00 MB	232.00 MB	240.00 MB	256.00 MB	280.00 MB	312.00 MB	320.00 MB	416.00 MB	424.00 MB	448.00 MB	464.00 MB	488.00 MB	528.00 MB	536.00 MB	752.00 MB
nccl_speed(GB/s)	0.19	0.45	0.68	0.85	1.14	1.33	1.50	1.52	2.00	1.76	1.76	2.24	2.38	2.63	2.26	2.63	2.86	2.58	3.38	3.44	4.23	3.41	3.62	3.31	3.22	4.43	5.22	4.46	2.65	2.94	4.44	5.66	5.35	5.49	3.73	3.54	5.44	3.99	3.96

github-actions · 2024-09-24T13:49:27Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	32545.25	99194.50	192927.00	260004.45	292360.40	114263.90

decode	p25	p50	p75	p95	p99	mean
latency(ms)	54.14	59.66	73.90	135.57	231.50	72.70

github-actions · 2024-09-25T04:05:25Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	216.00 MB	232.00 MB	240.00 MB	248.00 MB	272.00 MB	312.00 MB	352.00 MB	376.00 MB	424.00 MB	464.00 MB	480.00 MB	536.00 MB	544.00 MB	560.00 MB
rpc_speed(GB/s)	1.03	1.52	1.75	1.90	1.99	2.11	2.03	2.13	2.16	2.21	2.28	2.35	2.32	2.36	2.44	2.40	2.42	2.37	2.42	2.39	2.46	2.39	2.64	2.45	2.32	2.61	2.58	2.27	2.63	2.48	2.76	2.72	2.91	2.78	2.93	2.96	3.01	3.08	2.99

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	224.00 MB	264.00 MB	312.00 MB	320.00 MB	416.00 MB	424.00 MB	480.00 MB	536.00 MB
gloo_speed(GB/s)	0.94	1.56	2.08	2.27	2.47	2.53	2.69	2.91	3.05	2.95	2.82	3.05	3.31	3.30	2.61	2.37	2.73	3.03	2.74	2.42	2.36	2.61	3.02	3.19	3.02	2.83	2.57	2.53	2.40	2.85	2.29	2.60	2.61

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	208.00 MB	232.00 MB	280.00 MB	312.00 MB	336.00 MB	416.00 MB	424.00 MB	432.00 MB	472.00 MB	480.00 MB	536.00 MB	560.00 MB	568.00 MB
nccl_speed(GB/s)	0.19	0.45	0.68	0.87	1.04	1.20	1.44	1.67	1.92	1.93	1.96	2.17	2.38	2.23	2.33	2.68	2.90	3.11	2.72	3.08	2.83	4.17	2.83	3.27	5.00	4.47	2.73	3.18	3.45	5.82	6.39	3.52	5.66	7.48	6.22	5.66	4.20

github-actions · 2024-09-25T04:25:56Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	28162.75	101213.50	206765.75	237777.95	250164.08	111382.05

decode	p25	p50	p75	p95	p99	mean
latency(ms)	53.26	59.09	72.60	130.21	401.89	75.28

KuilongCui · 2024-09-24T05:13:25Z

llumnix/arg_utils.py

@@ -61,6 +62,8 @@ class EngineManagerArgs:
    last_stage_max_blocks: int = None
    max_stages: int = None

+    enable_pd_disagg: bool = False
+


In this file, set the default value to None and define the actual default in config/default.py.

We want default values to come from only one place.
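
A minimal sketch of the pattern the reviewer is asking for is shown below. `DEFAULTS` is a hypothetical stand-in for whatever config/default.py provides, `SketchArgs` and `resolve` are illustrative names, and the value `3` for `max_stages` is made up for the example; only the field names `enable_pd_disagg` and `max_stages` come from the diff.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical stand-in for the single source of defaults (config/default.py).
DEFAULTS = {"enable_pd_disagg": False, "max_stages": 3}


@dataclass
class SketchArgs:
    # Every field defaults to None, meaning "not set by the user".
    enable_pd_disagg: Optional[bool] = None
    max_stages: Optional[int] = None

    def resolve(self) -> "SketchArgs":
        """Fill every unset field from DEFAULTS and return self."""
        for f in fields(self):
            if getattr(self, f.name) is None:
                setattr(self, f.name, DEFAULTS[f.name])
        return self


from_defaults = SketchArgs().resolve()
overridden = SketchArgs(enable_pd_disagg=True).resolve()
```

Because the dataclass never hard-codes a value, changing a default only requires touching the one defaults table.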

KuilongCui · 2024-09-25T14:03:28Z

llumnix/global_scheduler/migration_scheduler.py

@@ -47,9 +56,15 @@ def __init__(self,
        self.instance_info: Dict[str, InstanceInfo] = None
        self.sorted_instance_infos: List[InstanceInfo] = None

-    def pair_migration(self) -> List[Tuple[str, str]]:
+    def pair_migration(self, pair_migration_type:str) -> List[Tuple[str, str]]:


pair_migration_type:str -> pair_migration_type: str
Add a space after the colon, here and in the other places.

zhypku reviewed Aug 23, 2024

Xinyi-ECNU requested a review from zhypku August 26, 2024 06:02

zhypku reviewed Aug 26, 2024

zhypku requested review from s5u13b, KuilongCui and ZeldaHuang August 26, 2024 07:05

zhypku reviewed Aug 26, 2024

KuilongCui reviewed Aug 26, 2024

KuilongCui reviewed Aug 27, 2024

Xinyi-ECNU changed the title ~~[Core] Support for Prefill-Decode Disaggregation feature~~ [Core] Support for Scheduling-defined Prefill-Decode Disaggregation feature Aug 27, 2024

Xinyi-ECNU force-pushed the pd_disagg branch from 598f08b to c28584b Compare August 28, 2024 03:28

Xinyi-ECNU force-pushed the pd_disagg branch from c28584b to 16a05d1 Compare September 5, 2024 03:26

s5u13b reviewed Sep 6, 2024

ZeldaHuang reviewed Sep 6, 2024

llumnix/llumlet/local_migration_scheduler.py (resolved)

Xinyi-ECNU force-pushed the pd_disagg branch from ab199a4 to f50fa8d Compare September 9, 2024 08:57

s5u13b reviewed Sep 9, 2024

llumnix/global_scheduler/migration_scheduler.py (resolved)

zhypku reviewed Sep 10, 2024

llumnix/global_scheduler/migration_scheduler.py (resolved)

Xinyi-ECNU force-pushed the pd_disagg branch from 937afce to c921ef9 Compare September 11, 2024 03:06

KuilongCui reviewed Sep 11, 2024

Xinyi-ECNU and others added 9 commits September 23, 2024 19:05

refactor

1170b82

fix

6082ffc

fix

d0b87d6

fix

47bf01e

fix

780affc

resolve conflict

81a8967

fix

ee41f38

fix

6970461

fix

20d10ea

Xinyi-ECNU added 2 commits September 23, 2024 19:05

fix

add0c7b

rebase

749a93f

Xinyi-ECNU force-pushed the pd_disagg branch from 3c6166e to 749a93f Compare September 24, 2024 01:59

fix ci

751e6c8

fix pylint

c52edbe

KuilongCui reviewed Sep 25, 2024
