
Error in newest dev version: Worker.__init__() got an unexpected keyword argument 'cache_config' #2640

Closed · yippp opened this issue Jan 29, 2024 · 5 comments
Labels: bug (Something isn't working)

Comments

@yippp commented Jan 29, 2024

I built the latest master branch, which includes the #2279 commit, and ran the following command:
python -m vllm.entrypoints.openai.api_server --model ./Mistral-7B-Instruct-v0.2-AWQ --quantization awq --dtype auto --host 0.0.0.0 --port 8081 --tensor-parallel-size 2
I get the following error:


INFO 01-29 09:41:47 api_server.py:209] args: Namespace(host='0.0.0.0', port=8081, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, root_path=None, middleware=[], model='./Mistral-7B-Instruct-v0.2-AWQ', tokenizer=None, revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', max_model_len=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization='awq', enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 01-29 09:41:47 config.py:177] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
2024-01-29 09:41:49,090 INFO worker.py:1724 -- Started a local Ray instance.
INFO 01-29 09:41:50 llm_engine.py:72] Initializing an LLM engine with config: model='./Mistral-7B-Instruct-v0.2-AWQ', tokenizer='./Mistral-7B-Instruct-v0.2-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=awq, enforce_eager=False, kv_cache_dtype=auto, seed=0)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/my/vllm/vllm/entrypoints/openai/api_server.py", line 217, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/async_llm_engine.py", line 615, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/async_llm_engine.py", line 319, in __init__
    self.engine = self._init_engine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/async_llm_engine.py", line 364, in _init_engine
    return engine_class(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/llm_engine.py", line 109, in __init__
    self._init_workers_ray(placement_group)
  File "/home/my/vllm/vllm/engine/llm_engine.py", line 260, in _init_workers_ray
    self.driver_worker = Worker(
                         ^^^^^^^
TypeError: Worker.__init__() got an unexpected keyword argument 'cache_config'
2024-01-29 09:41:54,676 ERROR worker.py:405 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RayWorkerVllm.init_worker() (pid=3160378, ip=10.20.4.57, actor_id=ca7bf2aa56e3f1a0c1a7678201000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f043133e7d0>)
  File "/home/my/vllm/vllm/engine/ray_utils.py", line 23, in init_worker
    self.worker = worker_init_fn()
                  ^^^^^^^^^^^^^^^^
  File "/home/my/vllm/vllm/engine/llm_engine.py", line 247, in <lambda>
    lambda rank=rank, local_rank=local_rank: Worker(
                                             ^^^^^^^
TypeError: Worker.__init__() got an unexpected keyword argument 'cache_config'

I am running Python 3.11, CUDA 12.1, driver 530, on 2x RTX 3090 with NVLink.
I noticed there is a discussion (#2279 (comment)) about cache_config; I am not sure whether it is related.
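
From what I can tell, the TypeError itself is a plain signature mismatch: the engine code updated in #2279 passes a cache_config keyword that the Worker constructor in this build does not declare. A minimal sketch of the same failure (hypothetical classes, not vLLM's actual code):

# Hypothetical minimal reproduction of the signature mismatch.
class Worker:
    # The constructor predates the change and does not accept cache_config.
    def __init__(self, model_config, parallel_config):
        self.model_config = model_config
        self.parallel_config = parallel_config

# The caller was updated (as #2279 does for vLLM's engine) to pass the new argument:
Worker(model_config=..., parallel_config=..., cache_config=...)
# TypeError: Worker.__init__() got an unexpected keyword argument 'cache_config'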

@yippp (Author) commented Jan 29, 2024

I tried rolling back to the previous commit; this error disappears, but I hit a different one:
Failed: Cuda error /home/ysq/vllm/csrc/custom_all_reduce.cuh:417 'resource already mapped'
So the unexpected keyword argument 'cache_config' does appear to be caused by #2279.
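
For reference, the rollback was roughly the following (the hash is a placeholder for whatever commit precedes #2279 on master; pip install -e . assumes a from-source editable build):

cd ~/vllm
git checkout <commit-before-2279>
pip install -e .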

@WoosukKwon added the bug label on Jan 29, 2024
@hanzhi713 (Contributor)
I can also reproduce this bug

@zhaoyang-star (Contributor)
Sorry for causing this error. Please apply #2644; we will merge it soon. @yippp @hanzhi713

@zhaoyang-star (Contributor)
@yippp #2644 has been merged. Please try the latest code and close this issue if the error is gone.
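
In case it helps, picking up the fix from a from-source checkout should be roughly the following (assuming the same kind of editable install as in the original report):

cd ~/vllm
git pull
pip install -e .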

@WoosukKwon (Collaborator)
Closed by #2644
