Inference server down when trying to load a model for a Code Generation Application #1557

Closed
odockal opened this issue Aug 16, 2024 · 3 comments · Fixed by #1558
odockal commented Aug 16, 2024

Bug description

Inference server went down while trying to load a model.

[Screenshot: Screenshot_20240816_112429]

Even while the inference server is not running, everything in AI Lab's running-apps view looks fine. The app status is green, but it should be at least red/orange to show that the app actually won't work. Opening the app at its url:port does not load anything.
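
For reference, the outage can be confirmed by probing the service's OpenAI-compatible endpoint directly. A minimal sketch in Python (the port is hypothetical; substitute the one AI Lab assigned to the service):

import urllib.request, urllib.error

PORT = 35000  # hypothetical; use the port AI Lab shows for the service

try:
    with urllib.request.urlopen(f"http://localhost:{PORT}/v1/models", timeout=5) as resp:
        print(resp.status, resp.read()[:200])
except urllib.error.URLError as exc:
    # A connection error here confirms the inference server is down,
    # regardless of the green status shown in the UI.
    print("inference server unreachable:", exc)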

Operating system

macOS 14 (M2)

Installation Method

from ghcr.io/containers/podman-desktop-extension-ai-lab container image

Version

next (development version)

Steps to reproduce

  1. Run a libkrun machine (6 CPUs / 10 GB) with GPU support enabled on a Mac M2.
  2. Have one service running using granite-7b-lab-Q4.
  3. Create an app using the recipe -> Code Generator.
  4. Assert: both can run at once.
    Actual result: the Code Generator app's inference server went down (its start-up is sketched below).
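
For context, the recipe's inference container starts llama-cpp-python's server. The sketch below is a reconstruction inferred from the entry point visible in the traceback in the comment further down; the host and port values are assumptions, not the exact AI Lab invocation:

import subprocess, sys

# Roughly equivalent to `python -m llama_cpp.server --model ...`,
# the llama_cpp.server entry point seen in the traceback below.
subprocess.run([
    sys.executable, "-m", "llama_cpp.server",
    "--model", "/models/granite-8b-code-instruct.Q4_K_M.gguf",
    "--host", "0.0.0.0",  # assumed
    "--port", "8000",     # hypothetical port
], check=True)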

Relevant log output

No response

Additional context

No response

odockal commented Aug 16, 2024

I thought the problem was that I was running two services in the machine. But even after stopping the custom service, I cannot get the Code Generator app's inference server up. I think this is the problem:

...
llm_load_tensors: ggml ctx size =    0.44 MiB
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 578, got 470
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/__main__.py", line 88, in <module>
    main()
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/__main__.py", line 74, in main
    app = create_app(
          ^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 138, in create_app
    set_llama_proxy(model_settings=model_settings)
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 75, in set_llama_proxy
    _llama_proxy = LlamaProxy(models=model_settings)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/model.py", line 31, in __init__
    self._current_model = self.load_llama_from_model_settings(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/model.py", line 138, in load_llama_from_model_settings
    _model = create_fn(
             ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama.py", line 314, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: /models/granite-8b-code-instruct.Q4_K_M.gguf
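
The "done_getting_tensors: wrong number of tensors" failure means the tensors found in the GGUF file disagree with what the loader expects, which usually points to a truncated/partial download or a model file that no longer matches its catalog entry. A minimal sanity check, assuming the GGUF v2+ header layout (4-byte magic, uint32 version, uint64 tensor count), reads the count the file itself declares:

import struct

def gguf_tensor_count(path: str) -> int:
    # GGUF v2+ header: b"GGUF", uint32 version, uint64 tensor_count, ...
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        if version < 2:
            raise ValueError(f"unsupported GGUF version {version}")
        (tensor_count,) = struct.unpack("<Q", f.read(8))
        return tensor_count

print(gguf_tensor_count("/models/granite-8b-code-instruct.Q4_K_M.gguf"))

If the declared count is off, re-downloading the model is the usual remedy.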

jeffmaury added a commit to jeffmaury/ai-lab that referenced this issue Aug 16, 2024
Fixes containers#1557

Signed-off-by: Jeff MAURY <jmaury@redhat.com>
jeffmaury self-assigned this Aug 16, 2024
jeffmaury added a commit that referenced this issue Aug 16, 2024
Fixes #1557

Signed-off-by: Jeff MAURY <jmaury@redhat.com>
odockal commented Aug 16, 2024

Verified with version 1.2.3 on macOS 14 (M2) with a libkrun machine with GPU enabled.

odockal commented Aug 16, 2024

With just CPU, it also worked.
