Inference server down when trying to load a model for a Code Generation Application #1557

Closed
odockal opened this issue Aug 16, 2024 · 3 comments · Fixed by #1558
odockal commented Aug 16, 2024

Bug description

Inference server went down while trying to load a model.

[Screenshot: Screenshot_20240816_112429]

Even while the inference server is not running, everything in AI Lab's running-apps view looks fine. The app status is green, but it should be at least red/orange to show that the app actually won't work. Opening the app at its url:port does not load anything.
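
For reference, the outage can be confirmed by probing the service's OpenAI-compatible endpoint directly. A minimal sketch in Python (the port is hypothetical; substitute the one AI Lab assigned to the service):

import urllib.request, urllib.error

PORT = 35000  # hypothetical; use the port AI Lab shows for the service

try:
    with urllib.request.urlopen(f"http://localhost:{PORT}/v1/models", timeout=5) as resp:
        print(resp.status, resp.read()[:200])
except urllib.error.URLError as exc:
    # A connection error here confirms the inference server is down,
    # regardless of the green status shown in the UI.
    print("inference server unreachable:", exc)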

Operating system

macOS 14 (M2)

Installation Method

from ghcr.io/containers/podman-desktop-extension-ai-lab container image

Version

next (development version)

Steps to reproduce

  1. Run a libkrun machine (6 CPUs / 10 GB) with GPU support enabled on a Mac M2.
  2. Have one service running using granite-7b-lab-Q4.
  3. Create an app using the recipe -> Code Generator.
  4. Assert: both can run at once.
    Actual result: the Code Generator app's inference server went down (its start-up is sketched below).
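
For context, the recipe's inference container starts llama-cpp-python's server. The sketch below is a reconstruction inferred from the entry point visible in the traceback in the comment further down; the host and port values are assumptions, not the exact AI Lab invocation:

import subprocess, sys

# Roughly equivalent to `python -m llama_cpp.server --model ...`,
# the llama_cpp.server entry point seen in the traceback below.
subprocess.run([
    sys.executable, "-m", "llama_cpp.server",
    "--model", "/models/granite-8b-code-instruct.Q4_K_M.gguf",
    "--host", "0.0.0.0",  # assumed
    "--port", "8000",     # hypothetical port
], check=True)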

Relevant log output

No response

Additional context

No response

odockal commented Aug 16, 2024

I thought the problem was that I was running two services in the machine. But even after stopping the custom service, I cannot get the Code Generator app's inference server up. I think this is the problem:

...
llm_load_tensors: ggml ctx size =    0.44 MiB
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 578, got 470
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/__main__.py", line 88, in <module>
    main()
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/__main__.py", line 74, in main
    app = create_app(
          ^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 138, in create_app
    set_llama_proxy(model_settings=model_settings)
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 75, in set_llama_proxy
    _llama_proxy = LlamaProxy(models=model_settings)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/model.py", line 31, in __init__
    self._current_model = self.load_llama_from_model_settings(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/model.py", line 138, in load_llama_from_model_settings
    _model = create_fn(
             ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama.py", line 314, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: /models/granite-8b-code-instruct.Q4_K_M.gguf
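
The "done_getting_tensors: wrong number of tensors" failure means the tensors found in the GGUF file disagree with what the loader expects, which usually points to a truncated/partial download or a model file that no longer matches its catalog entry. A minimal sanity check, assuming the GGUF v2+ header layout (4-byte magic, uint32 version, uint64 tensor count), reads the count the file itself declares:

import struct

def gguf_tensor_count(path: str) -> int:
    # GGUF v2+ header: b"GGUF", uint32 version, uint64 tensor_count, ...
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        if version < 2:
            raise ValueError(f"unsupported GGUF version {version}")
        (tensor_count,) = struct.unpack("<Q", f.read(8))
        return tensor_count

print(gguf_tensor_count("/models/granite-8b-code-instruct.Q4_K_M.gguf"))

If the declared count is off, re-downloading the model is the usual remedy.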

jeffmaury added a commit to jeffmaury/ai-lab that referenced this issue Aug 16, 2024
Fixes containers#1557

Signed-off-by: Jeff MAURY <jmaury@redhat.com>
jeffmaury self-assigned this Aug 16, 2024
jeffmaury added a commit that referenced this issue Aug 16, 2024
Fixes #1557

Signed-off-by: Jeff MAURY <jmaury@redhat.com>
odockal commented Aug 16, 2024

Verified with version 1.2.3 on macOS 14 (M2) with a libkrun machine with GPU enabled.

odockal commented Aug 16, 2024

With just CPU, it also worked.
