from transformers import LlavaForConditionalGeneration, AutoProcessor
from PIL import Image
model_id = "hf-internal-testing/pixtral-12b"
model = LlavaForConditionalGeneration.from_pretrained(model_id).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)
IMG_URLS = [
"https://picsum.photos/id/237/400/300",
"https://picsum.photos/id/231/200/300",
"https://picsum.photos/id/27/500/500",
"https://picsum.photos/id/17/150/600",
]
PROMPT = "<s>[INST]Describe the images.\n[IMG][IMG][IMG][IMG][/INST]"
inputs = processor(images=IMG_URLS, text=PROMPT, return_tensors="pt").to("cuda")
generate_ids = model.generate(**inputs, max_new_tokens=500)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
EXPECTED_GENERATION = """
Describe the images.
Sure, let's break down each image description:
1. **Image 1:**
- **Description:** A black dog with a glossy coat is sitting on a wooden floor. The dog has a focused expression and is looking directly at the camera.
- **Details:** The wooden floor has a rustic appearance with visible wood grain patterns. The dog's eyes are a striking color, possibly brown or amber, which contrasts with its black fur.
2. **Image 2:**
- **Description:** A scenic view of a mountainous landscape with a winding road cutting through it. The road is surrounded by lush green vegetation and leads to a distant valley.
- **Details:** The mountains are rugged with steep slopes, and the sky is clear, indicating good weather. The winding road adds a sense of depth and perspective to the image.
3. **Image 3:**
- **Description:** A beach scene with waves crashing against the shore. There are several people in the water and on the beach, enjoying the waves and the sunset.
- **Details:** The waves are powerful, creating a dynamic and lively atmosphere. The sky is painted with hues of orange and pink from the setting sun, adding a warm glow to the scene.
4. **Image 4:**
- **Description:** A garden path leading to a large tree with a bench underneath it. The path is bordered by well-maintained grass and flowers.
- **Details:** The path is made of small stones or gravel, and the tree provides a shaded area with the bench invitingly placed beneath it. The surrounding area is lush and green, suggesting a well-kept garden.
Each image captures a different scene, from a close-up of a dog to expansive natural landscapes, showcasing various elements of nature and human interaction with it.
"""
Error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[4], line 5
2 from PIL import Image
4 model_id = "hf-internal-testing/pixtral-12b"
----> 5 model = LlavaForConditionalGeneration.from_pretrained(model_id,cache_dir='').to("cuda")
6 processor = AutoProcessor.from_pretrained(model_id)
8 IMG_URLS = [
9 "https://picsum.photos/id/237/400/300",
10 "https://picsum.photos/id/231/200/300",
11 "https://picsum.photos/id/27/500/500",
12 "https://picsum.photos/id/17/150/600",
13 ]
File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:3984, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3974 if dtype_orig is not None:
3975 torch.set_default_dtype(dtype_orig)
3977 (
3978 model,
3979 missing_keys,
3980 unexpected_keys,
3981 mismatched_keys,
3982 offload_index,
3983 error_msgs,
-> 3984 ) = cls._load_pretrained_model(
3985 model,
3986 state_dict,
3987 loaded_state_dict_keys, # XXX: rename?
3988 resolved_archive_file,
3989 pretrained_model_name_or_path,
3990 ignore_mismatched_sizes=ignore_mismatched_sizes,
3991 sharded_metadata=sharded_metadata,
3992 _fast_init=_fast_init,
3993 low_cpu_mem_usage=low_cpu_mem_usage,
3994 device_map=device_map,
3995 offload_folder=offload_folder,
3996 offload_state_dict=offload_state_dict,
3997 dtype=torch_dtype,
3998 hf_quantizer=hf_quantizer,
3999 keep_in_fp32_modules=keep_in_fp32_modules,
4000 gguf_path=gguf_path,
4001 )
4003 # make sure token embedding weights are still tied if needed
4004 model.tie_weights()
File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:4529, in PreTrainedModel._load_pretrained_model(***failed resolving arguments***)
4525 if "size mismatch" in error_msg:
4526 error_msg += (
4527 "\n\tYou may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method."
4528 )
-> 4529 raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
4531 if len(unexpected_keys) > 0:
4532 archs = [] if model.config.architectures is None else model.config.architectures
RuntimeError: Error(s) in loading state_dict for LlavaForConditionalGeneration:
size mismatch for language_model.model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
Expected behavior
I would expect the model to load normally. Something is off in the dimensions. Is there perhaps another model version on the Hugging Face Hub with the correct config? Many thanks.
P.S. I had to uninstall flash attn. I assume it's just not supported, which would be worth adding to the docs.
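One possible reading of the mismatch above, sketched as plain arithmetic: the two shapes agree on the input dimension (5120) and differ only in the output dimension, which for a query projection is the number of heads times the per-head dimension. The values `NUM_HEADS = 32` and `CHECKPOINT_HEAD_DIM = 128` below are assumptions chosen for illustration and are not stated anywhere in this report, and `q_proj_weight_shape` is a hypothetical helper, not a transformers function. Under those assumptions, a model that derives the head dimension from the hidden size would build a 5120 x 5120 weight, while a checkpoint saved with an explicit head dimension of 128 would hold a 4096 x 5120 weight.

```python
from typing import Optional, Tuple


def q_proj_weight_shape(
    hidden_size: int, num_heads: int, head_dim: Optional[int] = None
) -> Tuple[int, int]:
    """Return (out_features, in_features) for an attention query projection.

    Hypothetical helper for illustration only. If head_dim is not given,
    it is derived as hidden_size // num_heads.
    """
    if head_dim is None:
        head_dim = hidden_size // num_heads
    return (num_heads * head_dim, hidden_size)


HIDDEN_SIZE = 5120         # second dimension of both shapes in the error message
NUM_HEADS = 32             # assumption, not taken from the report
CHECKPOINT_HEAD_DIM = 128  # assumption, not taken from the report

# Head dimension derived from the hidden size: 5120 // 32 = 160 per head.
model_shape = q_proj_weight_shape(HIDDEN_SIZE, NUM_HEADS)

# Head dimension given explicitly: 32 * 128 = 4096 output features.
checkpoint_shape = q_proj_weight_shape(HIDDEN_SIZE, NUM_HEADS, CHECKPOINT_HEAD_DIM)

print("shape in current model:", model_shape)
print("shape in checkpoint:   ", checkpoint_shape)
```

If this reading is right, the checkpoint itself may be fine and the mismatch would come from how the installed version builds the attention layers from the config. That is a guess and would need to be confirmed against the model's config and the installed transformers version.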
System Info
transformers version: 4.45.0.dev0
Who can help?
@amyeroberts @ArthurZucker
Reproduction
I'm running the exact code shown on this page (the snippet and the resulting error are reproduced above).