More General OCR #50

Open
KannamSridharKumar opened this issue Sep 17, 2024 · 7 comments

Comments

@KannamSridharKumar

In the example image, under More General OCR at the bottom,
sheet music, a chemical compound, and some geometric shapes are shown.

What's the Python command to extract such content?
I've tried all the example code provided, but it only extracts plain text.

Thanks,

@Ucas-HaoranWei
Owner

--type format should work.

@KannamSridharKumar
Author

KannamSridharKumar commented Sep 18, 2024

I tried that, but it didn't work: it didn't extract the chemical formula.
Can you please share the full command? Thanks.

@Ucas-HaoranWei
Owner

The full command is the same as for 'format' OCR.
Can you show me your input image?

@KannamSridharKumar
Author

KannamSridharKumar commented Sep 18, 2024

[Image: chem]

I couldn't install it on Colab from source, so I'm using it via the HuggingFace pipeline:

res = model.chat(tokenizer, image_file, ocr_type='format')

https://huggingface.co/stepfun-ai/GOT-OCR2_0

I've also tried the HF demo - https://huggingface.co/spaces/ucaslcl/GOT_online
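
For readers following along, here is a minimal sketch of the HuggingFace route described above, written so that plain and formatted output can be compared side by side on the same image. The helper names load_got and compare_ocr_modes are hypothetical and not part of the GOT code; the model ID comes from the URL in this comment, ocr_type='format' comes from the call shown above, and ocr_type='ocr' is assumed from the model card's usage example. Device placement (the model card moves the model to a CUDA GPU) is left to the caller.

```python
def load_got(model_id="stepfun-ai/GOT-OCR2_0"):
    """Load the model and tokenizer (hypothetical helper, not part of GOT)."""
    # Imported lazily so compare_ocr_modes can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    # trust_remote_code is needed because the chat() method ships with the model repo.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    return model.eval(), tokenizer


def compare_ocr_modes(model, tokenizer, image_file):
    """Run plain and formatted OCR on one image and report whether they differ."""
    plain = model.chat(tokenizer, image_file, ocr_type="ocr")
    formatted = model.chat(tokenizer, image_file, ocr_type="format")
    return {
        "plain": plain,
        "formatted": formatted,
        "differs": plain.strip() != formatted.strip(),
    }


# Usage (downloads the model; "chem.png" is a placeholder path):
# model, tokenizer = load_got()
# print(compare_ocr_modes(model, tokenizer, "chem.png"))
```

If "differs" comes back False, the formatted mode is falling back to plain text for that image, which matches the symptom reported in this thread.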

@Ucas-HaoranWei
Owner

I see. It is because this image differs too much from the images we rendered.
There are two solutions:

  1. Use a chemical compound image like this one:
    [Image: 2024-09-18 13 52 55]
  2. Fine-tune the model with your data.

Thank you~

@KannamSridharKumar
Author

Thank you very much, I understand. Can you please share the 3 example images used in the More General OCR section? I tried images of musical notes and geometry, but they didn't work, probably because my images are too different from what the model was trained on.

@Ucas-HaoranWei
Owner

Hi, the benchmark.zip includes samples.
