Update OCR module #157

Merged
merged 4 commits into from
Mar 9, 2024

Conversation

@GNEHUY GNEHUY commented Mar 7, 2024

Thanks for sending a pull request!
Please make sure you click the link above to view the contribution guidelines,
then fill out the blanks below.

Description

Update the OCR module: ocr.py, test_ocr.py, and segment.py.

What does this implement/fix? Explain your changes.

1. Update ocr.py
2. Update test_ocr.py
3. Modify segment.py
4. Add item_ocr_formula.png
5. Modify SIF/sif4sci and Tokenizer/CustomTokenizer, PureTextTokenizer, and AstFormulaTokenizer to add convert_image_to_latex
6. Modify AUTHORS.md

Pull request type

  • [DATASET] Add a new dataset
  • [BUGFIX] Bugfix
  • [FEATURE] New feature (non-breaking change which adds functionality)
  • [BREAKING] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [STYLE] Code style update (formatting, renaming)
  • [REFACTOR] Refactoring (no functional changes, no api changes)
  • [BUILD] Build related changes
  • [DOC] Documentation content changes
  • [OTHER] Other (please describe):

Changes

1. Update ocr.py
2. Update test_ocr.py
3. Modify segment.py
4. Add item_ocr_formula.png
5. Modify SIF/sif4sci and Tokenizer/CustomTokenizer, PureTextTokenizer, and AstFormulaTokenizer to add convert_image_to_latex
6. Modify AUTHORS.md

Does this close any currently open issues?

no

Any relevant logs, error output, etc?

no

Checklist

Before you submit a pull request, please make sure you have the following:

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [FEATURE], [BREAKING], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage and all tests are passing
  • Code is well-documented (extended the README / documentation, if necessary)
  • If this PR is your first one, add your name and github account to AUTHORS.md

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@codecov-commenter

codecov-commenter commented Mar 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.33%. Comparing base (504d147) to head (d235439).

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #157      +/-   ##
==========================================
+ Coverage   97.31%   97.33%   +0.02%     
==========================================
  Files          84       85       +1     
  Lines        4651     4694      +43     
==========================================
+ Hits         4526     4569      +43     
  Misses        125      125              


@nnnyt nnnyt requested a review from KenelmQLH March 8, 2024 13:11
Collaborator

@KenelmQLH KenelmQLH left a comment

I notice that SIF/sif4sci, Tokenizer/CustomTokenizer, PureTextTokenizer, AstFormulaTokenizer also use seg or sif4sci. Please add convert_image_to_latex to them.
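
To make the request above concrete, here is a minimal sketch of threading a convert_image_to_latex flag from a tokenizer down to a segmentation step. It is not the code merged in this PR: sketch_seg, sketch_sif4sci, SketchTokenizer, stub_ocr, and the [[img:...]] marker are hypothetical stand-ins invented for illustration, and the real signatures in the repository may differ.

```python
import re
from typing import Callable, List

# Hypothetical image marker; the repository's real item format may differ.
IMAGE_MARKER = re.compile(r"\[\[img:([^\]]+)\]\]")


def stub_ocr(image_name: str) -> str:
    """Stand-in for a formula OCR backend: returns fixed LaTeX per image name."""
    return {"item_ocr_formula.png": r"x^2 + 1"}.get(image_name, "")


def sketch_seg(
    item: str,
    convert_image_to_latex: bool = False,
    ocr: Callable[[str], str] = stub_ocr,
) -> List[str]:
    """Split an item into text and image segments.

    When convert_image_to_latex is True, each image marker is replaced by
    the LaTeX the OCR callable returns; if OCR yields nothing, the marker
    is kept unchanged.
    """
    segments: List[str] = []
    pos = 0
    for match in IMAGE_MARKER.finditer(item):
        if match.start() > pos:
            segments.append(item[pos:match.start()])
        latex = ocr(match.group(1)) if convert_image_to_latex else ""
        segments.append(f"${latex}$" if latex else match.group(0))
        pos = match.end()
    if pos < len(item):
        segments.append(item[pos:])
    return segments


def sketch_sif4sci(item: str, convert_image_to_latex: bool = False) -> List[str]:
    """Higher-level entry point that simply forwards the flag."""
    return sketch_seg(item, convert_image_to_latex=convert_image_to_latex)


class SketchTokenizer:
    """Tokenizer-like wrapper that stores the flag and passes it through."""

    def __init__(self, convert_image_to_latex: bool = False):
        self.convert_image_to_latex = convert_image_to_latex

    def __call__(self, items: List[str]) -> List[List[str]]:
        return [
            sketch_sif4sci(i, convert_image_to_latex=self.convert_image_to_latex)
            for i in items
        ]
```

Defaulting the flag to False keeps existing callers unaffected, which fits a non-breaking feature: only callers that opt in trigger the OCR step.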

@KenelmQLH KenelmQLH merged commit 855e250 into bigdata-ustc:dev Mar 9, 2024
4 checks passed
4 participants