Create corpora for benchmarking #130
This is a first, naive overview of my GT categorization: gt_overview.ods. If this isn't of much use, I'll take a deeper look at it.
Here is a second, reviewed version of the sheet: EDIT: Replaced all instances of
First draft for the corpora
General thoughts
Categories
16th century, fraktur, simple layout
16th century, fraktur, complex layout
16th century, antiqua, simple layout
16th century, antiqua, complex layout
16th century, font mix, simple layout
16th century, font mix, complex layout
17th century, fraktur, simple layout
17th century, fraktur, complex layout
17th century, antiqua, simple layout
17th century, antiqua, complex layout
17th century, font mix, simple layout
  fraktur, antiqua
  fraktur, antiqua, ancient Greek, Hebrew
17th century, font mix, complex layout
  fraktur, antiqua
18th century, fraktur, simple layout
18th century, fraktur, complex layout
18th century, antiqua, simple layout
18th century, antiqua, complex layout
18th century, font mix, simple layout
18th century, font mix, complex layout
19th century, antiqua [1]
19th century, fraktur [1]
[1] We only have two works with text GT for the 19th century, blumenbach_anatomie_1805.ocrd and arnimb_goethe03_1835.ocrd. Since the 19th century isn't part of our scope, we'll limit ourselves to the material we already have.
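The category list above is the full combination of century, font, and layout for the 16th to 18th centuries, plus the two 19th-century entries. A minimal sketch of how that list could be generated is shown here; the names `CENTURIES`, `FONTS`, `LAYOUTS`, and `build_categories` are hypothetical and are not part of any OCR-D or quiver tooling.

```python
from itertools import product

# Dimensions taken from the category list in this thread.
CENTURIES = ["16th", "17th", "18th"]
FONTS = ["fraktur", "antiqua", "font mix"]
LAYOUTS = ["simple layout", "complex layout"]


def build_categories():
    """Return all corpus categories as plain strings (hypothetical helper)."""
    categories = [
        f"{century} century, {font}, {layout}"
        for century, font, layout in product(CENTURIES, FONTS, LAYOUTS)
    ]
    # The 19th century is out of scope, so it is only split by font,
    # matching the two works that already have text GT.
    categories += ["19th century, antiqua", "19th century, fraktur"]
    return categories


if __name__ == "__main__":
    for category in build_categories():
        print(category)
```

Generating the list this way makes it easy to check that no combination has been left out by accident.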
Creating the simple cases
Categories
16th century, fraktur, simple layout
16th century, antiqua, simple layout
16th century, antiqua, complex layout
17th century, fraktur, simple layout
17th century, font mix, simple layout
  fraktur, antiqua
  fraktur, antiqua, ancient Greek, Hebrew
18th century, fraktur, simple layout
18th century, antiqua, simple layout
18th century, font mix, complex layout
19th century, antiqua
19th century, fraktur
The data is now available at https://github.com/OCR-D/quiver-data.git.
To run the benchmarking, we need data with different characteristics to work on.
@mweidling has already examined the OCR-D GT repository and wants to discuss useful corpora with @tboenig and @cneud.
TODOs: