Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'HTR-United:master' into master
Showing 91 changed files with 36,617 additions and 4,302 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"https://doi.org/10.5281/zenodo.5153263": "repo-00000", "https://zenodo.org/record/4780947#.YhN5pVvMLUQ": "repo-00001", "https://github.com/calfa-co/rasam-dataset": "repo-00002", "https://github.com/DesenrollandoElCordel/FoNDUE-Spanish-chapbooks-Dataset": "repo-00003", "https://zenodo.org/record/3333627#.YhN1G1vMLUQ": "repo-00004", "https://github.com/rescribe/carolineminuscule-groundtruth": "repo-00005", "http://dx.doi.org/10.34847/nkl.acb724xs": "repo-00006", "https://github.com/e-ditiones/OCR17plus": "repo-00007", "https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Notre-Dame": "repo-00008", "https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-ArgusDesBrevets": "repo-00009", "https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR": "repo-00010", "https://github.com/PSL-Chartes-HTR-Students/HN2021-Kovalewsky-1893": "repo-00011", "https://github.com/PSL-Chartes-HTR-Students/HN2021-ChateauChavigny": "repo-00012", "https://github.com/PSL-Chartes-HTR-Students/HN2021-Boccace": "repo-00013", "https://github.com/PSL-Chartes-HTR-Students/HN2021-Memorials_Jane_Lathrop_Stanford": "repo-00014", "https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Expositions_Universelles": "repo-00015", "https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Correspondance-Berlioz": "repo-00016", "https://github.com/jpmjpmjpm/genauto-td-htr.git": "repo-00017", "https://doi.org/10.5281/zenodo.5179361": "repo-00018", "HTR-United/tapuscorpus": "repo-00019", "HTR-United/timeuscorpus": "repo-00020", "HTR-United/dahncorpus": "repo-00021", "HTR-United/cremma-medieval": "repo-00022", "HTR-United/cremma-16-17-print": "repo-00023", "HTR-United/CREMMA-Medieval-LAT": "repo-00024", "HTR-United/CREMMA-MSS-17": "repo-00025", "HTR-United/CREMMA-MSS-18": "repo-00026", "HTR-United/CREMMA-MSS-19": "repo-00027", "HTR-United/CREMMA-MSS-20": "repo-00028", "HTR-United/lectaurep-bronod": "repo-00029", "HTR-United/lectaurep-mariages-et-divorces": "repo-00030", 
"HTR-United/lectaurep-repertoires": "repo-00031", "HTR-United/CREMMA-AN-TestamentDePoilus": "repo-00032", "HTR-United/cremma-wikipedia": "repo-00033", "Gallicorpora/HTR-MSS-15e-Siecle": "repo-00034", "Gallicorpora/HTR-incunable-15e-siecle": "repo-00035", "Gallicorpora/HTR-imprime-16e-siecle": "repo-00036", "Gallicorpora/HTR-imprime-17e-siecle": "repo-00037", "Gallicorpora/HTR-imprime-gothique-16e-siecle": "repo-00038", "Gallicorpora/HTR-imprime-18e-siecle": "repo-00039", "FoNDUE-HTR/FONDUE-FR-PRINT-17": "repo-00040", "FoNDUE-HTR/FONDUE-FR-PRINT-16": "repo-00041"} |
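
The mapping above pairs each repository key (a URL or an `owner/name` slug) with an identifier of the form `repo-NNNNN`. A minimal sketch of how a new key could receive the next identifier, assuming identifiers are sequential and zero-padded to five digits, as the visible entries suggest. The function name `assign_id` and the key `example-org/example-dataset` are hypothetical and not taken from the repository.

```python
import json


def assign_id(mapping, key):
    """Return the identifier for `key`, allocating the next free one if needed.

    Assumption: identifiers look like "repo-00042" and are handed out
    sequentially. This is inferred from the data, not from the project's code.
    """
    if key in mapping:
        return mapping[key]
    # The next number is one past the highest number already in use.
    used = [int(value.split("-")[1]) for value in mapping.values()]
    next_number = max(used, default=-1) + 1
    mapping[key] = f"repo-{next_number:05d}"
    return mapping[key]


# Two entries copied from the mapping shown in the diff.
sample = json.loads(
    '{"HTR-United/tapuscorpus": "repo-00019", '
    '"HTR-United/timeuscorpus": "repo-00020"}'
)
existing_id = assign_id(sample, "HTR-United/tapuscorpus")
new_id = assign_id(sample, "example-org/example-dataset")  # hypothetical key
```

Looking up an existing key leaves the mapping unchanged, so identifiers stay stable across catalog rebuilds under this assumption.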
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
schema: https://htr-united.github.io/schema/2022-04-15/schema.json | ||
title: 'GT4HistCommentLayout: Layout Ground Truth for Historical Commentaries' | ||
url: https://github.com/AjaxMultiCommentary/GT-commentaries-OLR | ||
authors: | ||
- name: Matteo | ||
surname: Romanello | ||
orcid: 0000-0002-7406-6286 | ||
roles: | ||
- project-manager | ||
- name: Sven | ||
surname: Najem-Meyer | ||
orcid: 0000-0002-3661-4579 | ||
roles: | ||
- transcriber | ||
- quality-control | ||
- name: Carla | ||
surname: Amaya | ||
roles: | ||
- transcriber | ||
description: 'This dataset contains layout annotations for ca. 370 pages sampled from | ||
8 public domain classical commentaries, published in the 19th century in English, | ||
German and Latin. The commentaries concern Ancient Greek and Latin works from prose | ||
and poetry (caveat: AGreek poetry is slightly over-represented). Pages were annotated | ||
according to a taxonomy mapped to the SegmOnto controlled vocabulary.' | ||
project-name: Ajax Multi-Commentary | ||
project-website: https://mromanello.github.io/ajax-multi-commentary/ | ||
language: | ||
- eng | ||
- deu | ||
- lat | ||
- grc | ||
production-software: Kraken + VGG Image Annotator (VIA) | ||
script: | ||
- iso: Latn | ||
- iso: Grek | ||
script-type: only-typed | ||
time: | ||
notBefore: '1835' | ||
notAfter: '1903' | ||
hands: | ||
count: '1' | ||
precision: exact | ||
license: | ||
- name: CC-BY 4.0 | ||
url: https://creativecommons.org/licenses/by/4.0/ | ||
format: Alto-XML | ||
volume: | ||
- metric: characters | ||
count: 0 | ||
- metric: files | ||
count: 371 | ||
- metric: lines | ||
count: 0 | ||
- metric: regions | ||
count: 2386 | ||
transcription-guidelines: SegmOnto guidelines (v. 0.9) | ||
citation-file-link: https://github.com/AjaxMultiCommentary/GT-commentaries-layout/blob/master/CITATION.cff | ||
characters: | ||
mode: NFD | ||
members: [] |
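
The `volume` block in this record lists zero characters and zero lines, which is consistent with a layout-only dataset: only files and regions are counted. A small sketch of how such a block could be summarized, assuming the YAML has already been parsed into a list of dictionaries. The helper names `summarize_volume` and `ratio` are hypothetical.

```python
def summarize_volume(volume):
    """Turn a list of {"metric": ..., "count": ...} entries into a dict."""
    return {entry["metric"]: entry["count"] for entry in volume}


def ratio(summary, numerator, denominator):
    """Return numerator/denominator, or None when the denominator is 0 or absent."""
    bottom = summary.get(denominator, 0)
    if bottom == 0:
        return None
    return summary.get(numerator, 0) / bottom


# Values copied from the record above.
volume = [
    {"metric": "characters", "count": 0},
    {"metric": "files", "count": 371},
    {"metric": "lines", "count": 0},
    {"metric": "regions", "count": 2386},
]
summary = summarize_volume(volume)
regions_per_file = ratio(summary, "regions", "files")
chars_per_line = ratio(summary, "characters", "lines")  # None: no lines annotated
```

Returning `None` rather than dividing by zero keeps layout-only records from breaking any aggregate statistics computed over the catalog.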
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,125 @@ | ||
schema: https://htr-united.github.io/schema/2022-04-15/schema.json | ||
title: Moonshines | ||
url: https://github.com/alix-tz/moonshines | ||
authors: | ||
- name: Alix | ||
surname: "Chagu\xE9" | ||
orcid: 0000-0002-0136-4434 | ||
roles: | ||
- transcriber | ||
- aligner | ||
- project-manager | ||
- digitization | ||
institutions: [] | ||
description: This dataset is composed of pages of text written in 2023 by a single | ||
person, copying texts taken from Guillaume Apollinaire's poems published in Alcools, | ||
and taken from Guillaume Apollinaire's Wikipedia page. | ||
language: | ||
- fra | ||
production-software: eScriptorium + Kraken | ||
script: | ||
- iso: Latn | ||
script-type: only-manuscript | ||
time: | ||
notBefore: '2023' | ||
notAfter: '2023' | ||
hands: | ||
count: '1' | ||
precision: exact | ||
license: | ||
- name: CC-BY 4.0 | ||
url: https://creativecommons.org/licenses/by/4.0/ | ||
format: Alto-XML | ||
volume: | ||
- metric: characters | ||
count: 27734 | ||
- metric: files | ||
count: 45 | ||
- metric: lines | ||
count: 1016 | ||
- metric: regions | ||
count: 45 | ||
citation-file-link: https://github.com/alix-tz/moonshines/blob/master/CITATION.cff | ||
transcription-guidelines: The transcription strictly follows what is written on the | ||
images, including accentuation or capitalization errors. The segmentation follows | ||
the SegmOnto ontology and mostly relies on MainZone and DefaultLine. Beware that | ||
this dataset barely contains any punctuation and that most lines begin with a capital | ||
letter. | ||
characters: | ||
mode: NFD | ||
members: | ||
- e | ||
- s | ||
- a | ||
- n | ||
- r | ||
- i | ||
- t | ||
- u | ||
- o | ||
- l | ||
- d | ||
- m | ||
- c | ||
- p | ||
- "\u0301" | ||
- '''' | ||
- v | ||
- g | ||
- b | ||
- h | ||
- "\u0300" | ||
- f | ||
- L | ||
- q | ||
- E | ||
- '1' | ||
- A | ||
- C | ||
- x | ||
- y | ||
- "\u0302" | ||
- S | ||
- '9' | ||
- P | ||
- M | ||
- j | ||
- T | ||
- D | ||
- '-' | ||
- N | ||
- J | ||
- R | ||
- '0' | ||
- z | ||
- O | ||
- I | ||
- '2' | ||
- '8' | ||
- V | ||
- F | ||
- G | ||
- U | ||
- '5' | ||
- B | ||
- Q | ||
- ) | ||
- H | ||
- '3' | ||
- ( | ||
- '7' | ||
- '6' | ||
- w | ||
- k | ||
- '4' | ||
- "\u0327" | ||
- K | ||
- Z | ||
- "\u0308" | ||
- Y | ||
- '{' | ||
- '}' | ||
- W | ||
- . | ||
- X | ||
- ',' |
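
Because `characters.mode` is `NFD`, accented letters are stored in decomposed form, which is why the member list contains bare combining marks such as `"\u0301"` (combining acute accent) and `"\u0327"` (combining cedilla) rather than precomposed letters like `é` or `ç`. A minimal sketch of how such an inventory could be derived from transcription text. The function name `character_inventory` is hypothetical, and ordering by descending frequency is an assumption based on the order of the list above, not a documented rule.

```python
import unicodedata
from collections import Counter


def character_inventory(text):
    """List the distinct non-whitespace characters of `text` after NFD
    normalization, most frequent first (ordering is an assumption)."""
    decomposed = unicodedata.normalize("NFD", text)
    counts = Counter(ch for ch in decomposed if not ch.isspace())
    return [ch for ch, _ in counts.most_common()]


# "\u00e9" is the precomposed letter e with acute accent.
inventory = character_inventory("Guillaume a \u00e9t\u00e9 \u00e9lev\u00e9")
```

After normalization the precomposed `é` no longer appears in the inventory; it is represented by `e` followed by the combining mark U+0301.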
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
schema: https://htr-united.github.io/schema/2022-04-15/schema.json | ||
title: Peraire Ground Truth | ||
url: https://github.com/alix-tz/peraire-ground-truth | ||
authors: | ||
- name: Alix | ||
surname: Chagué | ||
orcid: 0000-0002-0136-4434 | ||
roles: | ||
- transcriber | ||
- quality-control | ||
institutions: | ||
- name: Bibliothèque Sébert, Espéranto-France, Paris | ||
roles: | ||
- digitization | ||
description: >- | ||
This dataset was created in order to produce an HTR model for the Digital | ||
Peraire project. The documents are handwritten, dating from the second half of | ||
the 20th century, written by Lucien Péraire in French with a blue ink pen or, | ||
more frequently, with a blue pencil. | ||
project-name: Digital Peraire | ||
language: | ||
- fra | ||
production-software: eScriptorium + Kraken | ||
script: | ||
- iso: Latn | ||
script-type: only-manuscript | ||
time: | ||
notBefore: '1928' | ||
notAfter: '1971' | ||
hands: | ||
count: '1' | ||
precision: exact | ||
license: | ||
- name: CC-BY 4.0 | ||
url: https://creativecommons.org/licenses/by/4.0/ | ||
format: Alto-XML | ||
volume: | ||
- metric: characters | ||
count: 38793 | ||
- metric: files | ||
count: 33 | ||
- metric: lines | ||
count: 1059 | ||
- metric: regions | ||
count: 80 | ||
citation-file-link: https://github.com/alix-tz/peraire-ground-truth/blob/master/CITATION.cff | ||
transcription-guidelines: >- | ||
The transcription respects what is written on the document, including | ||
punctuation and spelling errors. The case is respected: capital letters are | ||
transcribed with capital letters. Crossed out words are signaled by # which | ||
isn't used to transcribe anything else. The SegmOnto ontology was used for the | ||
segmentation of this dataset. For regions, MainZone and MarginTextZone were | ||
used. For lines, DefaultLine and InterlinearLine were used. The original | ||
documents are held at the Bibliothèque Sébert, Espéranto-France, Paris. They | ||
should be mentioned every time the images are used. |