Create biblia-arabica-example-code.yml #129

Merged
3 commits merged into HTR-United:master on May 27, 2024

Conversation

nathangibson
Contributor

No description provided.

@alix-tz
Member

alix-tz commented Nov 10, 2023

Hello,

Thank you for this submission. Is it a work in progress or are you trying to submit it as is?

There are several problems which need to be fixed before the entry can be added to the catalog.

  • you need to provide a list of authors
  • it would be helpful to provide a more precise description of the dataset so that potential re-users can understand what it contains, in particular since the images are not freely available for a portion of the dataset (if I understand your documentation correctly)
  • do the transcriptions really span from 900 to 1900?
  • I think the rules listed in your transcription convention could be pasted in the "transcription guidelines" field (but this is something I can fix).

That being said, my main issue is that I am not able to load the dataset in eScriptorium. I get the errors below when I try, which might be caused by the value in "fileName" not matching the names of the image files. I tried on two instances of eScriptorium (v0.13.8b and v0.13.4b) with the same result. Did you try to import them in eScriptorium? Which version of eScriptorium did you use to export them? Did you generate them all on the command line with Kraken? If yes, with which version?

Note that some of the errors are normal, I didn't load all the images.

Import in biblia-arabica
Status: Finished
Queued at: Nov. 10, 2023, 3:23 p.m.
Started at: Nov. 10, 2023, 3:23 p.m.
Ended at: Nov. 10, 2023, 3:23 p.m.
CPU usage: 14.984762666666667
GPU usage: None

No match found for file laud-or-258-unvocalized_013.xml with filename "laud-or-258_013.jpg".
[...]
No match found for file laud-or-258-unvocalized_042.xml with filename "laud-or-258_042.jpg".
Processing the page n°1 from the provided METS file
An exception occurred while processing the page: Invalid URL 'None/laud-or-258_013.xml': No scheme supplied. Perhaps you meant https://None/laud-or-258_013.xml?
Processing the page n°2 from the provided METS file
An exception occurred while processing the page: Invalid URL 'None/laud-or-258_014.xml': No scheme supplied. Perhaps you meant https://None/laud-or-258_014.xml?
Processing the page n°3 from the provided METS file
An exception occurred while processing the page: Invalid URL 'None/laud-or-258_015.xml': No scheme supplied. Perhaps you meant https://None/laud-or-258_015.xml?
Processing the page n°4 from the provided METS file
An exception occurred while processing the page: Invalid URL 'None/laud-or-258_016.xml': No scheme supplied. Perhaps you meant https://None/laud-or-258_016.xml?
Processing the page n°5 from the provided METS file
An exception occurred while processing the page: Invalid URL 'None/laud-or-258_017.xml': No scheme supplied. Perhaps you meant https://None/laud-or-258_017.xml?
Processing the page n°6 from the provided METS file
An exception occurred while processing the page: Invalid URL 'None/laud-or-258_018.xml': No scheme supplied. Perhaps you meant https://None/laud-or-258_018.xml?
Processing the page n°7 from the provided METS file
An exception occurred while processing the page: Invalid URL 'None/laud-or-258_019.xml': No scheme supplied. Perhaps you meant https://None/laud-or-258_019.xml?
[...]
No match found for file OxfordLaudOr258_723.xml with filename "laud-or-258_723.jpg".
No match found for file OxfordLaudOr258_724.xml with filename "laud-or-258_724.jpg".
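
A minimal sketch of how the suspected mismatch could be checked locally before importing. It assumes the transcriptions are ALTO XML files that store the image name in a `fileName` element and that they sit in the same folder as the images. The function names and the folder layout are hypothetical and are not part of eScriptorium or Kraken.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip the XML namespace so the check works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def read_file_name(xml_path):
    """Return the text of the first fileName element, or None if absent."""
    for elem in ET.parse(xml_path).iter():
        if local_name(elem.tag) == "fileName" and elem.text:
            return elem.text.strip()
    return None


def find_mismatches(folder):
    """List (xml file, declared image name) pairs whose image is not in the folder."""
    folder = Path(folder)
    present = {p.name for p in folder.iterdir() if p.is_file()}
    mismatches = []
    for xml_path in sorted(folder.glob("*.xml")):
        declared = read_file_name(xml_path)
        # Assumption: every XML file here is a transcription; a METS file
        # in the same folder would be reported with declared = None.
        if declared is None or declared not in present:
            mismatches.append((xml_path.name, declared))
    return mismatches
```

Calling `find_mismatches("path/to/export")` on a folder returns an empty list when every declared image name is present, which is roughly the condition the "No match found" messages above complain about.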

@nathangibson
Contributor Author

Thanks so much, and apologies that it took me a while before I saw your reply!

> Thank you for this submission. Is it a work in progress or are you trying to submit it as is?

We would like to do more work on it but think it is already useful.

> There are several problems which need to be fixed before the entry can be added to the catalog.
>
> * you need to provide a list of authors

Done (see the above merges).

> * it would be helpful to provide a more precise description of the dataset so that potential re-users can understand what it contains, in particular since the images are not freely available for a portion of the dataset (if I understand your documentation correctly)

Will work on this -- basically explaining the image rights?

> * do the transcriptions really span from 900 to 1900?

Yes, although there are only a few pages of the later material.

> * I think the rules listed in your transcription convention could be pasted in the "transcription guidelines" field (but this is something I can fix).

Done.

> That being said, my main issue is that I am not able to load the dataset in eScriptorium. I get the errors below when I try, which might be caused by the value in "fileName" not matching the names of the image files. I tried on two instances of eScriptorium (v0.13.8b and v0.13.4b) with the same result. Did you try to import them in eScriptorium? Which version of eScriptorium did you use to export them? Did you generate them all on the command line with Kraken? If yes, with which version?

Sorry, I think the issue was that the filenames were changed after download, without realizing this would break the METS import. I've corrected this now (see, e.g., https://github.com/biblia-arabica/academies/tree/main/htr/ground-truth).
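
As a rough illustration of why renaming breaks things: a METS file lists its member files by path, so a renamed file no longer resolves. The sketch below assumes the METS file points to local files through `FLocat` elements carrying an `xlink:href` attribute, relative to the METS file itself. The function name is hypothetical, and this is not how eScriptorium itself performs the import.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard XLink namespace used for href attributes in METS.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def missing_mets_targets(mets_path):
    """Return the relative hrefs in a METS file that do not exist on disk."""
    mets_path = Path(mets_path)
    missing = []
    for elem in ET.parse(mets_path).iter():
        # Match FLocat regardless of the namespace prefix used in the file.
        if elem.tag.rsplit("}", 1)[-1] != "FLocat":
            continue
        href = elem.get(XLINK_HREF)
        if href is None or "://" in href:
            continue  # assumption: remote URLs are out of scope for this check
        if not (mets_path.parent / href).is_file():
            missing.append(href)
    return missing
```

An empty result from `missing_mets_targets("path/to/mets.xml")` means every locally referenced file is still where the METS file expects it.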

Another main issue was that it wasn't so clear where the ground truth was. I've restructured to make this clearer. If you think https://github.com/biblia-arabica/academies/tree/main/htr/ground-truth is in order I will do the same for the other manuscripts. Thanks for your input!

alix-tz self-assigned this Mar 13, 2024
@PonteIneptique
Member

@alix-tz Can you have a look?

Updating the description to bring the PR up to date with the discussion and the modifications made in https://github.com/biblia-arabica/academies/blob/main/htr-united.yml
Adding missing information about volume and fixing license declaration
@alix-tz
Member

alix-tz commented May 27, 2024

Ok, we are good now I believe!

I'm sorry @nathangibson that this took so long, I hadn't realized that you had updated the metadata because you didn't report the changes in the file attached to this PR. But that shouldn't have blocked us from moving on with adding the description in the catalog.

Thank you very much for your contribution!

alix-tz merged commit ec9fe4d into HTR-United:master May 27, 2024
1 check passed
@nathangibson
Contributor Author

@alix-tz Thanks for this! And apologies, I'm not familiar with the process so I didn't realize about the file attached to the PR. I appreciate your including us!
