restrict fileGrp names more #746

Closed
bertsky opened this issue Nov 24, 2021 · 2 comments · Fixed by #758

Comments

@bertsky
Collaborator

bertsky commented Nov 24, 2021

Although the spec prescribes a very strict naming scheme for fileGrp/@USE, we have not enforced it in core (only the workspace validator checks it). Leaving fileGrp names completely unrestricted causes follow-up problems, however: since we normally derive file IDs from fileGrp names, some user choices unwittingly produce invalid METS:

element file: Schemas validity error : Element '{http://www.loc.gov/METS/}file', attribute 'ID': 'OCR-D-OCR-TESS-Fraktur+Latin-SEG-LINE-tesseract-ocropy-DEWARP_0005' is not a valid value of the atomic type 'xs:ID'.
element file: Schemas validity error : Element '{http://www.loc.gov/METS/}file', attribute 'ID': 'OCR-D-GT-SEG-PAGE-ſs-sſ-EVAL_0006' is not a valid value of the atomic type 'xs:ID'.
...
element fptr: Schemas validity error : Element '{http://www.loc.gov/METS/}fptr', attribute 'FILEID': 'OCR-D-OCR-TESS-Fraktur+Latin-SEG-LINE-tesseract-ocropy-DEWARP_0005' is not a valid value of the atomic type 'xs:IDREF'.
element fptr: Schemas validity error : Element '{http://www.loc.gov/METS/}fptr', attribute 'FILEID': 'OCR-D-GT-SEG-PAGE-ſs-sſ-EVAL_0006' is not a valid value of the atomic type 'xs:IDREF'.

I therefore suggest extending the following check from add_file to add_file_grp as well:

    if not REGEX_FILE_ID.fullmatch(ID):
        raise ValueError("Invalid syntax for mets:file/@ID %s" % ID)
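
A minimal sketch of how such a check might look when applied to fileGrp names. The pattern `REGEX_XS_ID_ASCII` and the helper `check_file_grp_name` are hypothetical names for illustration; the pattern is a conservative ASCII-only subset of what xs:ID permits, not the actual definition of REGEX_FILE_ID in core, which is not shown here.

```python
import re

# Hypothetical stand-in for core's REGEX_FILE_ID (an assumption, not the real definition).
# ASCII-only subset of the NCName syntax that xs:ID is based on:
# starts with a letter or underscore, then letters, digits, '_', '.' or '-'.
REGEX_XS_ID_ASCII = re.compile(r'[A-Za-z_][A-Za-z0-9_.-]*')


def check_file_grp_name(file_grp):
    """Raise ValueError if file_grp is unsafe as a prefix for mets:file/@ID.

    Hypothetical helper: the same kind of check add_file performs on file IDs,
    applied to the fileGrp name instead.
    """
    if not REGEX_XS_ID_ASCII.fullmatch(file_grp):
        raise ValueError("Invalid syntax for mets:fileGrp/@USE %s" % file_grp)
    return file_grp


if __name__ == "__main__":
    for name in ("OCR-D-SEG-LINE",
                 "OCR-D-OCR-TESS-Fraktur+Latin-SEG-LINE-tesseract-ocropy-DEWARP",
                 "OCR-D-GT-SEG-PAGE-ſs-sſ-EVAL"):
        try:
            check_file_grp_name(name)
            print("accepted:", name)
        except ValueError as err:
            print("rejected:", err)
```

With this pattern, both fileGrp names behind the validation errors above would be rejected when the group is created, before any file ID is derived from them.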

@kba
Member

kba commented Nov 24, 2021

Agreed.

We should probably also make this explicit in the spec: we do not require the naming scheme (it is a SHOULD, not a MUST), but we should add that "mets:fileGrp/@USE MUST be a valid xs:ID".

@bertsky
Collaborator Author

bertsky commented Nov 24, 2021

Indeed.
