restrict fileGrp names more #746

Although the spec requires `fileGrp/@USE` names to follow a very strict scheme, core does not enforce it (only the workspace validator checks it). However, leaving fileGrp names completely unrestricted causes follow-up problems. For example, since file IDs are normally based on fileGrp names, some user-chosen names unwittingly produce invalid METS:

```
element file: Schemas validity error : Element '{http://www.loc.gov/METS/}file', attribute 'ID': 'OCR-D-OCR-TESS-Fraktur+Latin-SEG-LINE-tesseract-ocropy-DEWARP_0005' is not a valid value of the atomic type 'xs:ID'.
element file: Schemas validity error : Element '{http://www.loc.gov/METS/}file', attribute 'ID': 'OCR-D-GT-SEG-PAGE-ſs-sſ-EVAL_0006' is not a valid value of the atomic type 'xs:ID'.
...
element fptr: Schemas validity error : Element '{http://www.loc.gov/METS/}fptr', attribute 'FILEID': 'OCR-D-OCR-TESS-Fraktur+Latin-SEG-LINE-tesseract-ocropy-DEWARP_0005' is not a valid value of the atomic type 'xs:IDREF'.
element fptr: Schemas validity error : Element '{http://www.loc.gov/METS/}fptr', attribute 'FILEID': 'OCR-D-GT-SEG-PAGE-ſs-sſ-EVAL_0006' is not a valid value of the atomic type 'xs:IDREF'.
```
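To make the failure mode concrete, here is a minimal sketch of how a file ID derived from a fileGrp name inherits any problematic characters. The helper names `make_file_id` and `offending_chars` are hypothetical, the `<fileGrp>_<4-digit index>` pattern is only inferred from the IDs in the log above, and the ASCII-only character class is a deliberately conservative stand-in for `xs:ID` (the real rule admits many non-ASCII letters, so this check is stricter than the schema).

```python
import re

# Assumption: conservative ASCII-only approximation of the characters allowed
# in an xs:ID; the actual NCName rule is broader than this.
ALLOWED_CHAR = re.compile(r"[A-Za-z0-9_.-]")


def make_file_id(file_grp, index):
    """Hypothetical helper: derive a file ID from the fileGrp name and a page index."""
    return "%s_%04d" % (file_grp, index)


def offending_chars(candidate):
    """Return the distinct characters of `candidate` outside the allowed set."""
    return sorted({c for c in candidate if not ALLOWED_CHAR.fullmatch(c)})


for grp, index in [
    ("OCR-D-OCR-TESS-Fraktur+Latin-SEG-LINE-tesseract-ocropy-DEWARP", 5),
    ("OCR-D-GT-SEG-PAGE-ſs-sſ-EVAL", 6),
    ("OCR-D-IMG", 1),
]:
    file_id = make_file_id(grp, index)
    # Print each derived ID together with the characters that would be flagged
    print(file_id, offending_chars(file_id))
```

Under these assumptions, the first two names are flagged because of `+` and `ſ` respectively, while a plain name such as `OCR-D-IMG` passes.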

I therefore suggest extending the `mets:file/@ID` check in `add_file` (https://github.com/OCR-D/core/blob/d9f660ee727c5235813e7f1534e26f2bebe483d3/ocrd_models/ocrd_models/ocrd_mets.py#L298-L299) to `add_file_grp`.

    if not REGEX_FILE_ID.fullmatch(ID):
        raise ValueError("Invalid syntax for mets:file/@ID %s" % ID)
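A minimal sketch of what the same guard could look like when applied to fileGrp names. This is not core's implementation: `MetsSketch` is a hypothetical stand-in for the real METS wrapper, `REGEX_FILE_GRP` is an assumed ASCII-only pattern rather than the actual `REGEX_FILE_ID`, and the error message wording is illustrative.

```python
import re

# Assumption: stand-in pattern, stricter than xs:ID; core's REGEX_FILE_ID may differ.
REGEX_FILE_GRP = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


class MetsSketch:
    """Hypothetical, simplified stand-in that only tracks fileGrp names."""

    def __init__(self):
        self.file_groups = []

    def add_file_grp(self, fileGrp):
        # Reject names that would later yield invalid mets:file/@ID values
        if not REGEX_FILE_GRP.fullmatch(fileGrp):
            raise ValueError("Invalid syntax for mets:fileGrp/@USE %s" % fileGrp)
        # Register the group only once
        if fileGrp not in self.file_groups:
            self.file_groups.append(fileGrp)
        return fileGrp


mets = MetsSketch()
mets.add_file_grp("OCR-D-SEG-LINE")
try:
    mets.add_file_grp("OCR-D-OCR-TESS-Fraktur+Latin")
except ValueError as err:
    print(err)
```

Rejecting the name when the group is created means the problem surfaces at the user's naming choice rather than later, when the schema validator complains about a derived file ID.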
