Problem
The validation pipeline references a communitymech.validators.term_validator module that does not exist (src/communitymech/validators/term_validator.py is missing), while __init__.py tries to import it. The just validate-terms target fails immediately.
Meanwhile, the plan docs and README already mention linkml-term-validator as the intended tool — we should use it directly rather than writing a custom Python wrapper.
What needs to happen
1. Add linkml-term-validator as a dependency
# pyproject.toml
dependencies = [
    ...
    "linkml-term-validator>=0.1.0",
]
2. Add schema-level bindings or dynamic enums
linkml-term-validator validate-data validates against dynamic enums and bindings — LinkML features that constrain which ontology terms are valid in a given slot. The current schema uses a generic Term class with plain string id/label fields, so the validator has nothing to check against.
Options (not mutually exclusive):
Option A: Bindings on descriptor slots — bind each descriptor's term.id to a prefix-constrained enum:
classes:
  TaxonDescriptor:
    attributes:
      term:
        range: Term
        bindings:
          - binds_value_of: id
            range: NCBITaxonEnum
Option B: Dynamic enums with reachable_from — if we want to constrain to specific branches:
enums:
  NCBITaxonEnum:
    reachable_from:
      source_ontology: obo:ncbitaxon
      source_nodes:
        - NCBITaxon:2  # Bacteria
      relationship_types:
        - rdfs:subClassOf
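To make the idea behind reachable_from concrete, here is a minimal Python sketch of the set it describes: every term that reaches a source node through rdfs:subClassOf edges. The hierarchy and the EX: identifiers are hypothetical placeholders, and include_self is modeled here as a plain function argument; this is not how linkml-term-validator resolves the enum internally.

```python
from collections import deque

# Hypothetical toy hierarchy: child -> parents (rdfs:subClassOf edges).
# The EX: identifiers are placeholders, not real ontology terms.
SUBCLASS_OF = {
    "EX:genus_a": ["EX:family_a"],
    "EX:family_a": ["EX:bacteria"],
    "EX:genus_b": ["EX:bacteria"],
    "EX:plant_genus": ["EX:plants"],
}


def reachable_from(source_nodes, subclass_of, include_self=False):
    """Return every term that reaches a source node via subClassOf edges."""
    # Invert the child -> parents map so we can walk downward from the sources.
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    allowed = set()
    queue = deque(source_nodes)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in allowed:  # also guards against cycles
                allowed.add(child)
                queue.append(child)
    if include_self:
        allowed.update(source_nodes)
    return allowed


print(sorted(reachable_from(["EX:bacteria"], SUBCLASS_OF)))
```

A term outside the branch, such as the placeholder EX:plant_genus, is never reached and would therefore be rejected in a slot bound to this enum.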
Option C: At minimum, id_prefixes — add id_prefixes to Term subclasses to at least validate the CURIE prefix is correct (e.g. NCBITaxon for taxa, CHEBI for metabolites). This is simpler but only checks prefix, not term existence/labels.
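The limitation of a prefix-only check can be sketched in a few lines of Python. The ALLOWED_PREFIXES mapping and the has_allowed_prefix helper are hypothetical names used for illustration; they stand in for what id_prefixes would declare in the schema.

```python
# Hypothetical mapping from descriptor kind to allowed CURIE prefixes,
# mirroring what id_prefixes would declare in the schema.
ALLOWED_PREFIXES = {
    "taxon": {"NCBITaxon"},
    "metabolite": {"CHEBI"},
}


def has_allowed_prefix(curie, kind, allowed=ALLOWED_PREFIXES):
    """Return True if the CURIE's prefix is permitted for this descriptor kind."""
    prefix, sep, local_id = curie.partition(":")
    # A CURIE needs both a prefix and a non-empty local part.
    if not sep or not local_id:
        return False
    return prefix in allowed.get(kind, set())


# The prefix is right, so this passes even if the ID names the wrong organism.
print(has_allowed_prefix("NCBITaxon:69459", "taxon"))
# A CHEBI identifier in a taxon slot is rejected.
print(has_allowed_prefix("CHEBI:15377", "taxon"))
```

The first call passes because only the prefix is inspected, which is why Option C alone would not catch a wrong but well-formed NCBITaxon ID.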
3. Update justfile targets
Replace the broken custom module calls with direct calls to the linkml-term-validator CLI:
# Validate ontology terms in data files
validate-terms FILE:
    uv run linkml-term-validator validate-data {{FILE}} -s src/communitymech/schema/communitymech.yaml --labels

validate-terms-all:
    #!/usr/bin/env bash
    status=0
    for file in kb/communities/*.yaml; do
        # printf interprets \n; plain echo in bash would print it literally
        printf '\nValidating terms in %s...\n' "$file"
        uv run linkml-term-validator validate-data "$file" -s src/communitymech/schema/communitymech.yaml --labels || status=1
    done
    # Fail the recipe if any file failed, not only the last one
    exit "$status"

# Validate schema-level enum meanings
validate-schema-terms:
    uv run linkml-term-validator validate-schema src/communitymech/schema/communitymech.yaml
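If the loop ever needs to live outside the justfile (for example in CI), the same behavior could be sketched in Python. The build_command and validate_all names are hypothetical, and the sketch assumes the CLI exits non-zero when validation fails; the subcommand and flags are the ones used in the targets above.

```python
import subprocess
from pathlib import Path

SCHEMA = "src/communitymech/schema/communitymech.yaml"


def build_command(data_file, schema=SCHEMA):
    """Build the argv for one validate-data call, matching the justfile target."""
    return [
        "uv", "run", "linkml-term-validator", "validate-data",
        str(data_file), "-s", schema, "--labels",
    ]


def validate_all(files, run=subprocess.run):
    """Run the validator on every file and return the files that failed.

    Assumes the CLI signals validation failure with a non-zero exit code.
    """
    failed = []
    for data_file in files:
        result = run(build_command(data_file))
        if result.returncode != 0:
            failed.append(str(data_file))
    return failed


def main():
    """Validate all community files; return 1 if any failed, else 0."""
    files = sorted(Path("kb/communities").glob("*.yaml"))
    return 1 if validate_all(files) else 0
```

Collecting failures instead of stopping at the first one means a single run reports every file that needs attention. The run parameter exists only so the loop can be exercised without invoking the real CLI.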
4. Fix the broken __init__.py import
Remove the TermValidator import from src/communitymech/validators/__init__.py since we're using the external tool instead of a custom module.
5. Add to QC pipeline
Update the qc target to include term validation:
qc: validate-all validate-terms-all lint test
Context
We just found 5 completely wrong NCBITaxon IDs and 3 label mismatches in EcoFAB_Ring_Trial_SynCom17.yaml through manual OLS verification. Examples:
NCBITaxon:69459 resolves to "Dicraspidia" (a plant), not Lysobacter
NCBITaxon:164543 resolves to "Opisthoteuthis massyae" (an octopus), not Marmoricola
Automated term validation would have caught all of these.
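The kind of check that catches these errors can be sketched as a comparison between recorded labels and an authoritative lookup. The dictionary below is a hard-coded stand-in for an ontology service such as OLS, filled with the two examples above; the sketch assumes the data file recorded these IDs under the intended genus names, and find_label_mismatches is a hypothetical helper, not part of linkml-term-validator.

```python
# Stand-in for an ontology lookup (e.g. OLS); a real check would query a service.
# The two entries are the examples listed in this issue.
AUTHORITATIVE_LABELS = {
    "NCBITaxon:69459": "Dicraspidia",
    "NCBITaxon:164543": "Opisthoteuthis massyae",
}


def find_label_mismatches(terms, lookup):
    """Return (id, recorded, actual) for each term whose label disagrees with the lookup."""
    mismatches = []
    for term in terms:
        actual = lookup.get(term["id"])
        # Unknown IDs are reported too, with actual=None.
        if actual is None or actual.casefold() != term["label"].casefold():
            mismatches.append((term["id"], term["label"], actual))
    return mismatches


# Assumed shape of the recorded data: the ID paired with the intended label.
recorded = [
    {"id": "NCBITaxon:69459", "label": "Lysobacter"},
    {"id": "NCBITaxon:164543", "label": "Marmoricola"},
]
for term_id, label, actual in find_label_mismatches(recorded, AUTHORITATIVE_LABELS):
    print(f"{term_id}: recorded {label!r}, ontology says {actual!r}")
```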