Add Support for Nested Objects in PFB Schema #143
Add Support for Nested Objects in PFB Schema [#133]
Description
This pull request adds support for nested objects within the PFB (Portable Format for Biomedical Data) schema, resolving [Issue #133](#133).
Summary of Changes
`"object"` to be used within PFB entities.
This enhancement improves compatibility with complex GraphQL schemas that include embedded structures, enabling more flexible and expressive metadata models within the PFB format.
Related Issue
Closes [#133](#133)
Testing
Confirmed existing unit tests pass.
Added new tests to validate:
Documentation Updates
README.md
Checklist
Additional Notes
This update maintains backward compatibility for existing PFB workflows. Consumers of the library are not required to change anything unless they wish to leverage nested object support explicitly.
Implementation details
✨ Changes to Recursive Object Property Handling (`_any_map`)
Introduced the `_any_map(max_depth)` helper to generate Avro `map` types for fields with `"additionalProperties": true`.
This ensures object properties that allow arbitrary key-value pairs are accurately modeled in the PFB schema output.
Provides explicit, bounded recursion for object nesting, controlled by `max_depth`.
Ensures downstream consumers (e.g., PFB readers) have a complete, well-defined schema for open-ended object fields.
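To illustrate what bounded nesting means for the data itself, here is a minimal sketch of a depth check. The helper `fits_any_map` is hypothetical and not part of this PR; it assumes the schema shape shown in the example output of this description (primitives, arrays of primitives or maps, and nested maps) and only approximates real Avro validation.

```python
# Hypothetical helper, not part of the PR: checks whether a Python value
# fits a map whose nesting is bounded by max_depth.
# Python stand-ins for the Avro primitives (no int/long range checks here).
PRIMITIVES = (type(None), bool, int, float, bytes, str)


def fits_any_map(value, max_depth):
    """Return True if `value` is a string-keyed dict nested at most `max_depth` levels."""
    if not isinstance(value, dict) or not all(isinstance(k, str) for k in value):
        return False
    for item in value.values():
        if isinstance(item, PRIMITIVES):
            continue
        if max_depth < 1:
            return False  # no nesting budget left for arrays or maps
        if isinstance(item, dict):
            if not fits_any_map(item, max_depth - 1):
                return False
        elif isinstance(item, list):
            for element in item:
                if isinstance(element, PRIMITIVES):
                    continue
                # Array elements may be maps, but not further arrays.
                if not fits_any_map(element, max_depth - 1):
                    return False
        else:
            return False
    return True
```

Under these assumptions, `{"a": {"b": 1}}` fits with `max_depth=1`, while `{"a": {"b": {"c": 1}}}` needs `max_depth=2`.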
Supports:
- Primitive types (`null`, `boolean`, `int`, `long`, `float`, `double`, `bytes`, `string`).
- Nesting bounded by `max_depth` to prevent infinite recursion.
Applied during schema generation when a property has:
{ "type": "object", "additionalProperties": true }
Why
When It's Used
The function is invoked during schema generation for Gen3 dictionary fields where:
{ "type": "object", "additionalProperties": true }
This allows open-ended object fields to be represented as valid, recursive Avro `map` types in the PFB output.
Inside schema generation logic:
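The condition above could be sketched roughly as follows. The names `is_open_object` and `avro_type_for`, the `any_map` callback parameter, and the default `max_depth=1` are assumptions made for illustration; the actual code in this PR may be organized differently.

```python
from typing import Callable, Optional


def is_open_object(prop: dict) -> bool:
    """True for a property shaped like {"type": "object", "additionalProperties": true}."""
    return prop.get("type") == "object" and prop.get("additionalProperties") is True


def avro_type_for(
    prop: dict,
    any_map: Callable[[int], dict],
    max_depth: int = 1,  # assumed default, chosen for illustration only
) -> Optional[dict]:
    """Hypothetical dispatch: only the open-object branch is described in this PR."""
    if is_open_object(prop):
        # Delegate to the map builder (e.g. the PR's _any_map helper).
        return any_map(max_depth)
    return None  # other property kinds are handled elsewhere (not shown)
```

Passing the map builder as a callback keeps the sketch self-contained; in the library the helper would presumably be called directly.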
How It Works
The `_any_map(max_depth)` function generates an Avro-compatible `map` type to represent arbitrary key-value pairs, supporting recursive nesting up to a configurable depth.
Behavior
Returns an Avro `map` where:
- Keys are arbitrary strings.
- Values can be:
  - Primitive types: `"null"`, `"boolean"`, `"int"`, `"long"`, `"float"`, `"double"`, `"bytes"`, `"string"`
  - Arrays containing:
    - Primitive types
    - Nested maps (`_any_map`) for further recursion
  - Nested maps (`_any_map`) for recursive object structures
Example Output (`max_depth = 1`)

    {
      "type": "map",
      "values": [
        "null", "boolean", "int", "long", "float", "double", "bytes", "string",
        {
          "type": "array",
          "items": [
            "null", "boolean", "int", "long", "float", "double", "bytes", "string",
            {
              "type": "map",
              "values": ["null", "boolean", "int", "long", "float", "double", "bytes", "string"]
            }
          ]
        },
        {
          "type": "map",
          "values": ["null", "boolean", "int", "long", "float", "double", "bytes", "string"]
        }
      ]
    }
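A recursive builder that reproduces the example output above might look like the sketch below. It is reconstructed from the described behavior and the example only; the name `any_map_sketch` is hypothetical, and the PR's actual `_any_map` may differ in detail.

```python
import json

# The eight Avro primitive type names listed in this description.
PRIMITIVES = ["null", "boolean", "int", "long", "float", "double", "bytes", "string"]


def any_map_sketch(max_depth):
    """Build an Avro map schema whose values may nest up to `max_depth` levels."""
    values = list(PRIMITIVES)
    if max_depth > 0:
        # Arrays may hold primitives or maps that are one level shallower.
        values.append(
            {"type": "array", "items": PRIMITIVES + [any_map_sketch(max_depth - 1)]}
        )
        # Maps may also nest directly, again one level shallower.
        values.append(any_map_sketch(max_depth - 1))
    return {"type": "map", "values": values}


print(json.dumps(any_map_sketch(1), indent=2))
```

In this sketch each level embeds two copies of the next shallower schema, so the generated schema roughly doubles in size per level, which is one reason to keep `max_depth` small.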