@bwalsh bwalsh commented Jun 24, 2025

Description


🔧 Specific Changes: writer.py

  • New Relationship Fields Added to Avro Schema:

    • "label"

      • Type: ["null", "string"]
      • Default: None
      • Description: A semantic label for the link (e.g., "requestor", "owner")
    • "properties"

      • Type: ["null", {"type": "map", "values": "string"}]
      • Default: None
      • Description: Arbitrary metadata about the relationship
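
As a rough illustration, the two new fields might sit in the relationship record like this. This is a sketch, not the actual contents of writer.py: the record name "Link" and the dst_id/dst_name fields are assumptions based on the link structure described in this PR.

```python
import json

# Hypothetical relationship record. The record name "Link" and the
# dst_id/dst_name fields are assumptions, not copied from writer.py.
LINK_SCHEMA = {
    "type": "record",
    "name": "Link",
    "fields": [
        {"name": "dst_id", "type": "string"},
        {"name": "dst_name", "type": "string"},
        # New: optional semantic label, null when absent.
        {"name": "label", "type": ["null", "string"], "default": None},
        # New: optional string-to-string metadata map, null when absent.
        {
            "name": "properties",
            "type": ["null", {"type": "map", "values": "string"}],
            "default": None,
        },
    ],
}

# Python None serializes to JSON null, which matches the "null" branch
# listed first in each union.
print(json.dumps(LINK_SCHEMA["fields"][2]))
```

Listing "null" first in each union lets the default be null, which is what keeps the fields optional.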

📦 Implications:

  • Links between entities in PFB files now support:

    • A human-readable label to categorize the link.
    • A dictionary of additional properties (key-value pairs) describing link-specific metadata.
  • This expands the expressiveness of PFB for graph-based data models, supporting richer relationship semantics.


🔧 Specific Changes: importers/json.py

  • Stricter JSON Record Validation:

    • New pre-condition added to convert_json():

      if "submitter_id" not in json_record and "code" not in json_record:
          raise ValueError("JSON record is missing submitter_id or code: {}".format(json_record))
    • Ensures that all JSON records contain at least one of:

      • "submitter_id"
      • "code"
    • Prevents generation of incomplete or invalid PFB nodes.
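
The behavior of this check can be sketched in isolation. The function name check_identifier is hypothetical; in the PR the condition lives inside convert_json().

```python
def check_identifier(json_record):
    """Hypothetical stand-alone copy of the pre-condition in convert_json()."""
    if "submitter_id" not in json_record and "code" not in json_record:
        raise ValueError(
            "JSON record is missing submitter_id or code: {}".format(json_record)
        )


# Either identifier on its own is enough to pass.
check_identifier({"submitter_id": "person_1"})
check_identifier({"code": "P1"})

# A record with neither identifier is rejected.
try:
    check_identifier({"name": "Alice"})
except ValueError as err:
    print(err)
```

Note that the error is raised only when both keys are absent; a record carrying just one of them is accepted.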


  • Enriched Relationship Metadata:

    • When building links in convert_json(), two new fields are populated:

      • "label": The original JSON property name (e.g., "knows", "colleagues").
      • "properties": Any nested "properties" dictionary provided within the link object.

    Example:

    {
      "knows": {
        "submitter_id": "person_2",
        "properties": {
          "since": "2020-01-01"
        }
      }
    }

    Results in a PFB link with:

    • label = "knows"
    • properties = {"since": "2020-01-01"}
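
A minimal sketch of how such a link could be assembled, assuming the destination node type is already known for each link property. The function build_links and the link_targets mapping are hypothetical and do not reflect the actual body of convert_json().

```python
def build_links(json_record, link_targets):
    """Sketch of link construction.

    link_targets is a hypothetical mapping from JSON property name to
    destination node type, e.g. {"knows": "person"}.
    """
    links = []
    for name, dst_name in link_targets.items():
        value = json_record.get(name)
        # Skip properties that are absent or do not point at another node.
        if not isinstance(value, dict) or "submitter_id" not in value:
            continue
        links.append({
            "dst_id": value["submitter_id"],
            "dst_name": dst_name,
            "label": name,                           # original JSON property name
            "properties": value.get("properties"),   # None when not provided
        })
    return links


record = {
    "submitter_id": "person_1",
    "knows": {"submitter_id": "person_2", "properties": {"since": "2020-01-01"}},
}
links = build_links(record, {"knows": "person"})
print(links)
```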

📦 Implications:

  • Increases the semantic richness of relationships in PFB files.
  • Aligns with corresponding Avro schema changes in writer.py.
  • Prevents invalid nodes from being processed, improving overall data integrity.

Summary for Release Notes:

  • Schema Enhancement: Extended the PFB Avro schema to support label and properties fields on relationships, enabling richer and semantically meaningful graph links.

  • JSON Import Enhancements:

    • Enforced presence of submitter_id or code on all JSON records.
    • Links generated by convert_json() now carry the label and properties fields.

Expanded Documentation: label and properties Fields in a PFB Link

The label and properties fields are recent schema enhancements to the PFB format that enrich graph relationships with semantic meaning and optional metadata.


🔗 What is a Link?

In PFB, a link represents a relationship or edge between two nodes (entities) in the graph.

Structure of a link, including the two new optional fields:

{
  "dst_id": "target_node_id",
  "dst_name": "target_node_type",
  "label": "relationship_label",
  "properties": {
    "property_key": "property_value"
  }
}

🏷️ label Field

| Aspect | Description |
|---|---|
| Purpose | Provides a semantic name or type for the relationship. |
| Type | string or null |
| Examples | "knows", "colleagues", "owner", "requestor" |
| Usage | Helps distinguish different types of links between the same node types. |
| Source | Derived from the JSON property name in the source data. |

Example:

In JSON input:

"knows": {
  "submitter_id": "person_bob"
}

Results in:

"label": "knows"

🗂️ properties Field

| Aspect | Description |
|---|---|
| Purpose | Stores arbitrary key-value metadata about the relationship. |
| Type | map<string, string> or null |
| Examples | { "since": "2020-01-01" }, { "workplace": "acme_corp" } |
| Usage | Adds context to the link beyond just its type, enabling richer graph models. |
| Source | Derived from the "properties" key nested within the link in JSON input. |

Example:

In JSON input:

"knows": {
  "submitter_id": "person_bob",
  "properties": {
    "since": "2020-01-01"
  }
}

Results in:

"properties": {
  "since": "2020-01-01"
}
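
Because the map values are typed as strings, non-string metadata would need to be converted before writing. One possible approach is sketched below. The helper stringify_properties is hypothetical and not part of this PR.

```python
import json


def stringify_properties(properties):
    """Hypothetical helper: coerce metadata values to strings so they fit
    a map<string, string>. Strings pass through; other values are
    JSON-encoded."""
    if properties is None:
        return None
    return {
        str(key): value if isinstance(value, str) else json.dumps(value)
        for key, value in properties.items()
    }


print(stringify_properties({"since": "2020-01-01", "years": 5, "active": True}))
```

JSON encoding is used here so that numbers and booleans can be parsed back unambiguously by a reader that knows the convention.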

✅ Why These Fields Matter

  • Improved Interoperability: More closely aligns PFB with general-purpose graph models (e.g., RDF, Property Graphs).
  • Enhanced Querying: Downstream tools can filter, group, or style relationships based on label and properties.
  • Backward Compatible: Both fields are optional and default to null for legacy data.
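
A short sketch of what label-based filtering could look like downstream, with a legacy link mixed in. The function links_with_label and the sample data are illustrative assumptions.

```python
def links_with_label(links, label):
    # .get() tolerates legacy links that carry no "label" key at all.
    return [link for link in links if link.get("label") == label]


links = [
    # Legacy link written before the schema change.
    {"dst_id": "person_2", "dst_name": "person"},
    {"dst_id": "person_3", "dst_name": "person",
     "label": "knows", "properties": {"since": "2020-01-01"}},
    {"dst_id": "person_4", "dst_name": "person",
     "label": "colleagues", "properties": None},
]

known = links_with_label(links, "knows")
print([link["dst_id"] for link in known])
```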

📦 Summary for Users

  • Include label when multiple relationship types exist between the same node types.
  • Use properties to attach relevant metadata to links.
  • Supported in the PFB schema and tooling as of PR feature/multigraph #142.

New Features

See #134

Breaking Changes

None

Bug Fixes

None

Improvements

Dependency updates

Deployment changes

None

Tests ✅ tests/test_foaf.py

  • These FOAF tests cover:

    • Schema generation integrity.

    • Round-trip export of FOAF data.

    • Correct handling of links and link properties.

    • Providing JSON with and without "submitter_id" to confirm validation works.

    • Including links with labels and properties to validate correct PFB output.


🔬 Detailed Breakdown:

Fixtures:
  • gen3_schema_path: FOAF schema JSON file for Gen3 model.
  • avro_data_path: Output PFB Avro file for data.
  • avro_schema_path: Output PFB Avro file for schema.

Helper Functions:
  • _create_schema():

    • Removes previous output (if any).
    • Runs pfb from dict to convert FOAF schema to Avro format.
    • Asserts successful exit.
  • _test_schema():

    • Asserts generated Avro schema contains a "person" node.
  • _assert_links():

    • Validates link structure within a "person" record.
    • Checks destination names and link labels.
    • If check_link_properties is True, verifies presence and correctness of link properties like "since" or "workplace".
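
A helper of this kind might look roughly like the sketch below. The signature, the "relations" key, and the shape of the expected argument are assumptions for illustration, not the code in tests/test_foaf.py.

```python
def assert_links(record, expected, check_link_properties=False):
    """Sketch of a link-checking helper.

    expected maps each link label to a (dst_name, properties) tuple.
    """
    links = record["relations"]  # assumed key holding the record's links
    assert len(links) == len(expected)
    for link in links:
        assert link["label"] in expected, link
        dst_name, properties = expected[link["label"]]
        assert link["dst_name"] == dst_name
        if check_link_properties:
            assert link["properties"] == properties


person = {
    "name": "person",
    "relations": [
        {"dst_id": "person_2", "dst_name": "person",
         "label": "knows", "properties": {"since": "2020-01-01"}},
    ],
}
assert_links(person, {"knows": ("person", {"since": "2020-01-01"})},
             check_link_properties=True)
```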

Test Functions:
  1. test_gen3_schema()

    • Runs schema generation.
    • Opens Avro file, asserts schema validity and exactly 1 record exists.
  2. test_foaf_data_no_links()

    • Runs round-trip export with FOAF data lacking links.
    • Asserts PFB output exists and contains exactly 2 "person" records.
    • Validates structure of each record.
  3. test_foaf_data_links()

    • Runs round-trip export with FOAF data containing links.
    • Asserts output and validates link structure using _assert_links().
  4. test_foaf_data_links_and_properties()

    • Similar to previous test but includes link properties like "since" and "workplace".
    • Validates both link structure and associated properties.

@matthewpeterkort

I would consider expanding the properties type from ["null", {"type": "map", "values": "string"}] to a union whose values can be any of the primitive types, not just "string".

@bwalsh
bwalsh commented Jun 24, 2025

I would consider expanding the properties type from ["null", {"type": "map", "values": "string"}] to a union that supports values that include all of the primitive types and not just "string".

Good idea
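
For reference, one possible shape for the suggested type is sketched here. This is an assumption about how the union could be written, not a change that is part of this PR.

```python
# Avro primitive type names. Allowing "null" and "bytes" as map values
# is an illustrative choice.
PRIMITIVE_VALUES = [
    "null", "boolean", "int", "long", "float", "double", "bytes", "string",
]

# Hypothetical revised field: still nullable, but map values may be any
# primitive type instead of only "string".
PROPERTIES_FIELD = {
    "name": "properties",
    "type": ["null", {"type": "map", "values": PRIMITIVE_VALUES}],
    "default": None,
}

print(PROPERTIES_FIELD["type"][1]["values"])
```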
