HAYAML 1.0 - Human-Readable Serialization for Complex Web Content
© 2025 blubitz. This work is licensed under CC BY 4.0.
HAYAML (Human-Readable Serialization for Complex Web Content) is a human-readable, cross-language data serialization format. The primary design purpose of HAYAML is to serialize complex web content, such as custom web components, in a way that is simultaneously efficiently machine-parsable and easily understood by humans during code review.
It is important to note that HAYAML is not a new language syntax. Instead, it defines a specific information model that is layered upon the foundational JSON object model. This abstract model is subsequently presented for storage and review using the YAML 1.2 format.
This approach offers several core benefits for developers and content managers:
- It provides superior human readability for complex, nested data structures compared to JSON.
- In version control systems like Git, it leads to cleaner and more intuitive file diffs.
- The format natively supports the embedding of multi-line HTML content.
- It guarantees a completely unambiguous, two-way mapping between the in-memory data model and its final text representation, thereby ensuring data integrity.
These benefits are direct results of the specific design goals that guided the format's development.
The creation of HAYAML was driven by a clear set of requirements intended to overcome the limitations of existing serialization formats within its specific problem domain. The architecture of HAYAML was guided by the following core objectives:
- Custom web component serialization: To provide a robust method for serializing complex, nested web components.
- A focus on being able to compare changes: To prioritize the ability for developers to easily compare file revisions and accurately identify changes.
- Human readability: To ensure the final text format is simple for humans to read and understand.
- Functional equivalence to JSON's data modeling capabilities: To match the structural power and flexibility inherent in the JSON data model.
- Improved file diffs on platforms like GitHub: To generate cleaner, more accurate change comparisons within version control systems.
- The ability to embed HTML content directly: To natively support the inclusion of multi-line HTML strings without requiring complex escaping.
- An unambiguous, two-way mapping: To guarantee that an object representation maps to one and only one text file, and that the text file can be perfectly deserialized back into the single, original object.
The subsequent sections detail the specific challenges that necessitated the creation of a format built upon these goals.
The selection of a serialization format critically impacts developer productivity, collaboration, and overall code quality, especially when dealing with nested, complex web content. While simpler formats like Markdown are suitable for basic text content, they are insufficient for this domain because they lack the ability to represent complex, nested component structures. When a format is difficult to read or produces unclear version control diffs, it creates unnecessary friction within the development process. This section specifically analyzes the shortcomings of common formats that led to the development of HAYAML.
Although HTML is the native language of the web, its verbosity makes it a poor choice for data serialization in environments where version control is critical. The main limitation lies in its excessive syntax and structural elements that introduce unnecessary noise. This additional complexity obscures meaningful data changes, making it harder for developers to focus on content-specific modifications during code reviews.
JSON was first used to serialize custom web elements because of its strengths, including the ability to represent all data types and deserialize them to their original form reliably. It offers a straightforward conversion between text and object forms and enjoys broad support across platforms and tools. However, when applied to the specific task of serializing complex web components, JSON revealed critical failures related to readability and integration with version control.
- Readability: The major issue emerged when JSON values were used to store HTML content. Because JSON does not natively support multi-line strings, entire blocks of HTML must be compressed into a single line. This results in extremely long, tag-dense lines of code that are difficult to read and make edits nearly impossible to review effectively.
- Version Control Diffs: The single-line formatting issue has a direct and adverse impact on version control systems. In platforms like Git, diffs often fail to accurately identify specific changes. Rather than highlighting a targeted modification within the HTML, the system may treat the entire line as deleted and replaced, obscuring the actual change and complicating code review.
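The diff problem described above can be reproduced with a short sketch. It uses Python's standard `difflib` as a stand-in for a line-based diff tool such as Git's; the helper name `changed_lines` and the sample strings are illustrative assumptions, not part of HAYAML.

```python
import difflib


def changed_lines(before, after):
    """Return the added and removed lines of a zero-context unified diff."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0
    ))
    # The first two entries are the "---" / "+++" file headers; skip them,
    # then keep only the lines that start with "+" or "-".
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# HTML stored as a JSON string: all markup sits on one physical line.
json_before = r'"content": "<div class=\"center\"><img src=\"vacation1.png\" alt=\"beach\"></div>"'
json_after = json_before.replace("vacation1", "vacation2")

# The same HTML stored as a multi-line block: one tag per line.
block_before = "\n".join([
    "content: |",
    '  <div class="center">',
    '    <img src="vacation1.png" alt="beach">',
    "  </div>",
])
block_after = block_before.replace("vacation1", "vacation2")

for line in changed_lines(json_before, json_after):
    print(line)  # each changed line carries the entire <div>...</div> markup
for line in changed_lines(block_before, block_after):
    print(line)  # each changed line carries only the <img> tag
```

In both cases one line is reported as removed and one as added, but in the single-line form a reviewer has to scan the whole markup to find the edit, while in the multi-line form the changed line is the edit.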
These substantial limitations in readability and versioning necessitated the creation of an alternative. The HAYAML information model was explicitly designed to solve these specific problems while retaining the crucial structural power of JSON.
In pursuit of its design goals, HAYAML specifies a clear, recursive data structure derived from the JSON object model. Conceptually, a HAYAML structure is fundamentally a standard JSON object that is constrained by a specific information model. This model enforces a single, powerful rule concerning a special key, which enables recursive and content-focused serialization. Understanding this model is essential for the correct implementation and utilization of the format to ensure consistent and reliable data serialization.
The foundation of a HAYAML document is a JSON object, defined as a collection of name/value pairs. The HAYAML model imposes one critical rule upon this foundation: the reservation of a special key named content.
The content key possesses a dual nature, and its value is permitted to store one of two data types:
- A string, which is explicitly defined to represent HTML content.
- A recursive object, which must itself fully conform to the HAYAML structure. This nested object may contain its own set of arbitrary keys and, optionally, its own special content key.
Any other keys that appear alongside the content key function as metadata for the object. For instance, a type key might be used to indicate the type of a web component (e.g., image-gallery), while the corresponding content key holds the actual data or child components for that element.
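The rule on the content key can be expressed as a small recursive check. The following is a minimal sketch, assuming that the content key is optional at every level and that metadata values are unconstrained; the function name `is_hayaml` is hypothetical and is not taken from any HAYAML library.

```python
from typing import Any


def is_hayaml(value: Any) -> bool:
    """Check the rule on the reserved "content" key, recursively."""
    if not isinstance(value, dict):
        return False
    # JSON object names are always strings.
    if not all(isinstance(key, str) for key in value):
        return False
    if "content" not in value:
        return True  # assumption: the content key is optional
    content = value["content"]
    # content is either an HTML string or a nested conforming object.
    return isinstance(content, str) or is_hayaml(content)


component = {
    "type": "image-gallery",
    "content": {"type": "image-panel", "content": "<p>hi</p>"},
}
print(is_hayaml(component))        # nested object with string content
print(is_hayaml({"content": 42}))  # a number is not permitted as content
```

All keys other than content are left unchecked here, which matches their role as free-form metadata.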
A fundamental design principle of HAYAML is the guarantee of an unambiguous two-way mapping. This critical property ensures that a serialized object maps to one and only one text representation, and that text representation can only be deserialized back into the single, original object structure. This is a vital requirement for preventing data loss or corruption and maintaining data integrity throughout serialization and deserialization cycles.
The HAYAML Information Model achieves this deterministic property because its rules lack syntactic ambiguity. The reservation of the content key, along with the clear, hierarchical distinction between metadata keys and content, establishes a predictable structure that maps to one, and only one, object representation. This formal, unambiguous model is what allows the data to be serialized into a concrete, human-friendly text file.
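The two-way mapping can be stated as a pair of round-trip checks. The sketch below is an illustration under stated assumptions: JSON text stands in for the YAML file so that only the standard library is needed, and sorting keys is one possible way to make the text form unique (the source does not specify a key order). The names `serialize` and `deserialize` are hypothetical.

```python
import json


def serialize(obj: dict) -> str:
    """Produce one canonical text form for an object (assumed: sorted keys)."""
    return json.dumps(obj, sort_keys=True, indent=2, ensure_ascii=False)


def deserialize(text: str) -> dict:
    """Rebuild the in-memory object from its text form."""
    return json.loads(text)


a = {"type": "image-panel", "content": "<p>hi</p>"}
b = {"content": "<p>hi</p>", "type": "image-panel"}  # same object, other key order

text = serialize(a)
print(deserialize(text) == a)          # object -> text -> object
print(serialize(deserialize(text)) == text)  # text -> object -> text
print(serialize(a) == serialize(b))    # equal objects give identical text
```

The third check is the one that a non-canonical serializer would fail: without a fixed key order, two equal objects could produce two different files.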
The conversion of the abstract HAYAML Information Model into the final, human-readable text file is a deliberate, two-step process that strategically utilizes two established standards: JSON and YAML.
- Step 1: In-Memory Representation. The entire HAYAML data structure—including all nested objects, metadata keys, and HTML content strings—is first compiled into a single JSON object in memory. This step ensures adherence to the formal JSON data model.
- Step 2: Serialization to YAML. This complete in-memory JSON object is then converted into a YAML 1.2 text representation. This resulting YAML text file is the on-disk HAYAML file format.
The choice to use YAML as the final presentation format was strategic, as its design goals align perfectly with HAYAML's requirements, specifically its aim to be easily readable by humans. Crucially, YAML’s native support for multi-line strings directly solves the major readability and version-control diffing issues that were identified when handling embedded HTML content in JSON. By leveraging YAML for presentation, HAYAML gains the benefit of YAML's clean syntax and powerful features without needing to invent a new language syntax.
This section illustrates a concrete example of HAYAML in practice by demonstrating the serialization of a conceptual web component structure.
Consider a simple web component defined by the following data structure:
- Component Type: image-gallery
- Metadata: gallery-title: "Summer Vacation"
- Content: A child component of type image-panel
JSON
{
  "type": "image-gallery",
  "gallery-title": "Summer Vacation",
  "content": {
    "type": "image-panel",
    "content": "<div class=\"center\"><img src=\"vacation1.png\" style=\"max-width:200px\" alt=\"beach\"></div>"
  }
}
Following the two-step HAYAML serialization process, this structure is represented as the following YAML file (which is the final HAYAML format):
HAYAML
type: image-gallery
gallery-title: Summer Vacation
content:
  type: image-panel
  content:
    node: root
    child:
      - node: element
        tag: div
        attr:
          class: center
        child:
          - node: element
            tag: img
            attr:
              src: vacation1.png
              style: max-width:200px
              alt: beach
This example adheres to the HAYAML specification. The type and gallery-title keys function as metadata associated with the component. The special content key holds the HTML content, which is rendered here as a nested tree of element nodes using YAML's dash (-) sequence notation. This representation is clean, simple to review, and would generate clean diffs in a version control system even if only a single image tag were modified.
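In the YAML above, the HTML string from the JSON form appears as a tree built from node, tag, attr, and child keys. The source does not describe how that tree is produced. The following is a minimal sketch of one way to derive it, assuming Python's standard `html.parser`; the class name `TreeBuilder`, the `VOID` set, and the `node: text` shape for text nodes are assumptions that do not appear in the example.

```python
from html.parser import HTMLParser

# HTML void elements never have children or a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Build a node/tag/attr/child tree like the one in the example."""

    def __init__(self):
        super().__init__()
        self.root = {"node": "root", "child": []}
        self.stack = [self.root]  # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        element = {"node": "element", "tag": tag}
        if attrs:
            element["attr"] = {name: value for name, value in attrs}
        self.stack[-1].setdefault("child", []).append(element)
        if tag not in VOID:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Only close the element that is currently open with this tag.
        if tag not in VOID and len(self.stack) > 1 and self.stack[-1]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            # Assumed shape for text nodes; the example contains none.
            self.stack[-1].setdefault("child", []).append(
                {"node": "text", "text": data}
            )


html = ('<div class="center"><img src="vacation1.png" '
        'style="max-width:200px" alt="beach"></div>')
builder = TreeBuilder()
builder.feed(html)
builder.close()
print(builder.root)
```

For the sample markup, the resulting dictionary has the same keys and nesting as the content subtree shown in the HAYAML file.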
HAYAML is formally situated within the context of existing data standards. It functions as an information model specification, not a new syntax. Its successful implementation depends entirely on two stable, well-defined, and open standards, which ensures broad compatibility.
The HAYAML information model is specifically a specialization of the JSON object model. JSON is formally defined by the ECMA-404 standard. HAYAML fully adheres to the fundamental JSON principle that an object is an unordered collection of name/value pairs, adding only the singular semantic constraint that reserves the content key for its special purpose. Consequently, all HAYAML structures remain, at their core, valid JSON object structures.
The on-disk presentation format of any HAYAML file MUST be valid YAML 1.2. This requirement directly supports the primary goals of the project. The design philosophy of YAML, centered on creating a format that is easily readable by humans and capable of reflecting the native data structures of dynamic languages, makes it the ideal choice for the presentation layer for the HAYAML model. These specific properties of YAML are what solve the critical readability and diffing challenges that originally motivated the creation of HAYAML.
- Description: An npm library that converts between JSON objects and HAYAML.
- License: MIT
- Repository: @blubitz/hayaml