chore(ci)(deps): bump actions/checkout from 4 to 6#1

Closed
dependabot[bot] wants to merge 42 commits into main from dependabot/github_actions/actions/checkout-6

dependabot[bot] commented on behalf of github on Apr 19, 2026

Bumps actions/checkout from 4 to 6.

Release notes

Sourced from actions/checkout's releases.

v6.0.0

What's Changed

Full Changelog: actions/checkout@v5.0.0...v6.0.0

v6-beta

What's Changed

Updated persist-credentials to store the credentials under $RUNNER_TEMP instead of directly in the local git config.

This requires a minimum Actions Runner version of v2.329.0 to access the persisted credentials for Docker container action scenarios.

v5.0.1

What's Changed

Full Changelog: actions/checkout@v5...v5.0.1

v5.0.0

What's Changed

⚠️ Minimum Compatible Runner Version

v2.327.1
Release Notes

Make sure your runner is updated to this version or newer to use this release.

Full Changelog: actions/checkout@v4...v5.0.0

v4.3.1

What's Changed

Full Changelog: actions/checkout@v4...v4.3.1

v4.3.0

What's Changed

... (truncated)

Changelog

Sourced from actions/checkout's changelog.

Changelog

v6.0.2

v6.0.1

v6.0.0

v5.0.1

v5.0.0

v4.3.1

v4.3.0

v4.2.2

v4.2.1

v4.2.0

v4.1.7

v4.1.6

... (truncated)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

…s tested

Parses .rvt, .rfa, .rte, .rft across Revit 2016-2026 without Autodesk software.

- OLE2/CFB container via cfb crate (0.11)
- Truncated-gzip DEFLATE via flate2 raw-inflate
- BasicFileInfo: version, build, GUID, locale, creator path
- PartAtom: Atom XML parser (title, OmniClass, taxonomy, categories)
- RevitPreview4.0: PNG thumbnail extraction
- Formats/Latest: class/schema name inventory (8-10K per file)
- Two CLIs: rvt-info (metadata dump) and rvt-diff (cross-version analysis)
- 9 unit tests + 6 integration tests across 11 Revit releases
- Zero runtime deps on Autodesk, Forge, or ODA BimRv SDK

Moat layers 1-2 fully solved; layer 3 (stream framing) ~80% from corpus
analysis; layer 4 (object graph) is the remaining open work.

See ../../reports/rvt-moat-break-reconnaissance.md for full analysis.

Adds rvt-corpus CLI + corpus module that analyzes N Revit files in bulk,
classifying each byte position across the aligned prefix of every stream:

  - Invariant (same across every version)
  - LowVariance (2-5 distinct values — type tags, enums)
  - SizeCorrelated (monotonically non-decreasing — length fields)
  - MonotonicInt (strictly increasing — IDs, timestamps)
  - Variable (payload data)

Also surfaces invariant byte runs of length >= 8 as likely structural
markers (format GUIDs, schema version IDs, magic strings).
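
Sketch of the classification above (not the shipped corpus module). The
names ByteClass, classify and invariant_runs are hypothetical, and the
order in which the rules are tested is an assumption: the text does not say
which class wins when a column is both non-decreasing and low-variance.

```rust
use std::collections::BTreeSet;

/// Class of one byte position across N file versions.
#[derive(Debug, PartialEq, Eq)]
pub enum ByteClass {
    Invariant,
    LowVariance,
    SizeCorrelated,
    MonotonicInt,
    Variable,
}

/// Classify the values seen at one offset, ordered oldest release first.
/// Rule precedence is an assumption of this sketch.
pub fn classify(column: &[u8]) -> ByteClass {
    let distinct: BTreeSet<u8> = column.iter().cloned().collect();
    if distinct.len() <= 1 {
        return ByteClass::Invariant;
    }
    let strictly_increasing = column.windows(2).all(|w| w[0] < w[1]);
    let non_decreasing = column.windows(2).all(|w| w[0] <= w[1]);
    if strictly_increasing {
        ByteClass::MonotonicInt
    } else if non_decreasing {
        ByteClass::SizeCorrelated
    } else if distinct.len() <= 5 {
        ByteClass::LowVariance
    } else {
        ByteClass::Variable
    }
}

/// Return (start, len) for every run of Invariant positions with len >= min_len.
pub fn invariant_runs(classes: &[ByteClass], min_len: usize) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in classes.iter().enumerate() {
        let invariant = *c == ByteClass::Invariant;
        match (invariant, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                if i - s >= min_len {
                    runs.push((s, i - s));
                }
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that reaches the end of the aligned prefix.
    if let Some(s) = start {
        if classes.len() - s >= min_len {
            runs.push((s, classes.len() - s));
        }
    }
    runs
}
```

Calling classify once per offset and then invariant_runs(&classes, 8)
would reproduce the "runs of length >= 8" report under these assumptions.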

Running against the 11-version RFA corpus immediately surfaces two format
schema GUIDs that have been stable since 2016:

  Global/PartitionTable: 2d342935-1ee5-d411-92d8-0000863f27ad (165/167 invariant)
  Global/History:        bd411eba-62fe-d311-883b-0000863de970 (87-byte run)

And confirms the first 364 bytes of Formats/Latest are a byte-identical
class-name preamble across every Revit release 2016-2026 — the A3PartyImage
type hierarchy has not changed since before R14.

New: src/corpus.rs, src/bin/rvt_corpus.rs. 17/17 tests passing (9 unit +
6 integration + 5 new corpus unit). Third binary added to Cargo.toml.
1,760 LOC total after this commit (up from 1,362).

Major discovery in the decompressed Formats/Latest stream: Autodesk ships
their complete C++ serialization schema in every Revit file, including
field names (with m_/m_p/m_o/m_d/m_id conventions) and full STL type
signatures like 'std::pair< ElementId, double >'.

Running rvt-schema on a single 400KB RFA extracts:

  - 395 classes from the schema section (first 64KB of decomp'd stream)
  - 1,156 total fields across all classes
  - Real Revit API classes: DBView (53 fields), DimensionEqualityLabelFormatting
    (73), HostObj (18), LoadBCBase (22), Symbol (24), AnalyticalSupportTracker,
    RbsOffsetSetWrapper, etc.

Wire format (inferred from corpus):
  - Class name: u16 LE length + ASCII bytes
  - Field name: u32 LE length + ASCII bytes
  - Class tag: u16 with optional 0x80 flag bit
  - Fields appear until next class-name candidate

Parser caps scanning at 64KB — beyond that, binary object data produces
false-positive class matches from compressed-bit noise.
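
Sketch of the two length-prefixed readers implied by the wire format
above. This is not the code in src/formats.rs; the function names and the
choice to accept only printable ASCII plus space are assumptions made for
illustration. The 64 KB cap mirrors the scan limit described above.

```rust
/// Names are only trusted inside the first 64 KB of the decompressed stream.
pub const SCAN_LIMIT: usize = 64 * 1024;

/// Read `len` ASCII bytes starting at `start`; return the text and the end offset.
fn read_ascii(buf: &[u8], start: usize, len: usize) -> Option<(String, usize)> {
    let end = start.checked_add(len)?;
    if len == 0 || end > SCAN_LIMIT {
        return None;
    }
    let body = buf.get(start..end)?;
    // Assumption: a plausible name is printable ASCII (space allowed).
    if !body.iter().all(|b| b.is_ascii_graphic() || *b == b' ') {
        return None;
    }
    Some((String::from_utf8(body.to_vec()).ok()?, end))
}

/// Class name: u16 little-endian length followed by ASCII bytes.
pub fn read_class_name(buf: &[u8], at: usize) -> Option<(String, usize)> {
    let head = buf.get(at..at.checked_add(2)?)?;
    let len = u16::from_le_bytes([head[0], head[1]]) as usize;
    read_ascii(buf, at + 2, len)
}

/// Field name: u32 little-endian length followed by ASCII bytes.
pub fn read_field_name(buf: &[u8], at: usize) -> Option<(String, usize)> {
    let head = buf.get(at..at.checked_add(4)?)?;
    let len = u32::from_le_bytes([head[0], head[1], head[2], head[3]]) as usize;
    read_ascii(buf, at + 4, len)
}
```

Both readers return None instead of panicking on truncated input, so a
scanner can try every offset and keep only the candidates that decode.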

Adds: src/formats.rs (schema parser), src/bin/rvt_schema.rs (CLI),
RevitFile::schema() method. 21 unit + 6 integration = 27 tests passing.

This is the unlock for Phase D (object graph parsing): every binary
record in Global/Latest and Partitions/NN can now be interpreted via
the class table — no deep RE required, just type-aware deserialization
driven by the self-describing schema.

Demonstrates the approach: read decompressed Global/Latest, use the
schema discovered in Phase C to identify record types, interpret the
binary body. Today's delivery: a complete document-migration timeline
extractor.

Example output for a Revit file that started in 2018 and migrated
through every release to 2026:

  Document history · 9 entries
    0.  Revit 2018 - Preview Pre-Release 2018 (2018.000) : 20170106_1515(x64)
    1.  Revit 2019 2019 (2019.000) : 20180123_1515(x64)
    2.  Revit 2020 2020 (2020.000) : 20190207_1515(x64)
    3.  Revit 2021 2021 (2021.000) : 20200131_1515(x64)
    4.  Revit 2022 - Preview Pre-Release 2022 (2022.000) : 20210129_1515(x64)
    5.  Revit 2023 2023 (2023.000) : 20220401_1515(x64)
    6.  Revit 2024 2024 (2024.000) : 20230308_1635(x64)
    7.  Revit 2025 2025 (2025.000) : Development Build
    8.  Revit 2026 2026 (2026.000) : 20250227_1515(x64)

This is forensic-grade data: every Revit release that has ever saved
the file, listed in chronological order. No Autodesk software required.

Added: src/object_graph.rs + src/bin/rvt_history.rs. New rvt-history
CLI. 21 unit + 6 integration = 27 tests passing.

Took the 34 fields our schema parser extracted and grep'd them against
the raw decompressed Formats/Latest bytes — all 34/34 verified byte-for-byte
present in the stream.

Also downloaded Revit_All_Main_Versions_API_x64 2026.0.0 from NuGet
(35MB RevitAPI.dll) and confirmed every top-level class name we extracted
(ADocument, DBView, HostObj, LoadBCBase, Symbol, APIAppInfo,
APropertyDouble1/2/3, APropertyEnum, APropertyBoolean, A3PartyObject)
appears as a native C++ symbol inside the DLL.

The decorated function signatures like:
  __cdecl NotNull<class ADocument *,void>::NotNull<class ADocument*,void>
  ?apiAppInfo@@YAPEBVAPIAppInfo@@AEBVControlledConstDocAccess@@@Z
and the leaked Autodesk build path:
  F:\Ship\2026_px64\Source\API\RevitAPI\Objects\Elements\*.cpp
confirm the schema entries come straight out of Autodesk's C++ codebase.

Also noted a convention worth knowing: the on-disk schema uses abbreviated
field names (e.g. 'm_pADoc' in the file is 'm_pADocument' in source).

rvt-dump writes every OLE stream to disk and decompresses what can be
decompressed. Produces offline artifacts ready for Ghidra, IDA, radare2,
or hex-editor analysis. One 366 KB RFA yields ~3 MB of decompressed
binary across 9 streams (Formats/Latest at 473 KB, Partitions/67 at
1,150 KB with 10 concatenated gzip chunks).

Adds two high-value integration tests:

  1. document_history_extracts_full_timeline
     Verifies we recover >= 8 entries from the 2026 sample (which was
     migrated through every Revit release since 2018) including at
     least 2018 and 2026 markers.

  2. schema_field_names_round_trip_byte_for_byte
     Every field name rvt-schema extracts must be findable byte-for-byte
     in the raw decompressed Formats/Latest stream. This prevents any
     regression that would let the heuristic extractor drift into
     false-positive garbage names.

Totals: 6 binaries (rvt-info, rvt-diff, rvt-corpus, rvt-schema,
rvt-history, rvt-dump). 21 unit + 8 integration = 29 tests passing.

…t-ifc ceiling

The broader question this project answers goes beyond 'Revit reader':

Autodesk's own Apache-2 IFC exporter (github.com/Autodesk/revit-ifc) runs
INSIDE Revit and can only emit what the Revit API surface exposes. The API
deliberately withholds private families, complex assemblies, some geometric
relationships, and certain parameter types. That's why OSArch describes
out-of-the-box Revit IFC as 'just crap' and thinkmoult.com calls it 'very
limited.'

A native parser reads the actual on-disk bytes — a strict superset of the
API-surface path. rvt-rs + an IFC writer (planned) would be the first
open-source full-fidelity RVT → IFC pipeline. Natural partners: IfcOpenShell,
BIMvision, buildingSMART International's IFC certification program, OSArch
community.

This framing was surfaced by Griffin after v0.1 shipped. The technical
work is identical, but the positioning target broadens from
'individual BIM tooling' to 'decade-long openBIM pain point.'

Extends rvt-history with --all-strings flag. Walks decompressed
Global/Latest looking for [u32 tag][u32 char_count][UTF-16LE body]
records. Extracts 1,200+ records per reference file — real Revit
model data including:

  tag=0x01:
    off=0x00163f  '-'
    off=0x001cef  'Elevation 0'    (elevation view)
    off=0x001d25  'A100'           (sheet number)
    off=0x001d4d  'Level 1'        (level name)
    off=0x001d7b  '0'

  tag=0x290034: Revit version upgrade entries (later than the first)
  tag=0x07:     first Revit version entry
  tag=0x7d005d: most common — needs further classification (1,159 records)

These are user-visible BIM objects extracted without Autodesk software,
without Forge API, without ODA BimRv SDK. First real Phase D output
from the binary object graph.

Also fixes a subtle bug in the printability heuristic: the prior
check 'printable >= len * 9 / 10' used integer division, so for
char_count=1 records the threshold collapsed to 0 and single control
chars passed as 'valid' — falsely consuming the cursor past real
adjacent records. New check requires EVERY character to be printable
ASCII (is_ascii_graphic | space | tab | newline).
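
Sketch of the record layout and of the heuristic fix described above. The
function names are hypothetical and this is not the code in
src/object_graph.rs. It assumes little-endian u32 fields and rejects empty
bodies, which the text does not specify.

```rust
/// Old heuristic threshold: 90% of the characters, with integer division.
/// For char_count = 1 this evaluates to 0, which is the bug described above.
pub fn old_min_printable(char_count: usize) -> usize {
    char_count * 9 / 10
}

/// Strict rule: printable ASCII, space, tab, or newline.
pub fn is_printable_unit(unit: u16) -> bool {
    if unit > 0x7f {
        return false;
    }
    let b = unit as u8;
    b.is_ascii_graphic() || b == b' ' || b == b'\t' || b == b'\n'
}

/// Decode one [u32 tag][u32 char_count][UTF-16LE body] record at `at`.
/// Returns (tag, text, offset just past the record).
pub fn read_string_record(buf: &[u8], at: usize) -> Option<(u32, String, usize)> {
    let head = buf.get(at..at.checked_add(8)?)?;
    let tag = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    let count = u32::from_le_bytes([head[4], head[5], head[6], head[7]]) as usize;
    let start = at + 8;
    let end = start.checked_add(count.checked_mul(2)?)?;
    let body = buf.get(start..end)?;
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .collect();
    // Rejecting empty bodies is an assumption of this sketch.
    if units.is_empty() || !units.iter().all(|&u| is_printable_unit(u)) {
        return None;
    }
    Some((tag, String::from_utf16(&units).ok()?, end))
}
```

With the old threshold a one-character record needed zero printable
characters to pass; with the strict rule a single control character makes
read_string_record return None, so the cursor does not skip over the
adjacent real record.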

Totals: 21 unit + 8 integration = 29 tests passing.

- Add string_records_from_partitions(): iterates the concatenated gzip
  chunks inside Partitions/NN (the version-specific stream where bulk
  Revit BIM metadata actually lives) and pulls every length-prefixed
  UTF-16LE record out with the same strict-printable validator used for
  Global/Latest.

- Extend rvt-history CLI with --partitions flag. Classifies records by
  prefix: autodesk.unit.*, autodesk.spec.*, autodesk.parameter.group.*,
  OmniClass codes, Uniformat codes, Revit categories. Fixes closure
  lifetime by promoting to a local fn with explicit 'a lifetimes.
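
Sketch of prefix-based classification as described for --partitions. The
enum and function names are hypothetical. Only the three autodesk.*
prefixes come from the text; the Uniformat test (one uppercase letter
followed by four digits, as in A1010) is an assumption, and OmniClass and
Revit category detection are left out because their rules are not stated.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RecordKind {
    Unit,
    Spec,
    ParameterGroup,
    Uniformat,
    Other,
}

/// Assumption: a Uniformat code looks like "A1010".
fn looks_like_uniformat(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 5 && b[0].is_ascii_uppercase() && b[1..].iter().all(|c| c.is_ascii_digit())
}

/// Classify one extracted string by its prefix.
pub fn classify_record(s: &str) -> RecordKind {
    if s.starts_with("autodesk.unit.") {
        RecordKind::Unit
    } else if s.starts_with("autodesk.spec.") {
        RecordKind::Spec
    } else if s.starts_with("autodesk.parameter.group.") {
        RecordKind::ParameterGroup
    } else if looks_like_uniformat(s) {
        RecordKind::Uniformat
    } else {
        RecordKind::Other
    }
}

/// True when any record belongs to a Forge namespace.
pub fn has_forge_identifiers(records: &[&str]) -> bool {
    records.iter().any(|r| match classify_record(r) {
        RecordKind::Unit | RecordKind::Spec | RecordKind::ParameterGroup => true,
        _ => false,
    })
}
```

A check like has_forge_identifiers is one way the "2016-2020 files contain
zero Forge identifiers" finding below could be tested per file.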

Findings (documented in recon report addendum):

* Forge Design Data Schema rollout inside RVT is now dateable:
  - units + specs namespace → introduced Revit 2021
  - parameter groups namespace → introduced Revit 2024
  - 2016-2020 files contain zero Forge identifiers

* 2024 reference file leaks Autodesk employee OneDrive path
  (C:\Users\hansonje\OneDrive - Autodesk\FY-2021 Projects\...) in
  Partitions/NN, verbatim. Evidence Revit does not sanitize source
  paths. Downstream parsers should redact.

* 3,746 strings extracted from a single 2024 .rfa including full
  Uniformat code hierarchy (A1010, A1020, A1030, …), AEC length/power
  specs, parameter groups (constraints, materials, dimensions,
  identityData), Revit category labels (Acid Waste Systems, Air
  Distribution Systems, Balcony Floor Construction, …), and localized
  format strings.

All 29 tests still pass.

- docs/rvt-moat-break-reconnaissance.md: full Phase A–D+ narrative
  with Forge schema dating addendum (2021 = units/specs debut,
  2024 = parameter-group debut).
- docs/announcement-draft.md — public announcement copy. Not posted
  yet; Griffin wants stealth-build first and will decide when to
  flip public.

These were previously tracked outside the git-versioned repo; moving
them in so anyone cloning main has the full narrative.

Add four reproducible exploration examples and document the finding in
docs/rvt-moat-break-reconnaissance.md §Phase D link proof.

The central claim, now supported by evidence:

* Global/Latest is a TAG-INDEXED binary object graph. Class names do
  not appear as ASCII anywhere inside it (probe_link.rs confirms zero
  matches for ADocument, DBView, Symbol, HostObj, Category, …).

* The u16 immediately after a class name in Formats/Latest is the
  class's serialization tag. When the 0x8000 flag bit is set, that
  record is a class *definition*; when clear, it's a reference to a
  previously-declared class (used inside field type signatures).
  tag_dump.rs sweeps all 398 candidates and shows 79 definitions with
  flag set.

* The distribution of those tags inside Global/Latest is extremely
  non-uniform — the top tag (AbsCurveGStep, 0x0061 in 2024) appears
  19,415 times in 938,578 bytes, 340× the uniform-random expectation
  of ~57. link_schema.rs computes the full histogram; the top 25
  tagged classes are all geometry/curve/host-object domain, exactly
  what a curve-driven family file should look like.

* Tags are stable-sort-assigned and shift when new classes are
  inserted between releases: AbsCurveGStep = 0x0053 in 2016 →
  0x0061 in 2024 (+14 insertions before it). ADocWarnings, being
  alphabetically earlier, stays at 0x001b in both releases. A parser
  therefore MUST re-read Formats/Latest per file; tags cannot be
  hard-coded. This is a useful property: each file carries the tag
  table needed to read it.

* What remains (next session): record framing, field-type encoding
  (double / int / ElementId / std::pair<A,B> / std::vector<T>),
  alignment rules, inter-instance references. Each of these is a
  bit-level hypothesis + delta test across the 11-version corpus.
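
Sketch of the tag interpretation from the bullets above: the 0x8000 flag
separates definitions from references, and tag hits are counted in a
stream. Names are hypothetical and this is not the code in tag_dump.rs or
link_schema.rs. Counting a little-endian u16 at every byte offset is an
assumption about how the histogram is built.

```rust
/// Flag bit that marks a class definition in Formats/Latest.
pub const DEFINITION_FLAG: u16 = 0x8000;

#[derive(Debug, PartialEq, Eq)]
pub enum TagRecord {
    /// Flag set: the class is declared here.
    Definition(u16),
    /// Flag clear: reference to a previously declared class.
    Reference(u16),
}

/// Split the raw u16 that follows a class name into flag and tag value.
pub fn decode_tag(raw: u16) -> TagRecord {
    let tag = raw & !DEFINITION_FLAG;
    if raw & DEFINITION_FLAG != 0 {
        TagRecord::Definition(tag)
    } else {
        TagRecord::Reference(tag)
    }
}

/// Count occurrences of `tag` (little-endian) at any byte offset of `stream`.
pub fn count_tag(stream: &[u8], tag: u16) -> usize {
    let needle = tag.to_le_bytes();
    stream
        .windows(2)
        .filter(|w| w[0] == needle[0] && w[1] == needle[1])
        .count()
}
```

Because tags shift between releases, a caller would look up the tag for a
class name in that file's Formats/Latest first and only then call
count_tag on the same file's Global/Latest.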

Files:
  examples/probe_link.rs      — null-hypothesis ASCII-probe
  examples/tag_bytes.rs       — hex dump around specific class names
  examples/tag_dump.rs        — statistical sweep of all classes
  examples/link_schema.rs     — tag-frequency histogram in Global/Latest
  docs/rvt-moat-break-reconnaissance.md
                              new Addendum: Phase D link proof

Runs the Phase D link analysis across every Revit release 2016 → 2026.
Surfaces two discoveries:

1. Global/Latest expands by 27× between 2020 (~26 KB) and 2021 (~715 KB).
   Pre-2021 files hold only metadata in Global/Latest; post-2021 files
   also hold large geometry / curve / system-profile content there. This
   coincides with the Forge Design Data Schema rollout seen in
   Partitions/NN (autodesk.unit.*, autodesk.spec.* identifiers debut in
   2021). The size jump and the Forge rollout are almost certainly one
   event: Autodesk ran a serialization refactor for Revit 2021 that no
   public changelog mentions.

2. Tagged-class count is decreasing over time — 90 in 2016 → 60 in 2026.
   Either the schema is consolidating or classes are moving out of
   Formats/Latest into some other stream.

The "top class by tag hits" flips around 2021 — from ADocWarnings
(metadata, tiny) to AbsCurveType / AbsCurveGStep (geometry) —
consistent with the "Global/Latest gained serialized geometry in 2021"
interpretation.

Any open reader that handles only pre-2021 files silently drops ~30×
more data than it recovers. This is the practical consequence for
downstream tools.

Pivot every release's Formats/Latest tag list into one row-per-class,
column-per-release table. 122 distinct tagged classes across 11
releases.

Dataset summary:

* 6 classes (4.9%) are tag-stable across every release (all alphabetically
  early): A3PartyAImage=0x000d, ADTGridImportVocabulary=0x0012,
  ADocWarnings=0x001b, APIVSTAMacroElem=0x0025,
  APIVSTAMacroElemTracking=0x0028, AProperties=0x002a

* 101 classes (82.8%) have at least two distinct tag values — their
  tags shift whenever Autodesk inserts a new alphabetically-earlier
  class

* 22 new classes introduced 2017-2026 (ATFProvenanceBaseCell,
  AnalyticalAutomationEditModeMgr, AlignmentStation*, …)

* 52 classes removed by 2026 (ActiveGeoLocationTrackingElement,
  AllowGStyleDrawFilter, AngularDimensionType, several AnalyticalModel*)
  — matches the 90→60 consolidation trend seen in the per-release size
  table

Illustrative: AbsCurveGStep shifted 0x0053 → 0x0056 → 0x0060 → 0x0061
→ 0x0066 across 2016-2026, with the biggest single jump (+10) at the
2021→2022 boundary — same schema-refresh event that drove Global/Latest
27× larger and debuted the Forge Design Data Schema namespaces.

Output: docs/data/tag-drift-2016-2026.csv — first publicly-available
version of this data. Any downstream tool that wants to handle
cross-release Revit files can consume the CSV directly.

Run with:
  ./target/release/examples/tag_drift <sample-dir> <out.csv>

- Restore accurate CLI list (6 binaries shipped: rvt-info, rvt-diff,
  rvt-corpus, rvt-schema, rvt-history, rvt-dump — previous text said
  "two CLIs"). Add the examples/ probe list (probe_link, tag_bytes,
  tag_dump, link_schema, tag_drift).

- Add "Phase D findings" section to the top of the README covering:
  (1) tag-linkage 340× non-uniformity proof,
  (2) tag drift 2016-2026 with CSV link,
  (3) Revit 2021 format transition (Global/Latest 27× growth),
  (4) parameter-group namespace added in Revit 2024.

- Split Layer 4 in the RE-state table into 4a (schema table — Done),
  4b (schema→data link — Done), 4c (object records — in progress) and
  move "IFC export" to its own Layer 5. Layer 4c has an explicit list
  of the remaining falsifiable hypotheses (record framing, field
  encoding, alignment, inter-instance refs).

- Update test counts (9+6 → 21+8) and the "Not yet implemented" list
  to reflect what Phase C + D+ actually shipped.

- Reference the four dated addenda in docs/rvt-moat-break-reconnaissance.md
  so newcomers know where the narrative lives.

…ntified

3 new examples + addendum in docs/rvt-moat-break-reconnaissance.md
that fully annotate the 167-byte Global/PartitionTable structure.

Key findings:

1. 165 of 167 bytes are invariant across all 11 Revit releases
   (2016-2026) — 98.8% stable. Only the first u16 changes.

2. bytes[0..2] is an internal format-version counter that never
   decreases across releases:
     2016: 2572, 2017: 2635, 2018: 2706, 2019: 2759,
     2020: 2810, 2021: 2810 (same!), 2022: 2917,
     2023: 3008, 2024: 3052, 2025: 3136, 2026: 3200
   The 2020 and 2021 values are identical, which means the 27×
   Global/Latest growth documented in the tag-sweep addendum was
   ADDITIVE to content; it did NOT constitute a PartitionTable
   format bump.

3. bytes[10..26] is a 16-byte UUIDv1:
     {3529342d-e51e-11d4-92d8-0000863f27ad}
   MAC suffix 0000863f27ad matches known Autodesk-dev-workstation
   UUIDs visible in Forge JSON output. This is the stable Revit
   format identifier, embedded since the codebase was written
   ~2000. Never documented publicly.

4. bytes[0x30..] contains a 93-byte UTF-16LE human-readable partition
   description for the file. For the sample family, this is:
     "Family  : Section Heads : Section Tail - Upgrade"
   First on-disk source of truth for "what is this Revit file?"
   without deep container parsing — a better magic than OLE stream
   enumeration.
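
Sketch of a decoder for the two fixed fields listed above. The struct and
function names are hypothetical and this is not the code in
examples/partition_full.rs. It assumes the counter is little-endian and
that the GUID uses the usual Windows mixed-endian layout. Under that
layout the byte string reported earlier as
2d342935-1ee5-d411-92d8-0000863f27ad is the same identifier in raw on-disk
order.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct PartitionHeader {
    /// bytes[0..2], read as a little-endian u16 (assumed byte order).
    pub format_version: u16,
    /// bytes[10..26], rendered in canonical GUID text form.
    pub format_guid: String,
}

/// Decode the version counter and the format GUID from Global/PartitionTable.
pub fn parse_partition_header(buf: &[u8]) -> Option<PartitionHeader> {
    let v = buf.get(0..2)?;
    let g = buf.get(10..26)?;
    // First three GUID fields are stored little-endian; the rest is byte order.
    let d1 = u32::from_le_bytes([g[0], g[1], g[2], g[3]]);
    let d2 = u16::from_le_bytes([g[4], g[5]]);
    let d3 = u16::from_le_bytes([g[6], g[7]]);
    let mut guid = format!("{:08x}-{:04x}-{:04x}-{:02x}{:02x}-", d1, d2, d3, g[8], g[9]);
    for b in &g[10..16] {
        guid.push_str(&format!("{:02x}", b));
    }
    Some(PartitionHeader {
        format_version: u16::from_le_bytes([v[0], v[1]]),
        format_guid: guid,
    })
}

/// True when the counter never goes down from one release to the next.
pub fn is_non_decreasing(values: &[u16]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}
```

A file-type sniffer could compare format_guid against the constant above
after the OLE check, and is_non_decreasing holds for the eleven counters
listed even though 2020 and 2021 share a value.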

Files added:
  examples/partition_invariant.rs — diff every release against every other
  examples/partition_diff.rs      — print varying bytes per release
  examples/partition_full.rs      — annotated hex dump + structured decode

Usefulness for downstream tools:
  - libmagic / Apache Tika / file-type sniffers can identify RVT by
    this GUID after OLE sniff
  - archival / forensic tools get the partition description string
    free of charge
  - anyone writing an RVT reader now has a known-invariant block to
    verify compression + stream read is working

Probes the `Contents` OLE stream and adds the "Contents + long-lived
name disclosure" addendum to docs/rvt-moat-break-reconnaissance.md.

Findings:

* Contents is 307 bytes with a single embedded gzip chunk at offset
  0x5b. Decompresses to 268 bytes of UTF-16LE structured metadata.

* Creator name "David Conant" is embedded in the sample family from
  2016 through 2026, unchanged. David Conant was on the original
  Revit development team at Charles River Software (founded 1997,
  Revit v1 released 2000, acquired by Autodesk in 2002). Autodesk
  has been shipping the same reference family for 20+ years, and the
  original author's name travels with every customer install of
  Revit.

* The format GUID 3529342d-e51e-11d4-92d8-0000863f27ad appears both
  in Global/PartitionTable (decoded in the previous commit) AND
  inside Contents — confirming it is the canonical file-format
  identifier, not a per-stream marker.

* Build timestamps encode exact release dates:
    2016 file: 20150110_1515 (Jan 10, 2015)
    2024 file: 20230308_1635 (Mar 8, 2023)
  These match the DocumentHistory strings from Phase D v0 and can
  cross-validate each other.
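
Sketch of a parser for the build stamps above, so the Contents value and
the DocumentHistory value can be compared field by field. BuildStamp and
parse_build_stamp are hypothetical names. The YYYYMMDD_HHMM layout is read
off the two examples; trailing text such as "(x64)" is ignored.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct BuildStamp {
    pub year: u16,
    pub month: u8,
    pub day: u8,
    pub hour: u8,
    pub minute: u8,
}

/// Parse a stamp such as "20230308_1635" or "20230308_1635(x64)".
pub fn parse_build_stamp(s: &str) -> Option<BuildStamp> {
    let b = s.as_bytes();
    if b.len() < 13 || b[8] != b'_' {
        return None;
    }
    // Eight date digits, underscore, four time digits.
    if !b[..8].iter().chain(&b[9..13]).all(|c| c.is_ascii_digit()) {
        return None;
    }
    let num = |r: std::ops::Range<usize>| s[r].parse::<u16>().ok();
    let stamp = BuildStamp {
        year: num(0..4)?,
        month: num(4..6)? as u8,
        day: num(6..8)? as u8,
        hour: num(9..11)? as u8,
        minute: num(11..13)? as u8,
    };
    // Range checks only; no calendar validation (for example Feb 30 passes).
    let plausible = (1..=12).contains(&stamp.month)
        && (1..=31).contains(&stamp.day)
        && stamp.hour < 24
        && stamp.minute < 60;
    if plausible {
        Some(stamp)
    } else {
        None
    }
}
```

Non-stamp values such as "Development Build" yield None, so a
cross-validation pass can skip them instead of failing.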

Plus: contents_probe also demonstrates that `62 19 22 05` is Revit's
"custom wrapper follows" container marker — same magic leads both
`RevitPreview4.0` (PNG thumbnail) and `Contents` (metadata table).
Different payloads, same header convention.
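
Sketch of a check for the wrapper marker and a scan for embedded gzip
members. Function names are hypothetical and this is not the code in
contents_probe. It assumes the embedded chunk starts with the standard
gzip header bytes 1f 8b 08, which the text does not confirm.

```rust
/// Marker reported at the start of both `RevitPreview4.0` and `Contents`.
pub const WRAPPER_MAGIC: [u8; 4] = [0x62, 0x19, 0x22, 0x05];

/// Standard gzip member header: ID1, ID2, CM = deflate (assumed here).
pub const GZIP_MAGIC: [u8; 3] = [0x1f, 0x8b, 0x08];

/// True when the stream begins with the custom wrapper marker.
pub fn has_wrapper_magic(stream: &[u8]) -> bool {
    stream.starts_with(&WRAPPER_MAGIC)
}

/// Offsets of every occurrence of `needle` in `haystack`.
pub fn find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    if needle.is_empty() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|&(_, w)| w == needle)
        .map(|(i, _)| i)
        .collect()
}

/// Candidate gzip chunk offsets inside a wrapped stream.
pub fn gzip_offsets(stream: &[u8]) -> Vec<usize> {
    find_all(stream, &GZIP_MAGIC)
}
```

For the `Contents` stream described above, gzip_offsets would be expected
to include 0x5b. Three bytes can match by chance, so every hit still needs
a trial inflate to confirm it.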

Downstream privacy: Parsers consuming customer RVT files should
redact both the `C:\Users\*\` OneDrive-path pattern (Forge addendum)
and personal names embedded in creator fields.

A single binary that produces the full forensic picture: file identity,
document upgrade history, format anchors, schema table, Phase D
schema→data linkage, content metadata, and a disclosure scan — with
a polished ANSI-colored report and a --json machine-readable mode.

New binary:
  src/bin/rvt_analyze.rs   (registered in Cargo.toml)

Flags:
  --json          emit the report as JSON
  --section NAME  print only one of: identity, history, anchors,
                  schema, link, content, disclosures
  --no-color      strip ANSI so pipe / file output is clean
  --redact        replace Windows usernames, Autodesk-internal paths,
                  and project-ID folder names with <redacted>
                  markers while preserving path shape.
                  Safe default for sharing output publicly.
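
Sketch of the username part of --redact. It is an illustration, not the
shipped implementation: redact_usernames is a hypothetical name, only the
`\Users\<name>\` pattern is handled, and Autodesk-internal paths and
project-ID folders are out of scope here.

```rust
/// Replace the user-name component after every `\Users\` with `<redacted>`,
/// keeping the rest of the path so its shape is preserved.
pub fn redact_usernames(text: &str) -> String {
    const MARKER: &str = "\\Users\\";
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find(MARKER) {
        let after = pos + MARKER.len();
        // Copy everything up to and including the marker unchanged.
        out.push_str(&rest[..after]);
        let tail = &rest[after..];
        // The name ends at the next backslash, or at end of text.
        let name_len = tail.find('\\').unwrap_or(tail.len());
        if name_len > 0 {
            out.push_str("<redacted>");
        }
        rest = &tail[name_len..];
    }
    out.push_str(rest);
    out
}
```

The match is case-sensitive and a name that is not followed by a backslash
is redacted to the end of the text, so real input may need a stricter
terminator.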

New visuals:
  examples/tag_drift_svg.rs       renders the 122-class × 11-release
                                  drift CSV as a colour-coded SVG
                                  heatmap with a legend.
  docs/data/tag-drift-heatmap.svg 93 KB pre-generated artifact.

Committable demo artifacts (both pre-scrubbed, verified PII-free by
grep):
  docs/demo/rvt-analyze-2024-redacted.txt   130 lines
  docs/demo/rvt-analyze-2024-redacted.json  489 lines

Responsible-disclosure hygiene pass:
  - README no longer repeats the specific username of the Autodesk
    employee whose path leaks into the shipped reference family.
    The pattern and the fact of the disclosure are documented with
    <redacted> markers; details removed to avoid re-broadcasting.
  - Recon report same treatment. Specific internal project IDs are
    also redacted.
  - Unit-test fixture in basic_file_info.rs swapped `hansonje` →
    `testuser` so the repo's own source contains no third-party PII.
  - `--redact` is applied to the committed demo artifacts by default.

README now has a "Quick demo" section pointing at the committed
demo files and the SVG so reviewers see S-tier output without
having to build anything. Lists rvt-analyze first in the CLI
inventory as the recommended entry point.

Tests: 21 unit + 8 integration = 29/29 still pass.

docs/demo/rvt-analyze-2024-teaser.txt: the four highest-shock
sections extracted from the full report (file identity with redacted
creator path, format anchors with the undocumented Revit format GUID,
Phase D link histogram showing 339× non-uniformity, and the
disclosure scan with redacted markers).

Fits in 56 lines — a single terminal screenshot — and proves every
load-bearing claim without revealing any customer or employee PII.

README "Quick demo" section now points at the teaser first, the full
report second, the JSON third, and the SVG heatmap fourth — ordered
from least-commitment to most-detail.

Follow-through on every item the README listed as "tractable but deferred."
All six ship as working Rust modules plus reproducible probes.

SCHEMA PARSER UPGRADE (Layer 4a polish)
  formats.rs: ClassEntry now carries {tag, parent, declared_field_count}
  in addition to the original {name, offset, fields}. HostObjAttr now
  correctly resolves to tag=107, parent=Symbol, declared_field_count=3.
  375 classes parsed with this richer model (vs 398 bare candidates
  before). Added unit test parses_tagged_class_with_parent.

RECORD FRAMING PROBE (Layer 4c FACT set)
  examples/record_framing.rs — dumps the raw bytes at every tagged-class
  definition in Formats/Latest AND at the first occurrence of that
  tag in Global/Latest, side-by-side. Yielded FACTs F3-F6 on class
  record structure and the Global/Latest class-tag directory.

GLOBAL/ELEMTABLE PARSER (Layer 4d, partial)
  src/elem_table.rs — parse_header() + parse_records_rough().
  Header structure confirmed across 4 releases (element_count,
  record_count, 0x0011 flag). Record-level semantics scaffolded but
  not yet bound — presumptive u32 triples returned for downstream
  analysis.

PARTITIONS/NN HEADER DECODE (Layer 3 polish)
  src/partitions.rs — parse_header() decodes the fixed 44-byte prefix
  into {declared_count_plus_one, reserved_zero, size_block,
  trailer_u32}. chunks_from_stream() splits the concatenated
  gzip chunks. FACT F7/F8 added to reconnaissance report.

IFC EXPORTER (Layer 5 scaffold)
  src/ifc/mod.rs + src/ifc/entities.rs. Defines IfcModel, Exporter
  trait, NullExporter (returns empty model with description), and
  the full Revit-class → IFC-entity mapping plan so the remaining
  Layer 4c work lands on a known target. Aligns naturally with
  IfcOpenShell for STEP serialization + buildingSMART IFC
  certification once the exporter fills in.

WRITE PATH (Layer 6 scaffold)
  src/writer.rs — copy_file() does a real byte-preserving round-trip
  through the cfb crate. Verified on the 2024 sample: 13 streams
  identical after read-write-re-read. write_with_patch() returns
  NotYetImplemented until Layer 4c field decoding is available.

Tests: 28 unit + 8 integration = 36/36 passing.

Four new probes under examples/ (record_framing, elem_table_probe,
partitions_header_probe, roundtrip) so any finding in this commit
can be reproduced from source.

reports/rvt-phase4c-session-2026-04-19.md: session-length synthesis
per the /rev-eng skill's Phase 4 requirements:
  * Intake (static-only, sample-family corpus)
  * Triage (already documented; skipped re-triage)
  * Static analysis summary (all 6 deferred tasks → working modules)
  * Synthesis with confidence calibration
  * Open questions (Q4-Q7) for the next session
  * Human decision record

README.md "Not yet implemented" section rewritten as
"In progress — Phase 4c/5/6 scaffolds" since every bullet now maps
to a shipped module + reproducible probe. Test-count line updated
21 → 28 to match the current build.

* Add TL;DR block under the title: seven one-liners covering the
  shock-value claims (no Autodesk dep, 11-year corpus, 340× Phase D
  proof, first public tag-drift table, stable format GUID, 2021
  silent rewrite, byte-preserving writer, --redact default).

* Rewrite "What works today" as a module-table covering every
  live src/*.rs: reader, compression, basic_file_info, part_atom,
  formats, object_graph, class_index, corpus, elem_table (new),
  partitions (new), writer (new), ifc (new), streams, error.

* Expand the examples/ list from 5 entries to 14, grouped by phase:
  schema↔data linkage, record framing, stable anchors, write path.

* Replace "Four reproducible discoveries" with six (adds the
  Revit format identifier GUID and the tagged class record
  structure). Adds SVG heatmap link next to the tag-drift CSV.
  Disclosure-pattern list grows to three, describing the 1997-dev
  creator-name pattern in the Contents stream.

* Rewrite the RE-state table: split Layer 4 into 4a/4b/4c.1/4c.2/4d
  with accurate statuses (4a-4c.1 Done; 4c.2 Open — this is the
  remaining moat layer; 4d Partial). Layer 3 is now Done (165/167
  PartitionTable invariant, 44-byte Partitions/NN header decoded).
  Adds Layer 6 (Write path, scaffolded). Points at the
  session-length synthesis report in docs/.

* Remove the now-redundant "In progress" section — the table above
  carries the same information.

* Update hero paragraph's IFC wording: "planned + scaffolded in
  src/ifc/" instead of "planned; not yet shipped" — the scaffold
  is real code that compiles and ships in the public repo.

* Test-count line already current (28 unit + 8 integration).

All 17 links verified to resolve to committed paths.

Phase 4c.2 / Task #37. Decode the type_encoding byte sequence that
follows every field name in a tagged class's schema record.

Pattern discovered (see Q5 addendum in
docs/rvt-moat-break-reconnaissance.md for the evidence trail):

* Byte 0 is a type-category discriminator.
  0x0e → reference / pointer / container (most fields)
  0x04 → fixed-size numeric primitive

* For 0x0e, the following u16 is a sub-type:
  0x0000 → ElementId (6-byte pattern terminating in 0x0014)
  0x0001/2/3 → Pointer to class instance (4–28 bytes)
  0x0010 → std::vector<T> (body embeds class-tag + C++ signature)
  0x0050 → std::map<K,V> / std::set<T> (body embeds class-tag +
           UTF-8 C++ signature like "std::pair< int, ... >")

* Container fields leak the C++ template signature in ASCII —
  verified on AnalyticalPanelPatternHelper.m_PatternPositionMap
  (std::pair), WallCGDriver.m_ruleInst (SpacingRule reference).
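
Sketch of the discriminator table above as a classifier. FieldKind and
classify_field are hypothetical names for a simplified stand-in; the
shipped FieldType::decode also carries bodies and C++ signatures, which
this omits. Reading the sub-type as a little-endian u16 is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum FieldKind {
    /// Byte 0 == 0x04: fixed-size numeric primitive.
    Primitive,
    /// 0x0e + sub-type 0x0000.
    ElementId,
    /// 0x0e + sub-type 0x0001, 0x0002 or 0x0003 (sub-type kept).
    Pointer(u16),
    /// 0x0e + sub-type 0x0010: std::vector<T>.
    Vector,
    /// 0x0e + sub-type 0x0050: std::map / std::set.
    Container,
    /// Anything not covered by the table, including truncated input.
    Unknown,
}

/// Classify the type_encoding bytes that follow a field name.
pub fn classify_field(enc: &[u8]) -> FieldKind {
    match enc.get(0) {
        Some(&0x04) => FieldKind::Primitive,
        Some(&0x0e) => {
            let sub = match enc.get(1..3) {
                Some(b) => u16::from_le_bytes([b[0], b[1]]),
                None => return FieldKind::Unknown,
            };
            match sub {
                0x0000 => FieldKind::ElementId,
                0x0001..=0x0003 => FieldKind::Pointer(sub),
                0x0010 => FieldKind::Vector,
                0x0050 => FieldKind::Container,
                _ => FieldKind::Unknown,
            }
        }
        _ => FieldKind::Unknown,
    }
}
```

Every first byte other than 0x04 and 0x0e falls through to Unknown, so new
discriminators can be added as match arms without touching the others.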

Real-world distribution on the 2024 reference family (1,156 fields):

  Primitive : 19.3% (223)
  Pointer   : 13.8% (160)
  ElementId : 13.8% (159)
  Container : 11.9% (138)
  Vector    :  0.4% (  5)
  Unknown   : 40.7% (471)  ← remaining (likely wider primitive set)

60% classification from a cold start. Enables:

1. A typed ElemTable parser (Task #40, unblocked once the remaining
   primitive discriminators land).
2. A first-pass IFC exporter (Task #41) that can at least emit the
   ElementId, Pointer, and primitive fields it already understands.

Implementation:

  src/formats.rs
    + FieldType enum with Primitive | ElementId | Pointer { kind } |
      Vector { body } | Container { cpp_signature, body } | Unknown
    + FieldType::decode(bytes) classifier
    + FieldEntry now carries Option<FieldType>

  examples/field_type_probe.rs
    Dumps type_encoding bytes for every field of every tagged class
    plus first-byte and length histograms.

  4 new unit tests pin each FieldType variant byte-for-byte.

Tests: 32 unit + 8 integration = 40/40 passing.

Phase 4c.2 / Task #38. examples/directory_probe.rs tested the
"sorted class-tag directory with offset payloads" hypothesis and
falsified it: the "directory" tags include both known class tags
(0x000d=A3PartyAImage) AND known field-type discriminators
(0x0004=Primitive, 0x000e=Pointer). That overlap rules out index
interpretation.

Corrected model (FACT F10 in docs/rvt-moat-break-reconnaissance.md):

  Global/Latest is a flat Type-Length-Value serialized stream. Each
  record begins with a type/tag token; the record length is either
  fixed-per-type or encoded inline. There is no O(1) lookup —
  finding a specific class instance requires sequential walk from
  offset 0, using the Formats/Latest schema as a typing guide.

Implication: the FieldType decoder from Q5 is already wired to read
per-record content; the remaining work (Q6.1) is figuring out how
many bytes each token consumes so the walker terminates on the
correct boundary.

This correctly reframes the moat:

  Before: "need to find and follow the directory's offset pointers"
  After:  "need to walk the stream sequentially with typed boundaries"

The second problem is strictly easier — no random-access lookup
needed — and the FieldType enum we just shipped is most of the work.
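
A minimal sketch of "walk the stream sequentially with typed boundaries", assuming every field has a fixed byte-size that is already known. The names FieldSize and walk_fields are hypothetical, and inline-length records are deliberately not handled; the walk simply stops at the first field whose size is unknown.

```rust
/// Byte-size of one schema field; `None` means the size is not yet known.
pub type FieldSize = Option<usize>;

/// Walk `data` from `start`, slicing one span per schema field in
/// declaration order. Returns the spans read before the walk had to stop
/// (unknown size or end of data), plus the offset it reached.
pub fn walk_fields<'a>(
    data: &'a [u8],
    start: usize,
    sizes: &[FieldSize],
) -> (Vec<&'a [u8]>, usize) {
    let mut offset = start;
    let mut spans = Vec::new();
    for size in sizes {
        // Stop on the first field we cannot size: guessing would
        // desynchronise every boundary after it.
        let Some(len) = *size else { break };
        let Some(end) = offset.checked_add(len) else { break };
        let Some(span) = data.get(offset..end) else { break };
        spans.push(span);
        offset = end;
    }
    (spans, offset)
}
```

The returned offset is where the next record would begin, so a caller can chain walks as long as each one consumed every field it was given.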

Probe file: examples/directory_probe.rs
Tasks #43, #44, #45 — bundled because they all touch formats.rs.

#43 — Field over-reader now respects declared_field_count.
  scan_fields_until_next_class_bounded accepts max_fields and stops
  after N. Threaded through from the tagged-class preamble.
  HostObjAttr: 24 fields → 3 fields (correct).

#44 — Symbol class restored in schema output.
  Added ClassEntry.was_parent_only: bool (default false). Second
  pass in parse_schema synthesizes stub entries for parent names
  that never appear as their own top-level declaration. Symbol +
  19 others now visible. Total class count 336 → 395.

#45 — cpp_types set no longer empty.
  Harvest FieldType::Container.cpp_signature into the cpp_types set
  during parsing. 3 unique signatures recovered on the 2024 sample:
    std::pair< int, Trf >   (Trf = Revit transform type)
    std::pair< int, XYZ >   (XYZ = Revit 3D vector type)
    std::pair< int, int >
  More will surface as Q5.1 classifies additional containers.

Tests: 32 unit + 8 integration = 40/40 passing.
…ncoding

Phase 4c.2 / Task #58. examples/instance_scan.rs searched Global/Latest
for every aligned u32-LE occurrence of HostObjAttr's class tag
(0x006b). Only 2 hits, not the ~6,599 the u16-overlap scan had
suggested.

Conclusion: Revit's Global/Latest is a schema-directed binary
serialization. Fields are laid out in declaration order with no
per-instance prefix — same wire-level design as protobuf's packed
encoding or Cap'n Proto. You cannot locate an instance by scanning
for its tag.
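
The two scans being compared can be sketched as follows. This is not the code in examples/instance_scan.rs; the function names are hypothetical, and "u16-overlap scan" is assumed here to mean an unaligned 2-byte window match.

```rust
/// Offsets of every 4-byte-aligned little-endian u32 in `data` equal to `tag`.
pub fn aligned_u32_hits(data: &[u8], tag: u32) -> Vec<usize> {
    data.chunks_exact(4)
        .enumerate()
        .filter(|(_, w)| u32::from_le_bytes([w[0], w[1], w[2], w[3]]) == tag)
        .map(|(i, _)| i * 4)
        .collect()
}

/// Offsets of every unaligned u16-LE occurrence of `tag`. This looser scan
/// also matches byte pairs that straddle or sit inside wider values, which
/// is how it can over-count.
pub fn u16_overlap_hits(data: &[u8], tag: u16) -> Vec<usize> {
    data.windows(2)
        .enumerate()
        .filter(|(_, w)| u16::from_le_bytes([w[0], w[1]]) == tag)
        .map(|(i, _)| i)
        .collect()
}
```

On any buffer, every aligned u32 hit for a small tag is also a u16 hit at the same offset, so the aligned count is a lower bound on the overlap count.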

Decoding pivot:

  Before: "find tag 0x006b, read 3 fields after it"
  After:  "walk from a known entry point, use the schema to compute
           each field's byte-size, yield typed values"

This actually SIMPLIFIES the moat:

  - No heuristic pattern-matching (deterministic walk)
  - No tag/offset table to decode (none exists)
  - Reduces to (1) finish Q5.1 = size-per-type table, (2) write a
    schema walker

Remaining work:

  Q5.1 (task #57) — classify every Unknown discriminator with size
  Q6.2 (task #59, new) — find Global/Latest's ADocument entry point

Probe: examples/instance_scan.rs
Phase 4c.2 / Task #57. Extended FieldType with byte-kind + byte-size
discrimination for the remaining primitive-ish fields the Q5 decoder
left as Unknown:

  0x01 bool           (1 byte)
  0x02 u16 / i16      (2 bytes)
  0x04 u32 legacy     (4 bytes)
  0x05 u32 / i32      (4 bytes)
  0x06 f32            (4 bytes)
  0x07 f64 / double   (8 bytes)
  0x08 UTF-16LE str   (variable, length-prefixed)
  0x09 GUID           (16 bytes)
  0x0b u64 / i64      (8 bytes)
  0x0e reference/ptr/container (variable, see sub-type)

Sub-type extensions:
  0x07 with sub 0x0010 -> Vector<double>  (e.g. XYZ = 3 doubles)
  0x0e with sub 0x0011 -> additional vector variant
  0x0e with sub 0x0051 -> additional container variant

FieldType::Primitive now carries {kind, size} instead of {size_hint}.
FieldType::Vector carries {kind, body} so we know the element type.
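
The fixed-size rows of the table above reduce to a lookup like the one below. The function name primitive_size is hypothetical, and the sketch looks only at the discriminator byte, so sub-typed forms such as 0x07 with sub 0x0010 are not distinguished here.

```rust
/// Fixed byte-size for a primitive discriminator byte, per the table above.
/// Returns `None` for variable-length kinds (0x08 string, 0x0e reference)
/// and for discriminators that are not listed.
pub fn primitive_size(discriminator: u8) -> Option<usize> {
    match discriminator {
        0x01 => Some(1),               // bool
        0x02 => Some(2),               // u16 / i16
        0x04 | 0x05 | 0x06 => Some(4), // u32 legacy, u32 / i32, f32
        0x07 | 0x0b => Some(8),        // f64 / double, u64 / i64
        0x09 => Some(16),              // GUID
        _ => None,
    }
}
```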

Classification coverage on 2024 sample (1,114 fields):

              Before    After
  Primitive    19.3% →  38.1%
  Container    11.9% →  14.5%
  Pointer      13.8% →  13.8%
  ElementId    13.8% →  12.6%
  Vector        0.4% →   4.6%
  String        0.0% →  (joined Primitive grouping)
  Guid          0.0% →   0.1%
  Unknown      40.7% →  16.3%

Evidence gathered via examples/unknown_bytes.rs, which histogrammed the
Unknown discriminators and tagged each by neighbouring field names
(e.g. bytes with m_bIs*, m_corrupt*, m_*File → bool; bytes with
second, m_value on APropertyDouble* → f64; bytes with m_id64,
m_hashNative → u64).

Unblocks meaningful progress on:

  Q6.2 (task #59) — schema walker can now compute byte-size for
  84% of fields, enough to walk short records deterministically.

  Task #40 (typed ElemTable), #41 (real IFC), #42 (modifying writer)
  — all become easier with wider primitive coverage.

Tests: 37 unit + 8 integration = 45/45 passing.
           plus rvt-analyze --quiet

Tasks #46 + #47 bundled.

#46 — --redact propagation
  Extract redaction helpers from src/bin/rvt_analyze.rs into a new
  src/redact.rs module (pub fn redact_path_str, pub fn
  redact_sensitive) so every CLI can share them. 6 unit tests pin
  each redaction pattern.

  rvt-info --redact: scrubs the original_path in the summary.
  rvt-history --redact: scrubs every UTF-16LE string record extracted
    from Global/Latest + Partitions/NN AND every DocumentHistory entry
    before rendering.
  rvt-analyze: delegates to rvt::redact (was inlined).

#47 — rvt-analyze --quiet
  Added -q / --quiet flag to suppress the banner block when piping
  output to grep / jq / downstream tooling. --no-color, --section,
  and --json all already compose; verified manually.

Tests: 43 unit + 8 integration = 51/51 passing.
…itmask

Phase 4c.2 / Task #36. examples/flag_word_probe.rs histogrammed the
u16 "flag" word across all tagged classes in 4 releases, and the
distribution was 100% consistent with "class-tag reference":

  * 55% of classes have 0x0000 (no reference)
  * The remaining 45% resolve unambiguously to real class tags in
    the SAME schema

All 9 non-zero occurrences in the 2024 sample resolve to named
classes:

  APIVSTAMacroElemTracking         → ADocWarnings             (0x001b)
  HostObjAttr                      → APIVSTAMacroElem         (0x0025)
  AnalyticalPanelPatternHelper     → ATFProvenanceBaseCell    (0x0046)
  AnalyticalSlabAdjustmentGStep    → AbsCurveGStep            (0x0061)
  AreaMeasureCurveData             → Cell                     (0x0047)
  AppearanceAssetElemGroupHelper   → ATFProvenanceBaseCell    (0x0046)
  ArcWallRectOpeningGStep          → AbsCurveGStep            (0x0061)
  ReferencePointGridNetTrackerCell → ATFProvenanceBaseCell    (0x0046)
  AnalyticalLineAutoConnectData    → ConnectorPositionModifier (0x00ee)

Semantics: this is distinct from `parent` (direct superclass). Most
likely a mixin / protocol / category reference — in C++ terms, a
non-public implementation-detail base or interface conformance
marker. Sibling classes share values (three → ATFProvenanceBaseCell,
two → AbsCurveGStep), matching the shape of a trait.

Implementation:

  ClassEntry.ancestor_tag: Option<u16>
    None when slot is 0x0000
    Populated from the preamble at parse time
    Resolvable via SchemaTable.classes by matching to `tag`
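
A sketch of the resolution step, under stated assumptions: only ancestor_tag: Option<u16> is taken from the description above, while the other ClassEntry fields, the plain slice standing in for SchemaTable.classes, and both function names are hypothetical.

```rust
/// Trimmed-down stand-in for the real ClassEntry (field set is assumed).
pub struct ClassEntry {
    pub name: String,
    pub tag: Option<u16>,
    pub ancestor_tag: Option<u16>,
}

/// Map the raw preamble slot to the optional reference: 0x0000 means "none".
pub fn ancestor_from_slot(slot: u16) -> Option<u16> {
    (slot != 0).then_some(slot)
}

/// Find the class whose `tag` matches `entry.ancestor_tag`, if any.
pub fn resolve_ancestor<'a>(
    classes: &'a [ClassEntry],
    entry: &ClassEntry,
) -> Option<&'a ClassEntry> {
    let wanted = entry.ancestor_tag?;
    classes.iter().find(|c| c.tag == Some(wanted))
}
```

A linear find is enough at this scale (a few hundred classes); a tag-keyed map would only matter if resolution ended up in a hot path.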

Addendum: §Q4 in docs/rvt-moat-break-reconnaissance.md

Probe: examples/flag_word_probe.rs

Tests: 43 unit + 8 integration = 51/51 still passing.
…d negative)

Phase 4c.2 / Task #39. Hypothesis from FACT F7/F8 was that the four
u32 fields at Partitions/NN header offset 0x14..0x24 encode per-chunk
byte offsets, enabling random-access chunk lookup without scanning
for gzip magic.

examples/partitions_q7.rs cross-checked this across 6 releases
(2016, 2018, 2020, 2022, 2024, 2026). Zero trailer values matched
any gzip chunk offset. Hypothesis rejected.
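
The cross-check can be sketched roughly as below. This is not the code in examples/partitions_q7.rs; the function names are hypothetical, the four u32 values are assumed to be little-endian, and a "chunk offset" is taken to be any position where the bytes 1f 8b 08 appear.

```rust
/// Offsets where a gzip member header (1f 8b 08) begins.
pub fn gzip_magic_offsets(data: &[u8]) -> Vec<usize> {
    data.windows(3)
        .enumerate()
        .filter(|(_, w)| w.starts_with(&[0x1f, 0x8b, 0x08]))
        .map(|(i, _)| i)
        .collect()
}

/// Read the four u32 values at 0x14..0x24 (little-endian assumed).
pub fn trailer_u32s(stream: &[u8]) -> Option<[u32; 4]> {
    let raw = stream.get(0x14..0x24)?;
    let mut out = [0u32; 4];
    for (slot, chunk) in out.iter_mut().zip(raw.chunks_exact(4)) {
        *slot = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    Some(out)
}

/// Count how many of the four values coincide with a gzip chunk offset.
/// A result of zero is what falsifies the "chunk offset table" hypothesis.
pub fn matching_offsets(stream: &[u8]) -> usize {
    let offsets = gzip_magic_offsets(stream);
    trailer_u32s(stream)
        .map(|t| t.iter().filter(|v| offsets.contains(&(**v as usize))).count())
        .unwrap_or(0)
}
```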

Likely actual semantics (educated guesses, not pinned):

  trailer_u32[0] — layout-version counter (jumped 400→4 post-2016)
  trailer_u32[1] — element/record count (correlates with ElemTable)
  trailer_u32[2] — decompressed size of the first chunk
  trailer_u32[3] — total decompressed content size (approx)

Practical impact: none. partitions::find_chunks() works correctly
via gzip-magic scan; a chunk-offset table would have been a small
optimisation for pathological cases, not a correctness prerequisite.

All 5 original research questions (Q4, Q5, Q6, Q6.1, Q7) now
resolved. Q5.1 extension pushed classification to 84%. Q6.2 (ADocument
entry point) is the only remaining piece needed for full instance
decoding.

Addendum: §Q7 in docs/rvt-moat-break-reconnaissance.md
…lved

After this session, every P0 Phase 4c research question has a
documented answer:

  Q4  flag-word decoded as ancestor-class reference
  Q5  type_encoding decoded — 9 discriminator bytes, 7 FieldType variants
  Q5.1 classification extended to 84% of 1,114 fields
  Q6  Global/Latest is TLV stream (not index+heap)
  Q6.1 encoding is schema-directed (tag-less, protobuf-style)
  Q7  Partitions/NN trailer u32 fields are NOT chunk offsets (negative)

The remaining single-session moat piece is Q6.2 — finding the
ADocument singleton's offset inside Global/Latest so the
schema-directed walker has an entry point.

README updates:

  - Hero line: test count 28 → 43, probe count 14 → 19
  - RE-state table: Layer 4c.2 "In progress" → "Done (84%)"
    reflects the shipped FieldType enum + Q5.1 extension
  - Adds a "Key findings from this phase" summary covering each of Q4-Q7
  - Replaces the old "Open unknowns" list with the current one
    (Q6.2 only)
  - Links to the session-length synthesis doc

Running tests: 43 unit + 8 integration = 51/51 passing.
DrunkOnJava and others added 12 commits April 19, 2026 15:06
Phase 4c.2 / Task #59. examples/adocument_entry.rs locates the
document-upgrade-history block (0x53..0x363) and shows the binary
payload begins immediately after with a sequential-ID TLV table:

  id=1, 2, 3, …  (contiguous, starting at 1)
  record size varies 6..16 bytes per entry

Strong circumstantial evidence this is ADocument's serialized form
keyed by field index:

  - ADocument has 13 declared fields
  - First 13 IDs appear sequentially
  - Records with larger byte-spans map to Container fields
    (m_appInfoArr is a Container; its record has extra length+ref
    bytes)
  - Pointer fields (9 of ADocument's 13) have compact 6-byte records,
    matching a single u16/u32 reference payload

Remaining to validate (Q6.3 follow-up):

  - Per-record field-type check against the schema
  - Resolve "value" semantics (ElementId? pool index?)
  - Walk past the 13-record block to find the next referenced
    instance — the seed of the full object-graph walk

Addendum: §Q6.2 in docs/rvt-moat-break-reconnaissance.md

Probe: examples/adocument_entry.rs
#53 — Benchmarks
  tools/bench.sh: reproducible hyperfine harness. 2-run warmup,
  100+ measured runs per command.

  Single-file results (2024 sample, 397 KB):
    rvt-history:                3.8 ms
    rvt-schema:                 4.5 ms  (395 classes, 1114 fields)
    rvt-info (text/json):       7.6 ms
    rvt-history --partitions:  19.4 ms
    rvt-analyze (full report): 26.9 ms

  Cross-version sweep (all 11 releases, rvt-analyze --json): 227 ms
  total, 20.6 ms/file average.

  docs/benchmarks.md documents methodology + context vs Apache Tika,
  phi-ag/rvt, chuongmep/revit-extractor. rvt-rs at 20 ms/file for a
  full forensic report is 5-25× faster than the nearest open-source
  alternative.

  Raw data checked in at docs/data/bench-2024.{md,csv} +
  docs/data/bench-versions.csv.

#55 — docs.rs pass
  src/lib.rs: rewritten with a proper crate-level doc (quickstart
  example, moat-layer table, module-by-module inventory, safety
  section, license). cargo doc --no-deps --lib builds cleanly.
  docs.rs will render a good landing page when the crate is
  published.

README TL;DR gains a "Fast. 27 ms/file." bullet pointing at
docs/benchmarks.md.

Tests: 43 unit + 8 integration = 51/51 still passing.
Task #42. Replaces the NotYetImplemented stub with a real
stream-level patching primitive that does NOT require Layer 4c.2
field-body decoding.

New types:
  StreamPatch { stream_name, new_decompressed, framing }
  StreamFraming { RawGzipFromZero | CustomPrefix8 | Verbatim }

New function:
  write_with_patches(src, dst, &[StreamPatch]) -> Result<()>
    - Reads src
    - For each stream: if patched, re-compress the new_decompressed
      bytes using truncated-gzip with the correct framing; else
      copy the raw bytes verbatim
    - Writes to dst through a fresh OLE container

Supporting helpers in src/compression.rs:
  truncated_gzip_encode(bytes)              — Revit's gzip format
                                              (minimal header + raw
                                              DEFLATE, no trailing
                                              CRC+ISIZE)
  truncated_gzip_encode_with_prefix8(bytes) — same + 8-byte prefix
                                              for Global/* streams
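
To make the "truncated gzip" framing concrete, here is a std-only sketch of the framing step alone. It does not perform DEFLATE compression, and it is not the code in src/compression.rs: the names below are hypothetical, and the exact header bytes Revit writes (MTIME, XFL, OS) are an assumption. Only the 10-byte header layout and the 8-byte CRC32 + ISIZE trailer come from the gzip format itself.

```rust
/// Fixed 10-byte gzip member header with no optional fields:
/// magic 1f 8b, CM = 8 (DEFLATE), FLG = 0, MTIME = 0, XFL = 0,
/// OS = 255 (unknown). The last six bytes are assumed values.
pub const MINIMAL_GZIP_HEADER: [u8; 10] =
    [0x1f, 0x8b, 0x08, 0x00, 0, 0, 0, 0, 0x00, 0xff];

/// Wrap an already-compressed raw DEFLATE body in the truncated framing:
/// header + body, with no CRC32 + ISIZE trailer appended.
pub fn frame_truncated_gzip(raw_deflate: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(MINIMAL_GZIP_HEADER.len() + raw_deflate.len());
    out.extend_from_slice(&MINIMAL_GZIP_HEADER);
    out.extend_from_slice(raw_deflate);
    out
}

/// Drop the 8-byte trailer (CRC32 + ISIZE) from a standard single-member
/// gzip buffer. Returns `None` if the buffer is too short or lacks the magic.
pub fn strip_gzip_trailer(gz: &[u8]) -> Option<&[u8]> {
    if gz.len() < 10 + 8 || gz[..2] != [0x1f, 0x8b] {
        return None;
    }
    Some(&gz[..gz.len() - 8])
}
```

Because the trailer is missing, a reader of this framing cannot verify a checksum and has to rely on the DEFLATE end-of-stream marker to know where a chunk stops.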

Round-trip verified via examples/write_roundtrip.rs:
  src Formats/Latest: 156854 raw → 473249 decompressed
  patch: first 16 bytes replaced with "rvt-rs-PATCH!!!!"
  dst Formats/Latest: 158163 raw → 473249 decompressed
  dst first 16 bytes == marker ✓
  unpatched Contents stream byte-identical ✓

Size difference (156854 → 158163) is expected because our
DEFLATE settings don't exactly match Autodesk's compressor
tuning. Decompressed content round-trips byte-for-byte.

This doesn't need Layer 4c.2 to land because the caller supplies
the new bytes directly. Once field-body decoding works, a
higher-level `Patch` type can sit on top of this primitive to
expose field-level edits.

Tests: 43 unit + 8 integration = 51/51 passing. Round-trip demo
at examples/write_roundtrip.rs.
…54)

phi-ag/rvt only has family (.rfa) files — the integration tests
thus exercise only the family-file code paths. Real projects (.rvt)
are 20-200 MB and have several features that don't show up in
families (phasing, linked models, multi-level, extensive views).

docs/corpus-sources.md captures:

  - What's still missing from the corpus (.rvt, .rte, .rft,
    large-project stress test)
  - Five lawful sources for redistributable files
  - Criteria for committing a file in-repo (license, redaction,
    PROVENANCE.md entry)
  - A probabilistic-corpus alternative — one minimal file per
    feature instead of one large file

Task #54 is marked complete as a sourcing plan rather than a
committed file, because:
  (a) no publicly-redistributable real project file is locally
      available
  (b) downloading large .rvt files from uncertain-licence sources
      would risk the repo's Apache-2 hygiene
  (c) the right move once the repo goes public is to let early
      adopters' broken files become the test corpus organically
Task #40 is harder than initially scoped. examples/elem_table_sanity.rs
cross-checks the header's declared record count against what
parse_records_rough actually returns:

  Revit 2016: declared 1596, rough returned 3758 (2.4× overshoot)
  Revit 2020: declared 1750, rough returned 4117 (2.4×)
  Revit 2024: declared 1975, rough returned   45 (hit sentinel early)
  Revit 2026: declared 1992, rough returned   45 (same)

Conclusions:
  1. The 12-byte stepping assumption is wrong. Records are variable-
     length, not fixed-size.
  2. 2024+ has additional bytes between the header and the sentinel
     that my walker interprets as a premature 0xFFFFFFFF.
  3. The observed byte patterns (0x003f0000, 0x00010000, 0x00000000)
     suggest records are actually sequences of u16 pairs, not u32
     triples.

Typed-record parsing needs a second session with a per-version
probe to nail the record layout. The scaffold shipped earlier
(elem_table::parse_header + parse_records_rough) is preserved for
downstream callers that need header metadata; they just shouldn't
rely on parse_records_rough returning the exact declared count.

This partial commit captures the finding so the next session has
a starting point.

Probe: examples/elem_table_sanity.rs
README:
  - Layer 6 row updated: "Scaffolded" → "Partial (stream-level done)"
    reflecting write_with_patches landing
  - TL;DR gains a "Modifying writer works" bullet

CHANGELOG.md:
  - Keep-a-changelog format
  - Documents every Added / Changed for the [Unreleased] tranche
  - Summarises all 7 Phase 4c research findings (Q4, Q5, Q5.1,
    Q6, Q6.1, Q6.2, Q7)
  - Positions the [0.1.0] release as pending the stealth-flip
    decision

Ready for public release: 43 unit + 8 integration = 51/51 tests
passing, cargo publish --dry-run clean, outreach drafts committed,
benchmarks reproducible.
PII scrub (#60)
  - tools/bench.sh: hardcoded /Users/griffin path replaced with
    pwd check + RVT_SAMPLES_DIR env var
  - docs/rvt-moat-break-reconnaissance.md: /Users/griffin tree
    diagram reduced to relative path; specific author name in the
    Contents-stream addendum replaced with "a member of the
    original Revit development team" (recoverable by running the
    tool without --redact); "FY-2021 Projects/Revit - 173339" path
    fragment redacted
  - src/redact.rs: test fixtures switched to synthetic values
    (testuser, 111111, FY-20XX) so the tracked source contains no
    real-world PII

Stray-file audit (#70)
  - .gitignore expanded (IDE files, .env, macOS)
  - Cargo.lock IS committed (application crate shipping 7 binaries
    → reproducible builds per Cargo book guidance)

OSS hygiene set (#63)
  - LICENSE: replaced stub with canonical 202-line Apache 2.0 text
  - NOTICE: trademark attribution, nominative-fair-use framing,
    17 USC §1201(f) + EU 2009/24/EC + Australia s.47D
    interoperability-basis cite, third-party dep list with licenses,
    test-corpus attribution to phi-ag/rvt
  - SECURITY.md: private disclosure flow, triage SLAs, scope fence
  - CONTRIBUTING.md: RE-findings workflow, commit-message
    convention, "no Autodesk proprietary content" rule for
    submissions
  - CODE_OF_CONDUCT.md: Contributor Covenant 2.1
  - .github/ISSUE_TEMPLATE/{bug_report,feature_request,re_finding}.md
  - .github/pull_request_template.md (legal-hygiene checkbox
    included)
  - .github/workflows/ci.yml: fmt + clippy + build+test matrix
    (ubuntu/macos/windows × stable + MSRV 1.85) + cargo doc +
    pii-guard job that grep-blocks known-sensitive patterns
  - .github/FUNDING.yml: stub

Cargo.toml polish (#64)
  - rust-version = "1.85" (Rust 2024 edition MSRV)
  - homepage, documentation, readme fields added
  - keywords tuned to 5 (revit/bim/autodesk/ifc/rfa)
  - categories expanded: parser-implementations + compression +
    command-line-utilities
  - exclude list trims the crates.io package (keeps outreach
    drafts, demo csvs, and .github out of the tarball)
  - [package.metadata.docs.rs] for rich docs.rs builds
  - inline per-dep purpose comments
  - dropped redundant license-file (SPDX identifier is enough)

Tests: 43 unit + 8 integration = 51/51 still passing.
cargo publish --dry-run clean (aside from working-tree warning,
which this commit resolves).
…71)

#61 — Legal positioning (README + NOTICE)
  - Dropped inaccurate 17 USC §1201(f) cite (that clause is about
    TPM circumvention, not applicable here)
  - Dropped "clean-room reimplementation" claim (we don't have the
    required team-separation structure)
  - Dropped "Autodesk v. ODA settlement (2006)" reference
    (agreement specific to ODA, not a general precedent)
  - Dropped DWG trademark note (this project doesn't touch DWG)
  - Replaced with Sega v. Accolade + Connectix v. Sony + EU
    2009/24/EC Art. 6 + Australia s.47D as accurate authority;
    Baker v. Selden + Lotus v. Borland for file-format-not-
    copyrightable

#62 — README claim audit
  - 1,156 fields → 1,114 (current parser output)
  - "34 field names + ~400 classes" → accurate current numbers
  - "19 probes" → 23 probes (actual examples/ count)
  - "11 addenda" → 12 addenda (actual section count)
  - "375 class records" → 395 (current schema parser output)

#65 — Examples cleanup
  - field_type_probe.rs: #[allow(dead_code)] on the
    inspection struct (clean warning)
  - roundtrip.rs: added module-level //! doc header
  - All 23 examples now build clean + have doc headers

#67 — Doctests
  - RevitFile::open + open_bytes: runnable usage snippets
  - compression::inflate_at: truncated-gzip round-trip demo
  - redact::redact_path_str: before/after example
  - 5 doctests pass

#68 — Announcement draft refresh
  - Replaced Phase-C-era stale numbers with current
    (15 tests → 51; 30-byte invariant → 165-byte; etc.)
  - Removed $500-5k ODA pricing speculation
  - Removed "Whoever owns that wins" polemic
  - Softened "nobody has publicly explained" to
    "I haven't found a published explanation"
  - Added Phase D findings (tag drift, 2021 transition,
    format GUID, 84% field decoding)

#69 — Recon-report review
  - Title: "Moat-Break Reconnaissance" → "on-disk format —
    reconnaissance report" (neutral technical framing)
  - TL;DR: removed "bigger whale than DWG" comparison +
    pricing table
  - Strategic framing: softened "Autodesk has structurally
    prevented it" phrasing while keeping accurate openBIM
    context with its published sources
  - Sections 7-8 removed: internal decision memo with whale
    comparison (Tyler Tech, Enverus, CiteCodes), revenue
    speculation, and "STRONG GO" business framework — not
    appropriate for public technical report. Replaced with a
    concise "Status & next steps" section keyed to the
    moat-layer model

#71 — Final verification
  cargo clean → cargo build --release → cargo test --release
  → cargo doc (with -D warnings) all green. 43 unit + 8
  integration + 5 doctests = 56 tests passing. 0 warnings.

Also bundled in this tranche (from prior commits not yet
pushed): the rewrite to reader.rs (doctest additions),
NOTICE legal cleanup, examples/roundtrip.rs doc header.
The outreach drafts (docs/outreach/buildingsmart.md,
docs/outreach/ifcopenshell.md) and docs/announcement-draft.md are
personal draft communications that should never have been committed
to the public repo. Deleting them now.

.gitignore now explicitly blocks:
  docs/outreach/
  docs/announcement-draft.md
  docs/drafts/
  docs/_local/
  drafts/
  *.draft.md
  *.private.md

Also expanded .gitignore to a comprehensive Rust template with
proper coverage for:
  - build artifacts
  - documentation + benchmark output
  - editor/IDE files
  - OS cruft (macOS, Windows)
  - env files
  - log files
  - generated/backup files
  - Python / Node tooling (if added later)
  - sample .rvt/.rfa/.rte/.rft files (so local test files don't
    accidentally commit)

Clippy/fmt auto-fix tranche from earlier is bundled in — all modified
examples and sources were formatted + cleaned by cargo fmt + cargo
clippy --fix. The .github/workflows/ci.yml pipeline should now pass.

Note: these files remain in prior git history. If they need to be
purged from history entirely, that requires a separate
`git filter-repo` pass + force-push, coordinated at the owner's
discretion.
The previous PII guard listed literal strings to grep for. That meant
the CI workflow itself contained the sensitive patterns it was
supposed to block — defeating the purpose. New version uses regex
shape-matching (e.g. '/Users/[a-z...]/' rather than '/Users/griffin')
which is equally effective as a guard but doesn't require the
workflow to reference any specific PII.

Also excludes src/redact.rs from the scan, since its test fixtures
intentionally contain synthetic values that match the shape.
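
Shape-matching, as opposed to literal matching, can be illustrated without naming any real value. The sketch below is std-only Rust rather than the grep expression used in the workflow; the function name and the accepted character class (lowercase ASCII, digits, '.', '_', '-') are assumptions, since the commit elides the actual class.

```rust
/// True if `line` contains something shaped like "/Users/<name>/",
/// where <name> is one or more characters from an assumed class.
/// No specific user name appears anywhere in the check.
pub fn has_user_home_shape(line: &str) -> bool {
    const PREFIX: &str = "/Users/";
    line.match_indices(PREFIX).any(|(i, _)| {
        let rest = &line[i + PREFIX.len()..];
        // Length of the candidate <name> segment
        let name_len = rest
            .bytes()
            .take_while(|&b| {
                b.is_ascii_lowercase() || b.is_ascii_digit() || matches!(b, b'.' | b'_' | b'-')
            })
            .count();
        // Require a non-empty name that is closed by a slash
        name_len > 0 && rest.as_bytes().get(name_len) == Some(&b'/')
    })
}
```

A guard of this kind flags the synthetic fixtures too, which is why a file that intentionally contains them has to be excluded from the scan.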

Paves the way for a git-filter-repo history rewrite that won't need
to treat the CI workflow as a special case.
* Replace griffinradcliffe@gmail.com with DrunkOnJava@users.noreply.github.com
  in every tracked file (Cargo.toml author, LICENSE, NOTICE,
  SECURITY, CONTRIBUTING, CODE_OF_CONDUCT)

* Drop two remaining informal first-name references in the recon
  session synthesis + corpus-sources doc

* Fix all Rust-1.95 clippy 'unnecessary_sort_by' errors across src/
  and examples/ — sort_by with a simple reverse-key shape is now
  sort_by_key(|x| std::cmp::Reverse(...))

* .github/FUNDING.yml activated with github: [DrunkOnJava]
  (Sponsor button shows once GitHub Sponsors enrollment completes)

* .github/dependabot.yml added — weekly cargo + github-actions
  version-update PRs with sensible labels + commit-message prefixes
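
For reference, the clippy rewrite mentioned in the list above has this general shape. The function and the (name, count) row type are hypothetical stand-ins, not code from src/ or examples/.

```rust
use std::cmp::Reverse;

/// Sort (name, count) rows by descending count.
pub fn sort_desc_by_count(rows: &mut [(String, usize)]) {
    // Form that clippy's unnecessary_sort_by lint flags:
    //     rows.sort_by(|a, b| b.1.cmp(&a.1));
    // Equivalent form using a reversed key:
    rows.sort_by_key(|row| Reverse(row.1));
}
```

Both forms are stable sorts, so rows with equal counts keep their original relative order either way.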

Repo-level settings also adjusted out-of-tree via gh api:
  - has_wiki=false, has_projects=false
  - allow_merge_commit=false (squash-only)
  - allow_auto_merge=true, delete_branch_on_merge=true
  - topics set (revit/bim/autodesk/rfa/rvt/ifc/openbim/cad/rust/
    file-format/reverse-engineering/interoperability)
  - homepage set
  - vulnerability_alerts + automated_security_fixes enabled
  - branch ruleset 'main-protection' active: deletion + force-push
    blocked, linear history required, signed commits required, all
    5 CI checks gated

Pending (require public-flip before they can be toggled):
  - Secret scanning + push protection (GHAS-gated on private user repos)
  - Private vulnerability reporting
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot Bot commented on behalf of github Apr 19, 2026

Labels

The following labels could not be found: dependencies, github-actions. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.


dependabot Bot commented on behalf of github Apr 19, 2026

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

dependabot Bot deleted the dependabot/github_actions/actions/checkout-6 branch April 19, 2026 20:49
DrunkOnJava added a commit that referenced this pull request Apr 21, 2026
…id targets input

Two failures observed on runs #1 and #2:

1. `setup-rust-toolchain@v1.9.0` rejected the `targets:` key
   (singular `target` is the correct input). Non-fatal warning,
   but meant wasm32-unknown-unknown wasn't pre-installed and
   wasm-pack had to rustup-add it at build time.

2. `wasm-pack build --target web --features wasm --no-default-features --out-dir viewer/pkg`
   failed with "error: unexpected argument '--out-dir' found".
   Root cause: wasm-pack forwards every arg after the first one
   it doesn't recognise to `cargo build`. Because `--features`
   came before `--out-dir`, the latter was sent to cargo — which
   doesn't support it.

Fix:

  wasm-pack build --target web -- --features wasm --no-default-features
  rm -rf viewer/pkg
  mv pkg viewer/pkg

The `--` separator is the canonical way to split wasm-pack args
from cargo pass-through args. Building into the default `pkg/`
then moving it sidesteps `--out-dir` entirely.

Also: hoisted the `apt-get install wabt` step ahead of the
import-table audit (was running lazily inside the verify step)
and dropped the lockfile-based npm cache that the action warned
about since `viewer/package-lock.json` isn't committed yet.
DrunkOnJava added a commit that referenced this pull request Apr 21, 2026
Hex-dumped all 32 filtered ArcWall (tag 0x0191) records on Einhoven
Partitions/5. Wire format is fully deterministic.

Standard ArcWall record (292 B singleton, 568 B pair):
  +0x00  u16 tag              = 0x0191
  +0x02  u16 filter_pad       = 0x0000
  +0x04  u32 fixed_header_0   = 0x00088004 (class-family marker)
  +0x08  u32 count_version    = 1 for standard, 3 for compound
  +0x0c  u32 type_code        = 0x00000003
  +0x10  u16 variant          = 0x07fa standard, 0x0821 compound
  +0x12  f64 x 6              = 6 doubles of wall geometry
  +0x42  f64 x 6              = 6 doubles duplicate (diff coord system?)
  +0x72  u8  trailer          = 0x03
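
The layout above translates fairly directly into a fixed-offset reader. What follows is a sketch under stated assumptions, not the planned arc_wall.rs decoder: all multi-byte values are assumed little-endian, the struct and function names are hypothetical, and only the first 0x73 bytes listed in the table are read (the rest of the 292 B record is not interpreted).

```rust
/// Fields listed in the layout table (hypothetical struct name).
#[derive(Debug, PartialEq)]
pub struct ArcWallHead {
    pub tag: u16,
    pub fixed_header_0: u32,
    pub count_version: u32,
    pub type_code: u32,
    pub variant: u16,
    pub coords_a: [f64; 6],
    pub coords_b: [f64; 6],
    pub trailer: u8,
}

/// Copy N bytes starting at offset `o`, or `None` if out of range.
fn le_bytes<const N: usize>(b: &[u8], o: usize) -> Option<[u8; N]> {
    let mut buf = [0u8; N];
    buf.copy_from_slice(b.get(o..o.checked_add(N)?)?);
    Some(buf)
}

/// Read six consecutive little-endian f64 values starting at `o`.
fn f64x6_at(b: &[u8], o: usize) -> Option<[f64; 6]> {
    let mut out = [0.0f64; 6];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = f64::from_le_bytes(le_bytes::<8>(b, o + i * 8)?);
    }
    Some(out)
}

/// Decode the listed fields; `None` on a wrong tag or a short buffer.
pub fn decode_arc_wall_head(b: &[u8]) -> Option<ArcWallHead> {
    let tag = u16::from_le_bytes(le_bytes::<2>(b, 0x00)?);
    if tag != 0x0191 {
        return None;
    }
    Some(ArcWallHead {
        tag,
        fixed_header_0: u32::from_le_bytes(le_bytes::<4>(b, 0x04)?),
        count_version: u32::from_le_bytes(le_bytes::<4>(b, 0x08)?),
        type_code: u32::from_le_bytes(le_bytes::<4>(b, 0x0c)?),
        variant: u16::from_le_bytes(le_bytes::<2>(b, 0x10)?),
        coords_a: f64x6_at(b, 0x12)?,
        coords_b: f64x6_at(b, 0x42)?,
        trailer: *b.get(0x72)?,
    })
}
```

The two f64 blocks start at 0x12 and 0x42, so they are not 8-byte aligned; copying into a local buffer before conversion avoids any alignment assumption.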

Record-count breakdown of the 32 occurrences:
  - 1 index record (#0)     : list of u32 IDs (manifest)
  - 3 metadata records (#1-#3): contain schema constants 0x576, 0x1d94
                                (same constants RE-14.1 found in
                                HostObjAttr — cross-corpus coherence)
  - 2 compound walls (#12, #13): variant 0x0821, embedded openings
  - ~26 standard walls         : variant 0x07fa, clean geometry

Net: 28 architectural wall elements decodable on Einhoven directly.

New hypotheses:
  H14 (0.95) — records are self-describing by tag + variant, fixed
               body per variant
  H15 (0.9)  — constants 0x88004, 0x576, 0x1d94 are schema-family IDs
               stable across record types in the HostObj/ArcWall
               family
  H16 (0.75) — coord pairs = two 3D points (line-segment endpoints of
               wall centerline)

Decisions:
  D19 — Build elements::decoders::arc_wall as first concrete decoder
  D20 — Build walker::iter_arc_walls() as RE-15 stand-in
  D21 — Wire ArcWallRecord to IFCWALL in exporter
  D22 — DEC-05 (IfcWall count > 0) is now unblocked

Synthesis: reports/element-framing/RE-14.3-synthesis.md with full
byte-level record layout, 10+ sample decodings, decoder sketch in
Rust, hypothesis set, open questions (coord semantics, variant enum,
wall-type lookup path).

Next concrete step: implement arc_wall.rs decoder + test on 28 records
+ emit 28 IFCWALL entities via exporter.