chore(ci)(deps): bump actions/checkout from 4 to 6 #1
Closed
dependabot[bot] wants to merge 42 commits into
…s tested
Parses .rvt, .rfa, .rte, .rft across Revit 2016-2026 without Autodesk software.
- OLE2/CFB container via cfb crate (0.11)
- Truncated-gzip DEFLATE via flate2 raw-inflate
- BasicFileInfo: version, build, GUID, locale, creator path
- PartAtom: Atom XML parser (title, OmniClass, taxonomy, categories)
- RevitPreview4.0: PNG thumbnail extraction
- Formats/Latest: class/schema name inventory (8-10K names per file)
- Two CLIs: rvt-info (metadata dump) and rvt-diff (cross-version analysis)
- 9 unit tests + 6 integration tests across 11 Revit releases
- Zero runtime deps on Autodesk, Forge, or ODA BimRv SDK
Moat layers 1-2 fully solved; layer 3 (stream framing) ~80% from corpus analysis; layer 4 (object graph) is the remaining open work. See ../../reports/rvt-moat-break-reconnaissance.md for full analysis.
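The "truncated-gzip DEFLATE via flate2 raw-inflate" step works in two parts: skip the gzip member header, then hand the remaining bytes to a raw inflater such as flate2's `flate2::read::DeflateDecoder`, keeping whatever output arrives before the input runs out. The std-only sketch below covers only the header skip and follows the RFC 1952 layout. `gzip_payload_offset` is a hypothetical name, not the crate's API.

```rust
/// Return the offset of the raw DEFLATE payload inside a gzip member that
/// starts at `buf[0]` (RFC 1952 header layout). Returns None if the magic
/// bytes are wrong or the header itself is cut off.
pub fn gzip_payload_offset(buf: &[u8]) -> Option<usize> {
    // Fixed 10-byte header: ID1 ID2 CM FLG MTIME(4) XFL OS
    if buf.len() < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 0x08 {
        return None;
    }
    let flg = buf[3];
    let mut pos = 10;
    // FEXTRA: u16 LE length followed by that many bytes
    if flg & 0x04 != 0 {
        let xlen = u16::from_le_bytes([*buf.get(pos)?, *buf.get(pos + 1)?]) as usize;
        pos += 2 + xlen;
    }
    // FNAME (0x08) and FCOMMENT (0x10): zero-terminated strings
    for bit in [0x08u8, 0x10] {
        if flg & bit != 0 {
            let nul = buf.get(pos..)?.iter().position(|&b| b == 0)?;
            pos += nul + 1;
        }
    }
    // FHCRC: 2-byte header CRC
    if flg & 0x02 != 0 {
        pos += 2;
    }
    if pos <= buf.len() { Some(pos) } else { None }
}
```

Everything from the returned offset onward is the DEFLATE stream. With a truncated member, a reader keeps the bytes decoded before the decoder reports end-of-input, instead of treating that error as fatal.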
Adds rvt-corpus CLI + corpus module that analyzes N Revit files in bulk, classifying each byte position across the aligned prefix of every stream:
- Invariant (same across every version)
- LowVariance (2-5 distinct values: type tags, enums)
- SizeCorrelated (monotonically non-decreasing: length fields)
- MonotonicInt (strictly increasing: IDs, timestamps)
- Variable (payload data)
Also surfaces invariant byte runs of length >= 8 as likely structural markers (format GUIDs, schema version IDs, magic strings).
Running against the 11-version RFA corpus immediately surfaces two format schema GUIDs that have been stable since 2016:
  Global/PartitionTable: 2d342935-1ee5-d411-92d8-0000863f27ad (165/167 invariant)
  Global/History: bd411eba-62fe-d311-883b-0000863de970 (87-byte run)
It also confirms that the first 364 bytes of Formats/Latest are a byte-identical class-name preamble across every Revit release 2016-2026: the A3PartyImage type hierarchy has not changed since before R14.
New: src/corpus.rs, src/bin/rvt_corpus.rs. 17/17 tests passing (9 unit + 6 integration + 5 new corpus unit). Third binary added to Cargo.toml. 1,760 LOC total after this commit (up from 1,362).
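Here is a minimal sketch of the per-position classifier. It assumes each input slice holds the byte at one aligned offset from every file, ordered oldest release first. The names (`ByteClass`, `classify`, `invariant_runs`) and the order in which the checks run are assumptions, not the rvt-corpus implementation. With only a handful of files, a strictly increasing position could also be read as LowVariance, so the check order decides the label.

```rust
/// Hypothetical mirror of the five categories listed above.
#[derive(Debug, PartialEq)]
pub enum ByteClass { Invariant, LowVariance, SizeCorrelated, MonotonicInt, Variable }

/// Classify one byte position given its value in each file
/// (files assumed ordered oldest release first).
pub fn classify(values: &[u8]) -> ByteClass {
    let mut distinct = values.to_vec();
    distinct.sort_unstable();
    distinct.dedup();
    let strictly_inc = values.windows(2).all(|w| w[0] < w[1]);
    let non_dec = values.windows(2).all(|w| w[0] <= w[1]);
    // Check order is an assumption: monotonic tests run before the
    // distinct-count test.
    match distinct.len() {
        0 | 1 => ByteClass::Invariant,
        _ if strictly_inc => ByteClass::MonotonicInt,
        _ if non_dec => ByteClass::SizeCorrelated,
        2..=5 => ByteClass::LowVariance,
        _ => ByteClass::Variable,
    }
}

/// Find runs of at least `min_len` consecutive invariant positions across
/// the aligned prefix (the length of the shortest file). Returns (start, len).
pub fn invariant_runs(files: &[&[u8]], min_len: usize) -> Vec<(usize, usize)> {
    let prefix = files.iter().map(|f| f.len()).min().unwrap_or(0);
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    // Iterate one past the end so a run reaching the prefix boundary closes.
    for i in 0..=prefix {
        let inv = i < prefix && files.iter().all(|f| f[i] == files[0][i]);
        match (inv, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                if i - s >= min_len {
                    runs.push((s, i - s));
                }
                start = None;
            }
            _ => {}
        }
    }
    runs
}
```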
Major discovery in the decompressed Formats/Latest stream: Autodesk ships
their complete C++ serialization schema in every Revit file, including
field names (with m_/m_p/m_o/m_d/m_id conventions) and full STL type
signatures like 'std::pair< ElementId, double >'.
Running rvt-schema on a single 400KB RFA extracts:
- 395 classes from the schema section (first 64KB of the decompressed stream)
- 1,156 total fields across all classes
- Real Revit API classes: DBView (53 fields), DimensionEqualityLabelFormatting
(73), HostObj (18), LoadBCBase (22), Symbol (24), AnalyticalSupportTracker,
RbsOffsetSetWrapper, etc.
Wire format (inferred from corpus):
- Class name: u16 LE length + ASCII bytes
- Field name: u32 LE length + ASCII bytes
- Class tag: u16 LE with optional flag bit (0x80 in the high byte, i.e. 0x8000)
- Fields appear until next class-name candidate
Parser caps scanning at 64KB — beyond that, binary object data produces
false-positive class matches from compressed-bit noise.
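The inferred wire format implies a reader along the following lines. `read_prefixed_ascii` and `SCAN_CAP` are hypothetical names. The identifier-only character filter is an added heuristic for rejecting the binary noise mentioned above; the commit does not confirm it.

```rust
/// Upper bound on how far into decompressed Formats/Latest to scan,
/// mirroring the 64 KB cap described above.
pub const SCAN_CAP: usize = 64 * 1024;

/// Read a length-prefixed ASCII string at `pos`. `width` is 2 for class
/// names (u16 LE length) or 4 for field names (u32 LE length).
/// Returns the string and the offset just past it.
pub fn read_prefixed_ascii(buf: &[u8], pos: usize, width: usize) -> Option<(String, usize)> {
    let limit = buf.len().min(SCAN_CAP);
    let hdr = buf.get(pos..pos.checked_add(width)?)?;
    let len = match width {
        2 => u16::from_le_bytes([hdr[0], hdr[1]]) as usize,
        4 => u32::from_le_bytes([hdr[0], hdr[1], hdr[2], hdr[3]]) as usize,
        _ => return None,
    };
    let start = pos + width;
    let end = start.checked_add(len)?;
    if len == 0 || end > limit {
        return None;
    }
    let body = &buf[start..end];
    // Heuristic: accept C++ identifier characters only.
    if !body.iter().all(|b| b.is_ascii_alphanumeric() || *b == b'_') {
        return None;
    }
    Some((String::from_utf8(body.to_vec()).ok()?, end))
}
```

To read a class name and then a field name, chain the returned offset: `read_prefixed_ascii(buf, 0, 2)` followed by `read_prefixed_ascii(buf, next, 4)`.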
Adds: src/formats.rs (schema parser), src/bin/rvt_schema.rs (CLI),
RevitFile::schema() method. 21 unit + 6 integration = 27 tests passing.
This is the unlock for Phase D (object graph parsing): every binary
record in Global/Latest and Partitions/NN can now be interpreted via
the class table — no deep RE required, just type-aware deserialization
driven by the self-describing schema.
Demonstrates the approach: read decompressed Global/Latest, use the
schema discovered in Phase C to identify record types, interpret the
binary body. Today's delivery: a complete document-migration timeline
extractor.
Example output for a Revit file that started in 2018 and migrated
through every release to 2026:
Document history · 9 entries
0. Revit 2018 - Preview Pre-Release 2018 (2018.000) : 20170106_1515(x64)
1. Revit 2019 2019 (2019.000) : 20180123_1515(x64)
2. Revit 2020 2020 (2020.000) : 20190207_1515(x64)
3. Revit 2021 2021 (2021.000) : 20200131_1515(x64)
4. Revit 2022 - Preview Pre-Release 2022 (2022.000) : 20210129_1515(x64)
5. Revit 2023 2023 (2023.000) : 20220401_1515(x64)
6. Revit 2024 2024 (2024.000) : 20230308_1635(x64)
7. Revit 2025 2025 (2025.000) : Development Build
8. Revit 2026 2026 (2026.000) : 20250227_1515(x64)
This is forensic-grade data: every Revit release that has ever saved
the file in chronological order. No Autodesk software required.
Added: src/object_graph.rs + src/bin/rvt_history.rs. New rvt-history
CLI. 21 unit + 6 integration = 27 tests passing.
Took the 34 fields our schema parser extracted and grepped them against the raw decompressed Formats/Latest bytes: all 34/34 are present byte-for-byte in the stream.
Also downloaded Revit_All_Main_Versions_API_x64 2026.0.0 from NuGet (35MB RevitAPI.dll) and confirmed that every top-level class name we extracted (ADocument, DBView, HostObj, LoadBCBase, Symbol, APIAppInfo, APropertyDouble1/2/3, APropertyEnum, APropertyBoolean, A3PartyObject) appears as a native C++ symbol inside the DLL. Decorated function signatures such as:
  __cdecl NotNull<class ADocument *,void>::NotNull<class ADocument*,void>
  ?apiAppInfo@@YAPEBVAPIAppInfo@@AEBVControlledConstDocAccess@@@Z
and the leaked Autodesk build path:
  F:\Ship\2026_px64\Source\API\RevitAPI\Objects\Elements\*.cpp
confirm that the schema entries come straight out of Autodesk's C++ codebase.
Also noted a convention worth knowing: the on-disk schema uses abbreviated field names (e.g. 'm_pADoc' in the file is 'm_pADocument' in source).
rvt-dump writes every OLE stream to disk and decompresses what can be
decompressed. Produces offline artifacts ready for Ghidra, IDA, radare2,
or hex-editor analysis. One 366 KB RFA yields ~3 MB of decompressed
binary across 9 streams (Formats/Latest at 473 KB, Partitions/67 at
1,150 KB with 10 concatenated gzip chunks).
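A stream like Partitions/67 can be split into its gzip members by scanning for the gzip magic `1f 8b 08`. The names in this sketch are hypothetical, not the crate's API. A raw magic scan can also hit false positives inside compressed data, so each candidate still needs to be confirmed by actually inflating it.

```rust
/// Offsets of every candidate gzip member header (1f 8b 08).
/// Heuristic: the same three bytes can occur by chance in compressed data.
pub fn gzip_chunk_offsets(stream: &[u8]) -> Vec<usize> {
    stream
        .windows(3)
        .enumerate()
        .filter(|(_, w)| w.starts_with(&[0x1f, 0x8b, 0x08]))
        .map(|(i, _)| i)
        .collect()
}

/// Slice the stream into [offset, next_offset) spans, one per candidate chunk.
pub fn split_chunks(stream: &[u8]) -> Vec<&[u8]> {
    let offs = gzip_chunk_offsets(stream);
    offs.iter()
        .enumerate()
        .map(|(k, &start)| {
            // The last chunk runs to the end of the stream.
            let end = offs.get(k + 1).copied().unwrap_or(stream.len());
            &stream[start..end]
        })
        .collect()
}
```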
Adds two high-value integration tests:
1. document_history_extracts_full_timeline
Verifies we recover >= 8 entries from the 2026 sample (which was
migrated through every Revit release since 2018) including at
least 2018 and 2026 markers.
2. schema_field_names_round_trip_byte_for_byte
Every field name rvt-schema extracts must be findable byte-for-byte
in the raw decompressed Formats/Latest stream. This prevents any
regression that would let the heuristic extractor drift into
false-positive garbage names.
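The second test's property fits in a small check: collect every extracted name that does not appear byte-for-byte in the stream, and require the result to be empty. This sketch uses hypothetical helper names and a naive window scan rather than any substring-search crate.

```rust
/// True if `needle` occurs anywhere in `haystack` as a contiguous byte run.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    // windows(0) would panic, so reject empty needles first.
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Return every extracted field name that is NOT present byte-for-byte in
/// the decompressed Formats/Latest stream. An empty result means the
/// round-trip property holds.
pub fn missing_field_names<'a>(stream: &[u8], names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| !contains_bytes(stream, n.as_bytes()))
        .collect()
}
```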
Totals: 6 binaries (rvt-info, rvt-diff, rvt-corpus, rvt-schema,
rvt-history, rvt-dump). 21 unit + 8 integration = 29 tests passing.
…t-ifc ceiling
The broader ask this project answers, not just 'Revit reader':
Autodesk's own Apache-2 IFC exporter (github.com/Autodesk/revit-ifc) runs INSIDE Revit and can only emit what the Revit API surface exposes. The API deliberately withholds private families, complex assemblies, some geometric relationships, and certain parameter types. That's why OSArch describes out-of-the-box Revit IFC as 'just crap' and thinkmoult.com calls it 'very limited.'
A native parser reads the actual on-disk bytes, a strict superset of the API-surface path. rvt-rs + an IFC writer (planned) would be the first open-source full-fidelity RVT → IFC pipeline. Natural partners: IfcOpenShell, BIMvision, buildingSMART International's IFC certification program, OSArch community.
Griffin surfaced this framing after v0.1 shipped. The technical work is identical, but the positioning target broadens from 'individual BIM tooling' to 'decade-long openBIM pain point.'
Extends rvt-history with --all-strings flag. Walks decompressed
Global/Latest looking for [u32 tag][u32 char_count][UTF-16LE body]
records. Extracts 1,200+ records per reference file — real Revit
model data including:
tag=0x01:
off=0x00163f '-'
off=0x001cef 'Elevation 0' (elevation view)
off=0x001d25 'A100' (sheet number)
off=0x001d4d 'Level 1' (level name)
off=0x001d7b '0'
tag=0x290034: Revit version upgrade entries (later than the first)
tag=0x07: first Revit version entry
tag=0x7d005d: most common — needs further classification (1,159 records)
These are user-visible BIM objects extracted without Autodesk software,
without Forge API, without ODA BimRv SDK. First real Phase D output
from the binary object graph.
Also fixes a subtle bug in the printability heuristic: the prior
check 'printable >= len * 9 / 10' used integer division, so for
char_count=1 records the threshold collapsed to 0 and single control
chars passed as 'valid' — falsely consuming the cursor past real
adjacent records. New check requires EVERY character to be printable
ASCII (is_ascii_graphic | space | tab | newline).
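Here is a sketch of the record reader with the stricter validator. It assumes the `[u32 tag][u32 char_count][UTF-16LE body]` layout described above, with little-endian integers (an assumption). `read_string_record`, `all_printable`, and `old_check` are illustrative names. `old_check` only reproduces the integer-division pitfall so the difference is visible.

```rust
/// Strict check from the fix above: every UTF-16 code unit must be
/// printable ASCII (graphic, space, tab or newline).
fn all_printable(units: &[u16]) -> bool {
    units.iter().all(|&u| {
        u < 0x80 && {
            let b = u as u8;
            b.is_ascii_graphic() || b == b' ' || b == b'\t' || b == b'\n'
        }
    })
}

/// Try to read one record at `pos`; returns (tag, text, next_pos) when it validates.
pub fn read_string_record(buf: &[u8], pos: usize) -> Option<(u32, String, usize)> {
    let word = |at: usize| -> Option<u32> {
        let b = buf.get(at..at + 4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    };
    let tag = word(pos)?;
    let count = word(pos + 4)? as usize;
    let start = pos + 8;
    let end = start.checked_add(count.checked_mul(2)?)?;
    let body = buf.get(start..end)?;
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .collect();
    if count == 0 || !all_printable(&units) {
        return None;
    }
    Some((tag, String::from_utf16(&units).ok()?, end))
}

/// The old check: with integer division, len * 9 / 10 is 0 when len == 1,
/// so a single control character passed.
pub fn old_check(printable: usize, len: usize) -> bool {
    printable >= len * 9 / 10
}
```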
Totals: 21 unit + 8 integration = 29 tests passing.
- Add string_records_from_partitions(): iterates the concatenated gzip chunks inside Partitions/NN (the version-specific stream where bulk Revit BIM metadata actually lives) and pulls every length-prefixed UTF-16LE record out with the same strict-printable validator used for Global/Latest.
- Extend rvt-history CLI with --partitions flag. Classifies records by prefix: autodesk.unit.*, autodesk.spec.*, autodesk.parameter.group.*, OmniClass codes, Uniformat codes, Revit categories.
Fixes closure lifetime by promoting to a local fn with explicit 'a lifetimes.
Findings (documented in recon report addendum):
* Forge Design Data Schema rollout inside RVT is now dateable:
  - units + specs namespaces → introduced Revit 2021
  - parameter groups namespace → introduced Revit 2024
  - 2016-2020 files contain zero Forge identifiers
* 2024 reference file leaks Autodesk employee OneDrive path (C:\Users\hansonje\OneDrive - Autodesk\FY-2021 Projects\...) in Partitions/NN, verbatim. This is evidence that Revit does not sanitize source paths. Downstream parsers should redact.
* 3,746 strings extracted from a single 2024 .rfa, including the full Uniformat code hierarchy (A1010, A1020, A1030, …), AEC length/power specs, parameter groups (constraints, materials, dimensions, identityData), Revit category labels (Acid Waste Systems, Air Distribution Systems, Balcony Floor Construction, …), and localized format strings.
All 29 tests still pass.
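The prefix classification reduces to a plain string match. The bucket labels here are hypothetical. The prefixes leave off the trailing separator because the exact character after each namespace is an assumption. The Uniformat test covers only the five-character shape quoted above (A1010, A1020), not the full hierarchy. OmniClass and Revit-category detection are left out.

```rust
/// Hypothetical bucket labels for the prefix classes listed above.
pub fn classify_string(s: &str) -> &'static str {
    if s.starts_with("autodesk.unit") {
        "forge-unit"
    } else if s.starts_with("autodesk.spec") {
        "forge-spec"
    } else if s.starts_with("autodesk.parameter.group") {
        "forge-parameter-group"
    } else if is_uniformat(s) {
        "uniformat"
    } else {
        "other"
    }
}

/// Assumed Uniformat shape (A1010, A1020): one uppercase letter + four digits.
fn is_uniformat(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 5 && b[0].is_ascii_uppercase() && b[1..].iter().all(u8::is_ascii_digit)
}
```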
- docs/rvt-moat-break-reconnaissance.md: full Phase A–D+ narrative with Forge schema dating addendum (2021 = units/specs debut, 2024 = parameter-group debut).
- docs/announcement-draft.md: public announcement copy. Not posted yet; Griffin wants stealth-build first and will decide when to flip public.
These were previously tracked outside the git-versioned repo; moving them in so anyone cloning main has the full narrative.
Add four reproducible exploration examples and document the finding in
docs/rvt-moat-break-reconnaissance.md §Phase D link proof.
The central claim, now supported by evidence:
* Global/Latest is a TAG-INDEXED binary object graph. Class names do
not appear as ASCII anywhere inside it (probe_link.rs confirms zero
matches for ADocument, DBView, Symbol, HostObj, Category, …).
* The u16 immediately after a class name in Formats/Latest is the
class's serialization tag. When the 0x8000 flag bit is set, that
record is a class *definition*; when clear, it's a reference to a
previously-declared class (used inside field type signatures).
tag_dump.rs sweeps all 398 candidates and shows 79 definitions with
flag set.
* The distribution of those tags inside Global/Latest is extremely
non-uniform — the top tag (AbsCurveGStep, 0x0061 in 2024) appears
19,415 times in 938,578 bytes, 340× the uniform-random expectation
of ~57. link_schema.rs computes the full histogram; the top 25
tagged classes are all geometry/curve/host-object domain, exactly
what a curve-driven family file should look like.
* Tags are stable-sort-assigned and shift when new classes are
inserted between releases: AbsCurveGStep = 0x0053 in 2016 →
0x0061 in 2024 (+14 insertions before it). ADocWarnings, being
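alphabetically earlier, stays at 0x001b in both releases. A parser
therefore MUST re-read Formats/Latest per file; tags cannot be
hard-coded. This is a good property: every file carries its own
schema, so a per-file tag table keeps working across releases
without per-version constants.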
* What remains (next session): record framing, field-type encoding
(double / int / ElementId / std::pair<A,B> / std::vector<T>),
alignment rules, inter-instance references. Each of these is a
bit-level hypothesis + delta test across the 11-version corpus.
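The following sketch reads the flag bit and builds a per-file tag table. It assumes the raw u16 has already been located after the class name. `split_tag` and `tag_table` are hypothetical names, and the raw values used to exercise them are illustrative.

```rust
use std::collections::HashMap;

/// Split a schema u16 into (tag, is_definition). Per the text above, the
/// 0x8000 bit marks a class definition; a clear bit marks a reference.
pub fn split_tag(raw: u16) -> (u16, bool) {
    (raw & 0x7fff, raw & 0x8000 != 0)
}

/// Build a per-file tag -> class-name table from (name, raw_u16) pairs,
/// keeping only definitions. Tags drift between releases, so this table
/// is rebuilt for every file instead of being hard-coded.
pub fn tag_table(entries: &[(&str, u16)]) -> HashMap<u16, String> {
    entries
        .iter()
        .filter_map(|&(name, raw)| {
            let (tag, is_def) = split_tag(raw);
            if is_def { Some((tag, name.to_string())) } else { None }
        })
        .collect()
}
```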
Files:
examples/probe_link.rs — null-hypothesis ASCII-probe
examples/tag_bytes.rs — hex dump around specific class names
examples/tag_dump.rs — statistical sweep of all classes
examples/link_schema.rs — tag-frequency histogram in Global/Latest
docs/rvt-moat-break-reconnaissance.md
— new Addendum: Phase D link proof
Runs the Phase D link analysis across every Revit release 2016 → 2026. Surfaces two discoveries:
1. Global/Latest expands by 27× between 2020 (~26 KB) and 2021 (~715 KB). Pre-2021 files hold only metadata in Global/Latest; post-2021 files also hold large geometry / curve / system-profile content there. This coincides exactly with the Forge Design Data Schema rollout seen in Partitions/NN (autodesk.unit.*, autodesk.spec.* identifiers debut in 2021). Both observations are almost certainly one event: Autodesk ran a serialization refactor for Revit 2021 that no public changelog mentions.
2. Tagged-class count is decreasing over time: 90 in 2016 → 60 in 2026. Either the schema is consolidating or classes are moving out of Formats/Latest into some other stream.
The "top class by tag hits" flips around 2021, from ADocWarnings (metadata, tiny) to AbsCurveType / AbsCurveGStep (geometry). That is consistent with the "Global/Latest gained serialized geometry in 2021" interpretation.
Any open reader that handles only the pre-2021 layout silently drops ~30× more data than it recovers on post-2021 files. This is the practical consequence for downstream tools.
Pivot every release's Formats/Latest tag list into one row-per-class, column-per-release table. 122 distinct tagged classes across 11 releases.
Dataset summary:
* 6 classes (4.9%) are tag-stable across every release (all alphabetically early): A3PartyAImage=0x000d, ADTGridImportVocabulary=0x0012, ADocWarnings=0x001b, APIVSTAMacroElem=0x0025, APIVSTAMacroElemTracking=0x0028, AProperties=0x002a
* 101 classes (82.8%) have at least two distinct tag values; their tags shift whenever Autodesk inserts a new alphabetically-earlier class
* 22 new classes introduced 2017-2026 (ATFProvenanceBaseCell, AnalyticalAutomationEditModeMgr, AlignmentStation*, …)
* 52 classes removed by 2026 (ActiveGeoLocationTrackingElement, AllowGStyleDrawFilter, AngularDimensionType, several AnalyticalModel*), matching the 90→60 consolidation trend seen in the per-release size table
Illustrative: AbsCurveGStep shifted 0x0053 → 0x0056 → 0x0060 → 0x0061 → 0x0066 across 2016-2026, with the biggest single jump (+10) at the 2021→2022 boundary. That is the same schema-refresh event that drove Global/Latest 27× larger and debuted the Forge Design Data Schema namespaces.
Output: docs/data/tag-drift-2016-2026.csv, the first publicly available version of this data. Any downstream tool that wants to handle cross-release Revit files can consume the CSV directly.
Run with:
  ./target/release/examples/tag_drift <sample-dir> <out.csv>
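The pivot is a nested map from class name to per-release tag. In this hypothetical sketch, the names and the CSV column layout are assumptions; they may not match the `tag_drift` example's actual output format.

```rust
use std::collections::BTreeMap;

/// class -> (release -> tag). BTreeMap keeps rows and columns sorted.
pub type DriftTable = BTreeMap<String, BTreeMap<u16, u16>>;

/// Pivot per-release (class, tag) lists into one row per class.
pub fn pivot(per_release: &[(u16, Vec<(String, u16)>)]) -> DriftTable {
    let mut table = DriftTable::new();
    for (release, classes) in per_release {
        for (name, tag) in classes {
            table.entry(name.clone()).or_default().insert(*release, *tag);
        }
    }
    table
}

/// A class is tag-stable if it appears in every release with one tag value.
pub fn is_stable(row: &BTreeMap<u16, u16>, release_count: usize) -> bool {
    let first = row.values().next();
    row.len() == release_count && row.values().all(|t| Some(t) == first)
}

/// Render as CSV: class name, then one tag column per release
/// (an empty cell when the class is absent from that release).
pub fn to_csv(table: &DriftTable, releases: &[u16]) -> String {
    let mut out = String::from("class");
    for r in releases {
        out.push_str(&format!(",{r}"));
    }
    out.push('\n');
    for (name, row) in table {
        out.push_str(name);
        for r in releases {
            match row.get(r) {
                Some(t) => out.push_str(&format!(",0x{t:04x}")),
                None => out.push(','),
            }
        }
        out.push('\n');
    }
    out
}
```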
- Restore accurate CLI list (6 binaries shipped: rvt-info, rvt-diff, rvt-corpus, rvt-schema, rvt-history, rvt-dump; previous text said "two CLIs"). Add the examples/ probe list (probe_link, tag_bytes, tag_dump, link_schema, tag_drift).
- Add "Phase D findings" section to the top of the README covering: (1) tag-linkage 340× non-uniformity proof, (2) tag drift 2016-2026 with CSV link, (3) Revit 2021 format transition (Global/Latest 27× growth), (4) parameter-group namespace added in Revit 2024.
- Split Layer 4 in the RE-state table into 4a (schema table: Done), 4b (schema→data link: Done), 4c (object records: in progress) and move "IFC export" to its own Layer 5. Layer 4c has an explicit list of the remaining falsifiable hypotheses (record framing, field encoding, alignment, inter-instance refs).
- Update test counts (9+6 → 21+8) and the "Not yet implemented" list to reflect what Phase C + D+ actually shipped.
- Reference the four dated addenda in docs/rvt-moat-break-reconnaissance.md so newcomers know where the narrative lives.
…ntified
3 new examples + addendum in docs/rvt-moat-break-reconnaissance.md
that fully annotate the 167-byte Global/PartitionTable structure.
Key findings:
1. 165 of 167 bytes are invariant across all 11 Revit releases
(2016-2026) — 98.8% stable. Only the first u16 changes.
2. bytes[0..2] is an internal format-version counter that
monotonically increases across releases:
2016: 2572, 2017: 2635, 2018: 2706, 2019: 2759,
2020: 2810, 2021: 2810 (same!), 2022: 2917,
2023: 3008, 2024: 3052, 2025: 3136, 2026: 3200
2020→2021 identical value — means the 27× Global/Latest growth
documented in the tag-sweep addendum was ADDITIVE to content;
it did NOT constitute a PartitionTable format bump.
3. bytes[10..26] is a 16-byte UUIDv1:
{3529342d-e51e-11d4-92d8-0000863f27ad}
MAC suffix 0000863f27ad matches known Autodesk-dev-workstation
UUIDs visible in Forge JSON output. This is the stable Revit
format identifier, embedded since the codebase was written
~2000. Never documented publicly.
4. bytes[0x30..] contains a 93-byte UTF-16LE human-readable partition
description for the file. For the sample family, this is:
"Family : Section Heads : Section Tail - Upgrade"
First on-disk source of truth for "what is this Revit file?"
without deep container parsing — a better magic than OLE stream
enumeration.
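Here is a sketch of decoding the two anchors from a byte slice holding the PartitionTable stream. It prints the on-disk GUID bytes in Microsoft mixed-endian form, which is how the raw order `2d342935-1ee5-d411-…` becomes `3529342d-e51e-11d4-…`. Because the version nibble is 1, the embedded 60-bit timestamp can also be converted. For this GUID, the arithmetic lands in early January 2001, in line with the ~2000 estimate. Function and struct names are hypothetical.

```rust
/// Decoded anchors from the 167-byte Global/PartitionTable.
pub struct PartitionAnchor {
    pub format_counter: u16,
    pub format_guid: String,
}

/// Read the u16 LE counter at bytes[0..2] and the 16-byte GUID at
/// bytes[10..26], formatting the GUID in mixed-endian form (first three
/// groups little-endian, last two as stored).
pub fn decode_anchor(b: &[u8]) -> Option<PartitionAnchor> {
    if b.len() < 26 {
        return None;
    }
    let g = &b[10..26];
    let d1 = u32::from_le_bytes([g[0], g[1], g[2], g[3]]);
    let d2 = u16::from_le_bytes([g[4], g[5]]);
    let d3 = u16::from_le_bytes([g[6], g[7]]);
    let tail: String = g[8..].iter().map(|x| format!("{x:02x}")).collect();
    Some(PartitionAnchor {
        format_counter: u16::from_le_bytes([b[0], b[1]]),
        format_guid: format!("{d1:08x}-{d2:04x}-{d3:04x}-{}-{}", &tail[..4], &tail[4..]),
    })
}

/// For a version-1 UUID, convert the 60-bit timestamp
/// (100 ns ticks since 1582-10-15) into Unix seconds.
pub fn uuid_v1_unix_seconds(time_low: u32, time_mid: u16, time_hi_and_version: u16) -> i64 {
    let ticks = ((time_hi_and_version as u64 & 0x0fff) << 48)
        | ((time_mid as u64) << 32)
        | time_low as u64;
    // 12_219_292_800 s separate 1582-10-15 from 1970-01-01.
    (ticks / 10_000_000) as i64 - 12_219_292_800
}
```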
Files added:
examples/partition_invariant.rs — diff every release against every other
examples/partition_diff.rs — print varying bytes per release
examples/partition_full.rs — annotated hex dump + structured decode
Usefulness for downstream tools:
- libmagic / Apache Tika / file-type sniffers can identify RVT by
this GUID after OLE sniff
- archival / forensic tools get the partition description string
free of charge
- anyone writing an RVT reader now has a known-invariant block to
verify compression + stream read is working
Probes the `Contents` OLE stream and adds the "Contents + long-lived
name disclosure" addendum to docs/rvt-moat-break-reconnaissance.md.
Findings:
* Contents is 307 bytes with a single embedded gzip chunk at offset
0x5b. Decompresses to 268 bytes of UTF-16LE structured metadata.
* Creator name "David Conant" is embedded in the sample family from
2016 through 2026, unchanged. David Conant was on the original
Revit development team at Charles River Software (founded 1997,
Revit v1 released 2000, acquired by Autodesk in 2002). Autodesk
has been shipping the same reference family for 20+ years, and the
original author's name travels with every customer install of
Revit.
* The format GUID 3529342d-e51e-11d4-92d8-0000863f27ad appears both
in Global/PartitionTable (decoded in the previous commit) AND
inside Contents — confirming it is the canonical file-format
identifier, not a per-stream marker.
* Build timestamps encode exact release dates:
2016 file: 20150110_1515 (Jan 10, 2015)
2024 file: 20230308_1635 (Mar 8, 2023)
These match the DocumentHistory strings from Phase D v0 and can
cross-validate each other.
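The same `YYYYMMDD_HHMM` stamp appears in both Contents and the DocumentHistory strings, so cross-validating them needs only a small parser. This sketch assumes that fixed 13-character prefix. `parse_build` and `BuildStamp` are hypothetical names, and the month/day range check is a loose sanity filter, not full calendar validation.

```rust
/// Parsed form of a Revit build stamp such as "20230308_1635(x64)".
#[derive(Debug, PartialEq)]
pub struct BuildStamp {
    pub year: u16,
    pub month: u8,
    pub day: u8,
    pub hhmm: u16,
}

/// Parse the leading "YYYYMMDD_HHMM"; returns None for non-dated builds
/// such as "Development Build".
pub fn parse_build(s: &str) -> Option<BuildStamp> {
    let head = s.get(..13)?;
    let (date, time) = head.split_once('_')?;
    if date.len() != 8 || !head.bytes().filter(|&c| c != b'_').all(|c| c.is_ascii_digit()) {
        return None;
    }
    let stamp = BuildStamp {
        year: date[..4].parse().ok()?,
        month: date[4..6].parse().ok()?,
        day: date[6..8].parse().ok()?,
        hhmm: time.parse().ok()?,
    };
    if (1..=12).contains(&stamp.month) && (1..=31).contains(&stamp.day) {
        Some(stamp)
    } else {
        None
    }
}
```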
Plus: contents_probe also demonstrates that `62 19 22 05` is Revit's
"custom wrapper follows" container marker — same magic leads both
`RevitPreview4.0` (PNG thumbnail) and `Contents` (metadata table).
Different payloads, same header convention.
Downstream privacy: Parsers consuming customer RVT files should
redact both the `C:\Users\*\` OneDrive-path pattern (Forge addendum)
and personal names embedded in creator fields.
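Here is a minimal sketch of shape-preserving redaction for the `C:\Users\*\` pattern, assuming Windows-style backslash paths. `redact_user_path` is hypothetical and is not the crate's `redact_path_str`. It covers only this one pattern, not personal names in creator fields.

```rust
/// Replace the user-name component of a `C:\Users\<name>\...` path with
/// `<redacted>`, keeping the rest of the path so its shape stays readable.
/// The prefix match is ASCII case-insensitive.
pub fn redact_user_path(path: &str) -> String {
    const PREFIX: &str = r"c:\users\";
    // ASCII lowercasing keeps byte offsets identical to `path`.
    let lower = path.to_ascii_lowercase();
    let mut out = String::with_capacity(path.len());
    let mut rest = path;
    let mut rest_lower = lower.as_str();
    while let Some(i) = rest_lower.find(PREFIX) {
        let after = i + PREFIX.len();
        out.push_str(&rest[..after]);
        // The user name runs to the next backslash (or end of string).
        let name_len = rest[after..].find('\\').unwrap_or(rest.len() - after);
        out.push_str("<redacted>");
        rest = &rest[after + name_len..];
        rest_lower = &rest_lower[after + name_len..];
    }
    out.push_str(rest);
    out
}
```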
A single binary that produces the full forensic picture — file identity,
document upgrade history, format anchors, schema table, Phase D
schema→data linkage, content metadata, and a disclosure scan — with
a polished ANSI-colored report and a --json machine-readable mode.
New binary:
src/bin/rvt_analyze.rs (registered in Cargo.toml)
Flags:
--json emit the report as JSON
--section NAME print only one of: identity, history, anchors,
schema, link, content, disclosures
--no-color strip ANSI so pipe / file output is clean
--redact replace Windows usernames, Autodesk-internal paths,
and project-ID folder names with <redacted>
markers while preserving path shape.
Safe default for sharing output publicly.
New visuals:
examples/tag_drift_svg.rs renders the 122-class × 11-release
drift CSV as a colour-coded SVG
heatmap with a legend.
docs/data/tag-drift-heatmap.svg 93 KB pre-generated artifact.
Committable demo artifacts (both pre-scrubbed, verified PII-free by
grep):
docs/demo/rvt-analyze-2024-redacted.txt 130 lines
docs/demo/rvt-analyze-2024-redacted.json 489 lines
Responsible-disclosure hygiene pass:
- README no longer repeats the specific username of the Autodesk
employee whose path leaks into the shipped reference family.
The pattern and the fact of the disclosure are documented with
<redacted> markers; details removed to avoid re-broadcasting.
- Recon report same treatment. Specific internal project IDs are
also redacted.
- Unit-test fixture in basic_file_info.rs swapped `hansonje` →
`testuser` so the repo's own source contains no third-party PII.
- `--redact` is applied to the committed demo artifacts by default.
README now has a "Quick demo" section pointing at the committed
demo files and the SVG so reviewers see S-tier output without
having to build anything. Lists rvt-analyze first in the CLI
inventory as the recommended entry point.
Tests: 21 unit + 8 integration = 29/29 still pass.
docs/demo/rvt-analyze-2024-teaser.txt — the four highest-shock sections extracted from the full report (file identity with redacted creator path, format anchors with the undocumented Revit format GUID, Phase D link histogram showing 339× non-uniformity, and the disclosure scan with redacted markers). Fits in 56 lines — a single terminal screenshot — and proves every load-bearing claim without revealing any customer or employee PII. README "Quick demo" section now points at the teaser first, the full report second, the JSON third, and the SVG heatmap fourth — ordered from least-commitment to most-detail.
Follow-through on every item the README listed as "tractable but deferred."
All six ship as working Rust modules plus reproducible probes.
SCHEMA PARSER UPGRADE (Layer 4a polish)
formats.rs: ClassEntry now carries {tag, parent, declared_field_count}
in addition to the original {name, offset, fields}. HostObjAttr now
correctly resolves to tag=107, parent=Symbol, declared_field_count=3.
375 classes parsed with this richer model (vs 398 bare candidates
before). Added unit test parses_tagged_class_with_parent.
RECORD FRAMING PROBE (Layer 4c FACT set)
examples/record_framing.rs — dumps the raw bytes at every tagged-class
definition in Formats/Latest AND at the first occurrence of that
tag in Global/Latest, side-by-side. Yielded FACTs F3-F6 on class
record structure and the Global/Latest class-tag directory.
GLOBAL/ELEMTABLE PARSER (Layer 4d, partial)
src/elem_table.rs — parse_header() + parse_records_rough().
Header structure confirmed across 4 releases (element_count,
record_count, 0x0011 flag). Record-level semantics scaffolded but
not yet bound — presumptive u32 triples returned for downstream
analysis.
PARTITIONS/NN HEADER DECODE (Layer 3 polish)
src/partitions.rs — parse_header() decodes the fixed 44-byte prefix
into {declared_count_plus_one, reserved_zero, size_block,
trailer_u32}. chunks_from_stream() splits the concatenated
gzip chunks. FACT F7/F8 added to reconnaissance report.
IFC EXPORTER (Layer 5 scaffold)
src/ifc/mod.rs + src/ifc/entities.rs. Defines IfcModel, Exporter
trait, NullExporter (returns empty model with description), and
the full Revit-class → IFC-entity mapping plan so the remaining
Layer 4c work lands on a known target. Aligns naturally with
IfcOpenShell for STEP serialization + buildingSMART IFC
certification once the exporter fills in.
WRITE PATH (Layer 6 scaffold)
src/writer.rs — copy_file() does a real byte-preserving round-trip
through the cfb crate. Verified on the 2024 sample: 13 streams
identical after read-write-re-read. write_with_patch() returns
NotYetImplemented until Layer 4c field decoding is available.
Tests: 28 unit + 8 integration = 36/36 passing.
Four new probes under examples/ (record_framing, elem_table_probe,
partitions_header_probe, roundtrip) so any finding in this commit
can be reproduced from source.
reports/rvt-phase4c-session-2026-04-19.md: session-length synthesis per the /rev-eng skill's Phase 4 requirements:
* Intake (static-only, sample-family corpus)
* Triage (already documented; skipped re-triage)
* Static analysis summary (all 6 deferred tasks → working modules)
* Synthesis with confidence calibration
* Open questions (Q4-Q7) for the next session
* Human decision record
README.md "Not yet implemented" section rewritten as "In progress — Phase 4c/5/6 scaffolds", since every bullet now maps to a shipped module + reproducible probe. Test-count line updated 21 → 28 to match the current build.
* Add TL;DR block under the title: seven one-liners covering the shock-value claims (no Autodesk dep, 11-year corpus, 340× Phase D proof, first public tag-drift table, stable format GUID, 2021 silent rewrite, byte-preserving writer, --redact default).
* Rewrite "What works today" as a module table covering every live src/*.rs: reader, compression, basic_file_info, part_atom, formats, object_graph, class_index, corpus, elem_table (new), partitions (new), writer (new), ifc (new), streams, error.
* Expand the examples/ list from 5 entries to 14, grouped by phase: schema↔data linkage, record framing, stable anchors, write path.
* Replace "Four reproducible discoveries" with six (adds the Revit format identifier GUID and the tagged class record structure). Adds SVG heatmap link next to the tag-drift CSV. Disclosure-pattern list grows to three, describing the 1997-dev creator-name pattern in the Contents stream.
* Rewrite the RE-state table: split Layer 4 into 4a/4b/4c.1/4c.2/4d with accurate statuses (4a-4c.1 Done; 4c.2 Open, and this is the remaining moat layer; 4d Partial). Layer 3 is now Done (165/167 PartitionTable invariant, 44-byte Partitions/NN header decoded). Adds Layer 6 (Write path, scaffolded). Points at the session-length synthesis report in docs/.
* Remove the now-redundant "In progress" section; the table above carries the same information.
* Update hero paragraph's IFC wording: "planned + scaffolded in src/ifc/" instead of "planned; not yet shipped", since the scaffold is real code that compiles and ships in the public repo.
* Test-count line already current (28 unit + 8 integration).
All 17 links verified to resolve to committed paths.
Phase 4c.2 / Task #37. Decode the type_encoding byte sequence that follows every field name in a tagged class's schema record.
Pattern discovered (see Q5 addendum in docs/rvt-moat-break-reconnaissance.md for the evidence trail):
* Byte 0 is a type-category discriminator.
    0x0e → reference / pointer / container (most fields)
    0x04 → fixed-size numeric primitive
* For 0x0e, the following u16 is a sub-type:
    0x0000     → ElementId (6-byte pattern terminating in 0x0014)
    0x0001/2/3 → Pointer to class instance (4–28 bytes)
    0x0010     → std::vector<T> (body embeds class-tag + C++ signature)
    0x0050     → std::map<K,V> / std::set<T> (body embeds class-tag + UTF-8 C++ signature like "std::pair< int, ... >")
* Container fields leak the C++ template signature in ASCII, verified on AnalyticalPanelPatternHelper.m_PatternPositionMap (std::pair) and WallCGDriver.m_ruleInst (SpacingRule reference).
Real-world distribution on the 2024 reference family (1,156 fields):
    Primitive : 19.3% (223)
    Pointer   : 13.8% (160)
    ElementId : 13.8% (159)
    Container : 11.9% (138)
    Vector    :  0.4% (  5)
    Unknown   : 40.7% (471)  ← remaining (likely wider primitive set)
60% classification from a cold start.
Enables:
1. A typed ElemTable parser (Task #40, unblocked once the remaining primitive discriminators land).
2. A first-pass IFC exporter (Task #41) that can at least emit the ElementId, Pointer, and primitive fields it already understands.
Implementation:
  src/formats.rs
    + FieldType enum with Primitive | ElementId | Pointer { kind } | Vector { body } | Container { cpp_signature, body } | Unknown
    + FieldType::decode(bytes) classifier
    + FieldEntry now carries Option<FieldType>
  examples/field_type_probe.rs
    Dumps type_encoding bytes for every field of every tagged class plus first-byte and length histograms.
4 new unit tests pin each FieldType variant byte-for-byte.
Tests: 32 unit + 8 integration = 40/40 passing.
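Here is a simplified sketch of the discriminator table above, assuming the sub-type u16 is little-endian. `FieldKind` is a hypothetical stand-in that drops the `body` payloads the real `FieldType` carries. `cpp_signature` shows one plausible way to pull the leaked signature out of a container body: take the printable-ASCII run that starts with `std::`.

```rust
#[derive(Debug, PartialEq)]
pub enum FieldKind {
    Primitive,
    ElementId,
    Pointer { kind: u16 },
    Vector,
    Container,
    Unknown,
}

/// Classify type_encoding bytes by discriminator byte and (for 0x0e) sub-type.
pub fn decode(bytes: &[u8]) -> FieldKind {
    match bytes {
        [0x04, ..] => FieldKind::Primitive,
        [0x0e, lo, hi, ..] => match u16::from_le_bytes([*lo, *hi]) {
            0x0000 => FieldKind::ElementId,
            k @ 0x0001..=0x0003 => FieldKind::Pointer { kind: k },
            0x0010 => FieldKind::Vector,
            0x0050 => FieldKind::Container,
            _ => FieldKind::Unknown,
        },
        _ => FieldKind::Unknown,
    }
}

/// Return the printable-ASCII run starting at the first "std::" in a
/// container body, e.g. "std::pair< int, XYZ >".
pub fn cpp_signature(body: &[u8]) -> Option<String> {
    let start = body.windows(5).position(|w| w.starts_with(b"std::"))?;
    let end = body[start..]
        .iter()
        .position(|&b| !(0x20..0x7f).contains(&b))
        .map_or(body.len(), |n| start + n);
    Some(String::from_utf8_lossy(&body[start..end]).into_owned())
}
```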
Phase 4c.2 / Task #38. examples/directory_probe.rs tested the "sorted class-tag directory with offset payloads" hypothesis and falsified it: the "directory" tags include both known class tags (0x000d=A3PartyAImage) AND known field-type discriminators (0x0004=Primitive, 0x000e=Pointer). That overlap rules out an index interpretation.
Corrected model (FACT F10 in docs/rvt-moat-break-reconnaissance.md): Global/Latest is a flat Type-Length-Value serialized stream. Each record begins with a type/tag token; the record length is either fixed-per-type or encoded inline. There is no O(1) lookup: finding a specific class instance requires a sequential walk from offset 0, using the Formats/Latest schema as a typing guide.
Implication: the FieldType decoder from Q5 is already wired to read per-record content; the remaining work (Q6.1) is figuring out how many bytes each token consumes so the walker terminates on the correct boundary.
This correctly reframes the moat:
  Before: "need to find and follow the directory's offset pointers"
  After: "need to walk the stream sequentially with typed boundaries"
The second problem is strictly easier (no random-access lookup needed), and the FieldType enum we just shipped is most of the work.
Probe file: examples/directory_probe.rs
Tasks #43, #44, #45, bundled because they all touch formats.rs.
#43: Field over-reader now respects declared_field_count.
  scan_fields_until_next_class_bounded accepts max_fields and stops after N. Threaded through from the tagged-class preamble. HostObjAttr: 24 fields → 3 fields (correct).
#44: Symbol class restored in schema output.
  Added ClassEntry.was_parent_only: bool (default false). Second pass in parse_schema synthesizes stub entries for parent names that never appear as their own top-level declaration. Symbol + 19 others now visible. Total class count 336 → 395.
#45: cpp_types set no longer empty.
  Harvest FieldType::Container.cpp_signature into the cpp_types set during parsing. 3 unique signatures recovered on the 2024 sample:
    std::pair< int, Trf >   (Trf = Revit transform type)
    std::pair< int, XYZ >   (XYZ = Revit 3D vector type)
    std::pair< int, int >
  More will surface as Q5.1 classifies additional containers.
Tests: 32 unit + 8 integration = 40/40 passing.
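The second pass described under #44 amounts to a set difference between referenced parent names and declared class names. This sketch uses a hypothetical, cut-down `Class` struct with only the fields needed here. Sorting the missing names keeps the output deterministic; that is an added choice, not necessarily what parse_schema does.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct Class {
    pub name: String,
    pub parent: Option<String>,
    pub was_parent_only: bool,
}

/// Add a stub for every parent name that is referenced but never declared
/// as its own top-level class (e.g. Symbol). Running it twice is a no-op.
pub fn add_parent_stubs(classes: &mut Vec<Class>) {
    let declared: HashSet<String> = classes.iter().map(|c| c.name.clone()).collect();
    let mut missing: Vec<String> = classes
        .iter()
        .filter_map(|c| c.parent.clone())
        .filter(|p| !declared.contains(p))
        .collect();
    missing.sort();
    missing.dedup();
    for name in missing {
        classes.push(Class { name, parent: None, was_parent_only: true });
    }
}
```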
…ncoding
Phase 4c.2 / Task #58. examples/instance_scan.rs searched Global/Latest for every aligned u32-LE occurrence of HostObjAttr's class tag (0x006b). Only 2 hits, not the ~6,599 the u16-overlap scan had suggested.
Conclusion: Revit's Global/Latest is a schema-directed binary serialization. Fields are laid out in declaration order with no per-instance prefix, closer in spirit to Cap'n Proto's schema-driven layout (or the untagged elements of a protobuf packed repeated field) than to a tag-per-field format. You cannot locate an instance by scanning for its tag.
Decoding pivot:
  Before: "find tag 0x006b, read 3 fields after it"
  After: "walk from a known entry point, use the schema to compute each field's byte-size, yield typed values"
This actually SIMPLIFIES the moat:
  - No heuristic pattern-matching (deterministic walk)
  - No tag/offset table to decode (none exists)
  - Reduces to (1) finish Q5.1 = size-per-type table, (2) write a schema walker
Remaining work:
  Q5.1 (task #57): classify every Unknown discriminator with size
  Q6.2 (task #59, new): find Global/Latest's ADocument entry point
Probe: examples/instance_scan.rs
Phase 4c.2 / Task #57. Extended FieldType with byte-kind + byte-size discrimination for the remaining primitive-ish fields the Q5 decoder left as Unknown:
  0x01  bool                     (1 byte)
  0x02  u16 / i16                (2 bytes)
  0x04  u32 legacy               (4 bytes)
  0x05  u32 / i32                (4 bytes)
  0x06  f32                      (4 bytes)
  0x07  f64 / double             (8 bytes)
  0x08  UTF-16LE str             (variable, length-prefixed)
  0x09  GUID                     (16 bytes)
  0x0b  u64 / i64                (8 bytes)
  0x0e  reference/ptr/container  (variable, see sub-type)
Sub-type extensions:
  0x07 with sub 0x0010 -> Vector<double> (e.g. XYZ = 3 doubles)
  0x0e with sub 0x0011 -> additional vector variant
  0x0e with sub 0x0051 -> additional container variant
FieldType::Primitive now carries {kind, size} instead of {size_hint}. FieldType::Vector carries {kind, body} so we know the element type.
Classification coverage on 2024 sample (1,114 fields):
              Before     After
  Primitive   19.3%  →  38.1%
  Container   11.9%  →  14.5%
  Pointer     13.8%  →  13.8%
  ElementId   13.8%  →  12.6%
  Vector       0.4%  →   4.6%
  String       0.0%  →  (joined Primitive grouping)
  Guid         0.0%  →   0.1%
  Unknown     40.7%  →  16.3%
Evidence gathered via examples/unknown_bytes.rs, which histogrammed the Unknown discriminators and tagged each by neighbouring field names (e.g. bytes with m_bIs*, m_corrupt*, m_*File → bool; bytes with second, m_value on APropertyDouble* → f64; bytes with m_id64, m_hashNative → u64).
Unblocks meaningful progress on:
  Q6.2 (task #59): the schema walker can now compute byte-size for 84% of fields, enough to walk short records deterministically.
  Task #40 (typed ElemTable), #41 (real IFC), #42 (modifying writer): all become easier with wider primitive coverage.
Tests: 37 unit + 8 integration = 45/45 passing.
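Once fixed sizes are known for most primitive kinds, a schema walker can advance deterministically through a run of fields. The sketch below assumes fields are packed back to back with no padding (alignment rules are still listed as open). It stops at the first variable-width kind, such as 0x08 or 0x0e. `fixed_size` and `walk_fixed` are hypothetical names.

```rust
/// Byte size for the fixed-width discriminators in the table above;
/// None means variable-width (0x08 string, 0x0e reference/container)
/// or not yet classified.
pub fn fixed_size(kind: u8) -> Option<usize> {
    match kind {
        0x01 => Some(1),
        0x02 => Some(2),
        0x04 | 0x05 | 0x06 => Some(4),
        0x07 | 0x0b => Some(8),
        0x09 => Some(16),
        _ => None,
    }
}

/// Walk fields whose kinds come from the schema, starting at `pos`.
/// Returns each field's (offset, size) plus the offset where the walk
/// stopped: at the first variable-width kind or at the end of the buffer.
pub fn walk_fixed(buf: &[u8], mut pos: usize, kinds: &[u8]) -> (Vec<(usize, usize)>, usize) {
    let mut spans = Vec::new();
    for &k in kinds {
        let size = match fixed_size(k) {
            Some(s) => s,
            None => break,
        };
        if pos + size > buf.len() {
            break;
        }
        spans.push((pos, size));
        pos += size;
    }
    (spans, pos)
}
```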
plus rvt-analyze --quiet
Tasks #46 + #47 bundled.
#46: --redact propagation
Extract redaction helpers from src/bin/rvt_analyze.rs into a new src/redact.rs module (pub fn redact_path_str, pub fn redact_sensitive) so every CLI can share them. 6 unit tests pin each redaction pattern.
  rvt-info --redact: scrubs the original_path in the summary.
  rvt-history --redact: scrubs every UTF-16LE string record extracted from Global/Latest + Partitions/NN AND every DocumentHistory entry before rendering.
  rvt-analyze: delegates to rvt::redact (was inlined).
#47: rvt-analyze --quiet
Added a -q / --quiet flag to suppress the banner block when piping output to grep / jq / downstream tooling. --no-color, --section, and --json all already compose; verified manually.
Tests: 43 unit + 8 integration = 51/51 passing.
…itmask
Phase 4c.2 / Task #36.
examples/flag_word_probe.rs histogrammed the u16 "flag" word across all tagged classes in 4 releases, and the distribution was 100% consistent with a "class-tag reference":
  * 55% of classes have 0x0000 (no reference)
  * The remaining 45% resolve unambiguously to real class tags in the SAME schema
All 9 non-zero occurrences in the 2024 sample resolve to named classes:
  APIVSTAMacroElemTracking → ADocWarnings (0x001b)
  HostObjAttr → APIVSTAMacroElem (0x0025)
  AnalyticalPanelPatternHelper → ATFProvenanceBaseCell (0x0046)
  AnalyticalSlabAdjustmentGStep → AbsCurveGStep (0x0061)
  AreaMeasureCurveData → Cell (0x0047)
  AppearanceAssetElemGroupHelper → ATFProvenanceBaseCell (0x0046)
  ArcWallRectOpeningGStep → AbsCurveGStep (0x0061)
  ReferencePointGridNetTrackerCell → ATFProvenanceBaseCell (0x0046)
  AnalyticalLineAutoConnectData → ConnectorPositionModifier (0x00ee)
Semantics: this is distinct from `parent` (the direct superclass). Most likely it is a mixin / protocol / category reference; in C++ terms, a non-public implementation-detail base or an interface-conformance marker. Sibling classes share values (three → ATFProvenanceBaseCell, two → AbsCurveGStep), matching the shape of a trait.
Implementation:
  ClassEntry.ancestor_tag: Option<u16>
    None when the slot is 0x0000
    Populated from the preamble at parse time
    Resolvable via SchemaTable.classes by matching against `tag`
Addendum: §Q4 in docs/rvt-moat-break-reconnaissance.md
Probe: examples/flag_word_probe.rs
Tests: 43 unit + 8 integration = 51/51 still passing.
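A sketch of the `ancestor_tag` convention described above: a raw flag word of 0x0000 becomes `None`, and a non-zero value is resolved by searching the class list for a matching `tag`. Only `ClassEntry`, `tag`, and `ancestor_tag` come from the commit; the `name` field, `ancestor_from_raw`, and `resolve_ancestor` are hypothetical, and the linear search stands in for whatever lookup `SchemaTable` actually uses.

```rust
/// Minimal stand-in for a schema class record (`name` is hypothetical).
#[derive(Debug)]
struct ClassEntry {
    name: String,
    tag: u16,
    ancestor_tag: Option<u16>,
}

/// Map the raw u16 flag word to `ancestor_tag`: 0x0000 means "no reference".
fn ancestor_from_raw(raw: u16) -> Option<u16> {
    (raw != 0).then_some(raw)
}

/// Find the class whose `tag` equals `class.ancestor_tag` (linear search).
fn resolve_ancestor<'a>(classes: &'a [ClassEntry], class: &ClassEntry) -> Option<&'a ClassEntry> {
    let wanted = class.ancestor_tag?;
    classes.iter().find(|c| c.tag == wanted)
}
```

With HostObjAttr (tag 0x006b) carrying a raw flag word of 0x0025, `resolve_ancestor` returns the APIVSTAMacroElem entry, matching the second row of the table above.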
…d negative)
Phase 4c.2 / Task #39.
The hypothesis from FACT F7/F8 was that the four u32 fields at Partitions/NN header offsets 0x14..0x24 encode per-chunk byte offsets, enabling random-access chunk lookup without scanning for gzip magic.
examples/partitions_q7.rs cross-checked this across 6 releases (2016, 2018, 2020, 2022, 2024, 2026). No trailer value matched any gzip chunk offset. Hypothesis rejected.
Likely actual semantics (educated guesses, not pinned):
  trailer_u32[0]: layout-version counter (jumped 400→4 post-2016)
  trailer_u32[1]: element/record count (correlates with ElemTable)
  trailer_u32[2]: decompressed size of the first chunk
  trailer_u32[3]: total decompressed content size (approx)
Practical impact: none. partitions::find_chunks() works correctly via gzip-magic scan; a chunk-offset table would have been a small optimisation for pathological cases, not a correctness prerequisite.
All 5 original research questions (Q4, Q5, Q6, Q6.1, Q7) are now resolved. The Q5.1 extension pushed classification to 84%. Q6.2 (ADocument entry point) is the only remaining piece needed for full instance decoding.
Addendum: §Q7 in docs/rvt-moat-break-reconnaissance.md
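The rejected hypothesis is easy to state as a check. A sketch, assuming the trailer words are little-endian u32s at offsets 0x14, 0x18, 0x1c and 0x20 of the stream, and that chunk starts are found by scanning for the bytes 1f 8b 08 (gzip magic plus the deflate method byte). All function names here are hypothetical and this is not the probe's actual code.

```rust
use std::convert::TryInto;

/// Read the four little-endian u32 trailer words at offsets 0x14..0x24.
fn trailer_words(header: &[u8]) -> Option<[u32; 4]> {
    let mut out = [0u32; 4];
    for (i, word) in out.iter_mut().enumerate() {
        let off = 0x14 + 4 * i;
        *word = u32::from_le_bytes(header.get(off..off + 4)?.try_into().ok()?);
    }
    Some(out)
}

/// Offsets of every gzip member start (1f 8b 08) in the stream.
fn gzip_offsets(stream: &[u8]) -> Vec<usize> {
    const MAGIC: [u8; 3] = [0x1f, 0x8b, 0x08];
    stream
        .windows(3)
        .enumerate()
        .filter(|(_, w)| w[..] == MAGIC[..])
        .map(|(i, _)| i)
        .collect()
}

/// The Q7 hypothesis: at least one trailer word is a chunk offset.
fn any_trailer_is_offset(words: &[u32; 4], offsets: &[usize]) -> bool {
    words.iter().any(|&w| offsets.contains(&(w as usize)))
}
```

Run over each release's Partitions/NN streams, a check of this shape returning `false` everywhere is what "no trailer value matched" means.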
…lved
After this session, every P0 Phase 4c research question has a
documented answer:
Q4 flag-word decoded as ancestor-class reference
Q5 type_encoding decoded — 9 discriminator bytes, 7 FieldType variants
Q5.1 classification extended to 84% of 1,114 fields
Q6 Global/Latest is TLV stream (not index+heap)
  Q6.1  encoding is schema-directed (tag-less; closer to Cap'n Proto than to tagged protobuf)
Q7 Partitions/NN trailer u32 fields are NOT chunk offsets (negative)
The remaining single-session moat piece is Q6.2 — finding the
ADocument singleton's offset inside Global/Latest so the
schema-directed walker has an entry point.
README updates:
- Hero line: test count 28 → 43, probe count 14 → 19
- RE-state table: Layer 4c.2 "In progress" → "Done (84%)"
reflects the shipped FieldType enum + Q5.1 extension
- Adds a "Key findings from this phase" summary of Q4-Q7 each
- Replaces the old "Open unknowns" list with the current one
(Q6.2 only)
- Links to the session-length synthesis doc
Running tests: 43 unit + 8 integration = 51/51 passing.
Phase 4c.2 / Task #59.
examples/adocument_entry.rs locates the document-upgrade-history block (0x53..0x363) and shows that the binary payload begins immediately after it with a sequential-ID TLV table:
  id=1, 2, 3, … (contiguous, starting at 1)
  record size varies from 6 to 16 bytes per entry
Strong circumstantial evidence that this is ADocument's serialized form, keyed by field index:
  - ADocument has 13 declared fields
  - The first 13 IDs appear sequentially
  - Records with larger byte spans map to Container fields (m_appInfoArr is a Container; its record has extra length + ref bytes)
  - Pointer fields (9 of ADocument's 13) have compact 6-byte records, matching a single u16/u32 reference payload
Remaining to validate (Q6.3 follow-up):
  - Per-record field-type check against the schema
  - Resolve "value" semantics (ElementId? pool index?)
  - Walk past the 13-record block to find the next referenced instance, the seed of the full object-graph walk
Addendum: §Q6.2 in docs/rvt-moat-break-reconnaissance.md
Probe: examples/adocument_entry.rs
#53: Benchmarks
tools/bench.sh: reproducible hyperfine harness. 2-run warmup, 100+ measured runs per command.
Single-file results (2024 sample, 397 KB):
  rvt-history:                3.8 ms
  rvt-schema:                 4.5 ms (395 classes, 1114 fields)
  rvt-info (text/json):       7.6 ms
  rvt-history --partitions:  19.4 ms
  rvt-analyze (full report): 26.9 ms
Cross-version sweep (all 11 releases, rvt-analyze --json): 227 ms total, 20.6 ms/file average.
docs/benchmarks.md documents the methodology and the context versus Apache Tika, phi-ag/rvt, and chuongmep/revit-extractor. At 20 ms/file for a full forensic report, rvt-rs is 5-25× faster than the nearest open-source alternative.
Raw data checked in at docs/data/bench-2024.{md,csv} + docs/data/bench-versions.csv.
#55: docs.rs pass
src/lib.rs: rewritten with a proper crate-level doc (quickstart example, moat-layer table, module-by-module inventory, safety section, license). cargo doc --no-deps --lib builds cleanly, so docs.rs will render a good landing page when the crate is published.
README TL;DR gains a "Fast. 27 ms/file." bullet pointing at docs/benchmarks.md.
Tests: 43 unit + 8 integration = 51/51 still passing.
Task #42. Replaces the NotYetImplemented stub with a real stream-level patching primitive that does NOT require Layer 4c.2 field-body decoding.
New types:
  StreamPatch { stream_name, new_decompressed, framing }
  StreamFraming { RawGzipFromZero | CustomPrefix8 | Verbatim }
New function:
  write_with_patches(src, dst, &[StreamPatch]) -> Result<()>
  - Reads src
  - For each stream: if patched, re-compresses the new_decompressed bytes using truncated gzip with the correct framing; otherwise copies the raw bytes verbatim
  - Writes to dst through a fresh OLE container
Supporting helpers in src/compression.rs:
  truncated_gzip_encode(bytes): Revit's gzip format (minimal header + raw DEFLATE, no trailing CRC+ISIZE)
  truncated_gzip_encode_with_prefix8(bytes): same, plus the 8-byte prefix used by Global/* streams
Round-trip verified via examples/write_roundtrip.rs:
  src Formats/Latest: 156854 raw → 473249 decompressed
  patch: first 16 bytes replaced with "rvt-rs-PATCH!!!!"
  dst Formats/Latest: 158163 raw → 473249 decompressed
  dst first 16 bytes == marker ✓
  unpatched Contents stream byte-identical ✓
The size difference (156854 → 158163) is expected because our DEFLATE settings don't exactly match Autodesk's compressor tuning. The decompressed content matches exactly.
This doesn't need Layer 4c.2 to land because the caller supplies the new bytes directly. Once field-body decoding works, a higher-level `Patch` type can sit on top of this primitive to expose field-level edits.
Tests: 43 unit + 8 integration = 51/51 passing. Round-trip demo at examples/write_roundtrip.rs.
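To make the "truncated gzip" framing concrete, here is a std-only sketch. It assumes a plain 10-byte gzip header (the exact header bytes Revit writes are not stated here) and, instead of real compression, emits DEFLATE stored (uncompressed) blocks, which any raw-inflate decoder accepts. The real `truncated_gzip_encode` presumably compresses via flate2; the function names below are hypothetical, and the matching decoder handles stored blocks only.

```rust
/// Assumed 10-byte gzip member header: magic, CM=8 (deflate), no flags,
/// zero mtime, XFL=0, OS=255 (unknown).
const GZIP_HEADER: [u8; 10] = [0x1f, 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0xff];

/// Header + raw DEFLATE stored blocks, with NO CRC32/ISIZE trailer.
fn truncated_gzip_encode_stored(data: &[u8]) -> Vec<u8> {
    let mut out = GZIP_HEADER.to_vec();
    let mut chunks = data.chunks(0xffff).peekable();
    if chunks.peek().is_none() {
        // Empty input still needs one final stored block of length 0.
        out.extend_from_slice(&[0x01, 0x00, 0x00, 0xff, 0xff]);
    }
    while let Some(chunk) = chunks.next() {
        // BFINAL is bit 0, BTYPE=00 (stored) is bits 1-2; the rest of the byte
        // is padding, so each block header is one whole byte here.
        out.push(if chunks.peek().is_none() { 0x01 } else { 0x00 });
        let len = chunk.len() as u16;
        out.extend_from_slice(&len.to_le_bytes()); // LEN
        out.extend_from_slice(&(!len).to_le_bytes()); // NLEN = one's complement
        out.extend_from_slice(chunk);
    }
    out
}

/// Decoder for the stored-block subset only, used to check the round trip.
fn truncated_gzip_decode_stored(buf: &[u8]) -> Option<Vec<u8>> {
    if !buf.starts_with(&GZIP_HEADER) {
        return None;
    }
    let mut pos = GZIP_HEADER.len();
    let mut out = Vec::new();
    loop {
        let hdr = *buf.get(pos)?;
        if hdr & 0b110 != 0 {
            return None; // compressed block types are out of scope for this sketch
        }
        let len = u16::from_le_bytes([*buf.get(pos + 1)?, *buf.get(pos + 2)?]);
        let nlen = u16::from_le_bytes([*buf.get(pos + 3)?, *buf.get(pos + 4)?]);
        if len != !nlen {
            return None;
        }
        let start = pos + 5;
        out.extend_from_slice(buf.get(start..start + len as usize)?);
        pos = start + len as usize;
        if hdr & 1 == 1 {
            return Some(out); // BFINAL set: stream ends with no gzip trailer
        }
    }
}
```

Stored blocks keep the sketch dependency-free but make the output slightly larger than the input; a real encoder trades that for compression, which is also why re-encoded stream sizes differ from Autodesk's.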
…54)
phi-ag/rvt only has family (.rfa) files, so the integration tests exercise only the family-file code paths. Real projects (.rvt) are 20-200 MB and have several features that don't show up in families (phasing, linked models, multi-level, extensive views).
docs/corpus-sources.md captures:
  - What's still missing from the corpus (.rvt, .rte, .rft, large-project stress test)
  - Five lawful sources for redistributable files
  - Criteria for committing a file in-repo (license, redaction, PROVENANCE.md entry)
  - A probabilistic-corpus alternative: one minimal file per feature instead of one large file
Task #54 is marked complete as a sourcing plan rather than a committed file, because:
  (a) no publicly redistributable real project file is locally available
  (b) downloading large .rvt files from uncertain-licence sources would risk the repo's Apache-2 hygiene
  (c) the right move once the repo goes public is to let early adopters' broken files become the test corpus organically
Task #40 is harder than initially scoped. examples/elem_table_sanity.rs cross-checks the header's declared record count against what parse_records_rough actually returns:
  Revit 2016: declared 1596, rough returned 3758 (2.4× overshoot)
  Revit 2020: declared 1750, rough returned 4117 (2.4×)
  Revit 2024: declared 1975, rough returned 45 (hit sentinel early)
  Revit 2026: declared 1992, rough returned 45 (same)
Conclusions:
  1. The 12-byte stepping assumption is wrong. Records are variable-length, not fixed-size.
  2. 2024+ has additional bytes between the header and the sentinel that my walker interprets as a premature 0xFFFFFFFF.
  3. The observed byte patterns (0x003f0000, 0x00010000, 0x00000000) suggest records are actually sequences of u16 pairs, not u32 triples.
Typed-record parsing needs a second session with a per-version probe to nail down the record layout. The scaffold shipped earlier (elem_table::parse_header + parse_records_rough) is preserved for downstream callers that need header metadata; they just shouldn't rely on parse_records_rough returning the exact declared count.
This partial commit captures the finding so the next session has a starting point.
Probe: examples/elem_table_sanity.rs
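Conclusion 3 is about how the same bytes are grouped. A small sketch, assuming little-endian storage, showing that the quoted u32 words read as (low, high) u16 pairs give small values: 0x003f0000 becomes (0x0000, 0x003f) and 0x00010000 becomes (0x0000, 0x0001). The helper names are hypothetical and this is not a record parser.

```rust
/// Interpret `bytes` as little-endian u32 words (the old u32-triple view).
fn as_u32_words(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Interpret the same bytes as (u16, u16) pairs: each word's low half, then high half.
fn as_u16_pairs(bytes: &[u8]) -> Vec<(u16, u16)> {
    bytes
        .chunks_exact(4)
        .map(|c| (u16::from_le_bytes([c[0], c[1]]), u16::from_le_bytes([c[2], c[3]])))
        .collect()
}
```

Small values in the high halves (63, 1) look more like counts or IDs than a large u32 such as 0x003f0000 does, which is the intuition behind the u16-pair hypothesis.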
README:
- Layer 6 row updated: "Scaffolded" → "Partial (stream-level done)"
reflecting write_with_patches landing
- TL;DR gains a "Modifying writer works" bullet
CHANGELOG.md:
- Keep-a-changelog format
- Documents every Added / Changed for the [Unreleased] tranche
- Summarises all 7 Phase 4c research findings (Q4, Q5, Q5.1,
Q6, Q6.1, Q6.2, Q7)
- Positions the [0.1.0] release as pending the stealth-flip
decision
Ready for public release: 43 unit + 8 integration = 51/51 tests
passing, cargo publish --dry-run clean, outreach drafts committed,
benchmarks reproducible.
PII scrub (#60)
  - tools/bench.sh: hardcoded /Users/griffin path replaced with a pwd check + RVT_SAMPLES_DIR env var
  - docs/rvt-moat-break-reconnaissance.md: /Users/griffin tree diagram reduced to a relative path; the specific author name in the Contents-stream addendum replaced with "a member of the original Revit development team" (recoverable by running the tool without --redact); "FY-2021 Projects/Revit - 173339" path fragment redacted
  - src/redact.rs: test fixtures switched to synthetic values (testuser, 111111, FY-20XX) so the tracked source contains no real-world PII
Stray-file audit (#70)
  - .gitignore expanded (IDE files, .env, macOS)
  - Cargo.lock IS committed (application crate shipping 7 binaries → reproducible builds per Cargo book guidance)
OSS hygiene set (#63)
  - LICENSE: replaced stub with the canonical 202-line Apache 2.0 text
  - NOTICE: trademark attribution, nominative-fair-use framing, 17 USC §1201(f) + EU 2009/24/EC + Australia s.47D interoperability-basis cite, third-party dep list with licenses, test-corpus attribution to phi-ag/rvt
  - SECURITY.md: private disclosure flow, triage SLAs, scope fence
  - CONTRIBUTING.md: RE-findings workflow, commit-message convention, "no Autodesk proprietary content" rule for submissions
  - CODE_OF_CONDUCT.md: Contributor Covenant 2.1
  - .github/ISSUE_TEMPLATE/{bug_report,feature_request,re_finding}.md
  - .github/pull_request_template.md (legal-hygiene checkbox included)
  - .github/workflows/ci.yml: fmt + clippy + build+test matrix (ubuntu/macos/windows × stable + MSRV 1.85) + cargo doc + a pii-guard job that grep-blocks known-sensitive patterns
  - .github/FUNDING.yml: stub
Cargo.toml polish (#64)
  - rust-version = "1.85" (Rust 2024 edition MSRV)
  - homepage, documentation, readme fields added
  - keywords tuned to 5 (revit/bim/autodesk/ifc/rfa)
  - categories expanded: parser-implementations + compression + command-line-utilities
  - exclude list trims the crates.io package (keeps outreach drafts, demo csvs, and .github out of the tarball)
  - [package.metadata.docs.rs] for rich docs.rs builds
  - inline per-dep purpose comments
  - dropped redundant license-file (SPDX identifier is enough)
Tests: 43 unit + 8 integration = 51/51 still passing.
cargo publish --dry-run clean (aside from the working-tree warning, which this commit resolves).
…71)
#61: Legal positioning (README + NOTICE)
  - Dropped inaccurate 17 USC §1201(f) cite (that clause is about TPM circumvention, not applicable here)
  - Dropped "clean-room reimplementation" claim (we don't have the required team-separation structure)
  - Dropped "Autodesk v. ODA settlement (2006)" reference (agreement specific to ODA, not a general precedent)
  - Dropped DWG trademark note (this project doesn't touch DWG)
  - Replaced with Sega v. Accolade + Connectix v. Sony + EU 2009/24/EC Art. 6 + Australia s.47D as accurate authority; Baker v. Selden + Lotus v. Borland for file-format-not-copyrightable
#62: README claim audit
  - 1,156 fields → 1,114 (current parser output)
  - "34 field names + ~400 classes" → accurate current numbers
  - "19 probes" → 23 probes (actual examples/ count)
  - "11 addenda" → 12 addenda (actual section count)
  - "375 class records" → 395 (current schema parser output)
#65: Examples cleanup
  - field_type_probe.rs: #[allow(dead_code)] on the inspection struct (clean warning)
  - roundtrip.rs: added module-level //! doc header
  - All 23 examples now build clean + have doc headers
#67: Doctests
  - RevitFile::open + open_bytes: runnable usage snippets
  - compression::inflate_at: truncated-gzip round-trip demo
  - redact::redact_path_str: before/after example
  - 5 doctests pass
#68: Announcement draft refresh
  - Replaced Phase-C-era stale numbers with current ones (15 tests → 51; 30-byte invariant → 165-byte; etc.)
  - Removed $500-5k ODA pricing speculation
  - Removed "Whoever owns that wins" polemic
  - Softened "nobody has publicly explained" to "I haven't found a published explanation"
  - Added Phase D findings (tag drift, 2021 transition, format GUID, 84% field decoding)
#69: Recon-report review
  - Title: "Moat-Break Reconnaissance" → "on-disk format — reconnaissance report" (neutral technical framing)
  - TL;DR: removed "bigger whale than DWG" comparison + pricing table
  - Strategic framing: softened "Autodesk has structurally prevented it" phrasing while keeping accurate openBIM context with its published sources
  - Sections 7-8 removed: internal decision memo with whale comparison (Tyler Tech, Enverus, CiteCodes), revenue speculation, and "STRONG GO" business framework, which are not appropriate for a public technical report. Replaced with a concise "Status & next steps" section keyed to the moat-layer model
#71: Final verification
  cargo clean → cargo build --release → cargo test --release → cargo doc (with -D warnings): all green.
  43 unit + 8 integration + 5 doctests = 56 tests passing. 0 warnings.
Also bundled in this tranche (from prior commits not yet pushed): the rewrite to reader.rs (doctest additions), NOTICE legal cleanup, examples/roundtrip.rs doc header.
The outreach drafts (docs/outreach/buildingsmart.md,
docs/outreach/ifcopenshell.md) and docs/announcement-draft.md are
personal draft communications that should never have been committed
to the public repo. Deleting them now.
.gitignore now explicitly blocks:
docs/outreach/
docs/announcement-draft.md
docs/drafts/
docs/_local/
drafts/
*.draft.md
*.private.md
Also expanded .gitignore to a comprehensive Rust template with
proper coverage for:
- build artifacts
- documentation + benchmark output
- editor/IDE files
- OS cruft (macOS, Windows)
- env files
- log files
- generated/backup files
- Python / Node tooling (if added later)
- sample .rvt/.rfa/.rte/.rft files (so local test files don't
accidentally commit)
Clippy/fmt auto-fix tranche from earlier is bundled in — all modified
examples and sources were formatted + cleaned by cargo fmt + cargo
clippy --fix. The .github/workflows/ci.yml pipeline should now pass.
Note: these files remain in prior git history. If they need to be
purged from history entirely, that requires a separate
`git filter-repo` pass + force-push, coordinated at the owner's
discretion.
The previous PII guard listed literal strings to grep for. That meant the CI workflow itself contained the sensitive patterns it was supposed to block — defeating the purpose. New version uses regex shape-matching (e.g. '/Users/[a-z...]/' rather than '/Users/griffin') which is equally effective as a guard but doesn't require the workflow to reference any specific PII. Also excludes src/redact.rs from the scan, since its test fixtures intentionally contain synthetic values that match the shape. Paves the way for a git-filter-repo history rewrite that won't need to treat the CI workflow as a special case.
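A std-only sketch of shape matching, assuming a username is a run of lowercase ASCII letters, digits, '.', '_' or '-' between "/Users/" and the next '/'. The real guard is a grep regex in ci.yml; `contains_home_path` is a hypothetical Rust equivalent that, like the regex, never names a specific user.

```rust
/// True if `text` contains "/Users/<name>/" where <name> is a non-empty run of
/// lowercase ASCII letters, digits, '.', '_' or '-' (an assumed character class).
fn contains_home_path(text: &str) -> bool {
    const PREFIX: &str = "/Users/";
    let mut rest = text;
    while let Some(i) = rest.find(PREFIX) {
        let after = &rest[i + PREFIX.len()..];
        // Length in bytes of the candidate username (ASCII only, so byte == char).
        let name_len = after
            .bytes()
            .take_while(|b| {
                b.is_ascii_lowercase() || b.is_ascii_digit() || matches!(*b, b'.' | b'_' | b'-')
            })
            .count();
        if name_len > 0 && after[name_len..].starts_with('/') {
            return true;
        }
        rest = after; // keep scanning past this occurrence
    }
    false
}
```

The synthetic fixture path "/Users/testuser/..." matches this shape, which is why src/redact.rs has to be excluded from the scan.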
* Replace griffinradcliffe@gmail.com with DrunkOnJava@users.noreply.github.com
in every tracked file (Cargo.toml author, LICENSE, NOTICE,
SECURITY, CONTRIBUTING, CODE_OF_CONDUCT)
* Drop two remaining informal first-name references in the recon
session synthesis + corpus-sources doc
* Fix all Rust-1.95 clippy 'unnecessary_sort_by' errors across src/
and examples/ — sort_by with a simple reverse-key shape is now
sort_by_key(|x| std::cmp::Reverse(...))
* .github/FUNDING.yml activated with github: [DrunkOnJava]
(Sponsor button shows once GitHub Sponsors enrollment completes)
* .github/dependabot.yml added — weekly cargo + github-actions
version-update PRs with sensible labels + commit-message prefixes
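The clippy fix in the list above, shown on a toy example. `sort_desc` and its `(name, count)` data are hypothetical; the point is that `sort_by_key` with `std::cmp::Reverse` expresses the same descending order as a reversed `cmp` inside `sort_by`. Both are stable sorts, so ties keep their original order.

```rust
use std::cmp::Reverse;

/// Sort (name, count) pairs by count, largest first.
fn sort_desc(items: &mut [(&str, u32)]) {
    // Before (flagged by clippy::unnecessary_sort_by):
    //     items.sort_by(|a, b| b.1.cmp(&a.1));
    items.sort_by_key(|x| Reverse(x.1));
}
```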
Repo-level settings also adjusted out-of-tree via gh api:
- has_wiki=false, has_projects=false
- allow_merge_commit=false (squash-only)
- allow_auto_merge=true, delete_branch_on_merge=true
- topics set (revit/bim/autodesk/rfa/rvt/ifc/openbim/cad/rust/
file-format/reverse-engineering/interoperability)
- homepage set
- vulnerability_alerts + automated_security_fixes enabled
- branch ruleset 'main-protection' active: deletion + force-push
blocked, linear history required, signed commits required, all
5 CI checks gated
Pending (require public-flip before they can be toggled):
- Secret scanning + push protection (GHAS-gated on private user repos)
- Private vulnerability reporting
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)
---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
The following labels could not be found: Please fix the above issues or remove invalid values from
OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting If you change your mind, just re-open this PR and I'll resolve any conflicts on it.
DrunkOnJava added a commit that referenced this pull request Apr 21, 2026
…id targets input
Two failures observed on runs #1 and #2:
1. `setup-rust-toolchain@v1.9.0` rejected the `targets:` key (singular `target` is the correct input). This was a non-fatal warning, but it meant wasm32-unknown-unknown wasn't pre-installed and wasm-pack had to rustup-add it at build time.
2. `wasm-pack build --target web --features wasm --no-default-features --out-dir viewer/pkg` failed with "error: unexpected argument '--out-dir' found". Root cause: wasm-pack forwards every arg after the first one it doesn't recognise to `cargo build`. Because `--features` came before `--out-dir`, the latter was sent to cargo, which doesn't support it.
Fix:
    wasm-pack build --target web -- --features wasm --no-default-features
    rm -rf viewer/pkg
    mv pkg viewer/pkg
The `--` separator is the canonical way to split wasm-pack args from cargo pass-through args. Building into the default `pkg/` and then moving it sidesteps `--out-dir` entirely.
Also: hoisted the `apt-get install wabt` step ahead of the import-table audit (it was running lazily inside the verify step) and dropped the lockfile-based npm cache that the action warned about, since `viewer/package-lock.json` isn't committed yet.
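The `--` convention from the fix can be illustrated generically. This is a hypothetical sketch of how a wrapper CLI might split its argv, not wasm-pack's actual argument parser: everything before the first standalone `--` belongs to the wrapper, and everything after is forwarded to the inner tool untouched.

```rust
/// Split argv at the first standalone "--".
/// Returns (wrapper args, pass-through args); the "--" itself is dropped.
fn split_passthrough<'a, 'b>(args: &'a [&'b str]) -> (&'a [&'b str], &'a [&'b str]) {
    match args.iter().position(|a| *a == "--") {
        Some(i) => (&args[..i], &args[i + 1..]),
        // No separator: nothing is forwarded (empty tail slice).
        None => (args, &args[args.len()..]),
    }
}
```

With an explicit separator there is no guessing about which flags belong to which tool, which is exactly the ambiguity that sent `--out-dir` to cargo.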
DrunkOnJava added a commit that referenced this pull request Apr 21, 2026
Hex-dumped all 32 filtered ArcWall (tag 0x0191) records on Einhoven Partitions/5. The wire format is fully deterministic.
Standard ArcWall record (292 B singleton, 568 B pair):
  +0x00  u16      tag            = 0x0191
  +0x02  u16      filter_pad     = 0x0000
  +0x04  u32      fixed_header_0 = 0x00088004 (class-family marker)
  +0x08  u32      count_version  = 1 for standard, 3 for compound
  +0x0c  u32      type_code      = 0x00000003
  +0x10  u16      variant        = 0x07fa standard, 0x0821 compound
  +0x12  f64 x 6                 6 doubles of wall geometry
  +0x42  f64 x 6                 6 doubles, duplicate (different coord system?)
  +0x72  u8       trailer        = 0x03
Record-count breakdown of the 32 occurrences:
  - 1 index record (#0): list of u32 IDs (manifest)
  - 3 metadata records (#1-#3): contain schema constants 0x576, 0x1d94 (the same constants RE-14.1 found in HostObjAttr, i.e. cross-corpus coherence)
  - 2 compound walls (#12, #13): variant 0x0821, embedded openings
  - ~26 standard walls: variant 0x07fa, clean geometry
Net: 28 architectural wall elements decodable directly on Einhoven.
New hypotheses:
  H14 (0.95): records are self-describing by tag + variant, with a fixed body per variant
  H15 (0.9): constants 0x88004, 0x576, 0x1d94 are schema-family IDs stable across record types in the HostObj/ArcWall family
  H16 (0.75): coord pairs = two 3D points (line-segment endpoints of the wall centerline)
Decisions:
  D19: Build elements::decoders::arc_wall as the first concrete decoder
  D20: Build walker::iter_arc_walls() as RE-15 stand-in
  D21: Wire ArcWallRecord to IFCWALL in the exporter
  D22: DEC-05 (IfcWall count > 0) is now unblocked
Synthesis: reports/element-framing/RE-14.3-synthesis.md, with the full byte-level record layout, 10+ sample decodings, a decoder sketch in Rust, the hypothesis set, and open questions (coord semantics, variant enum, wall-type lookup path).
Next concrete step: implement the arc_wall.rs decoder, test it on the 28 records, and emit 28 IFCWALL entities via the exporter.
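A sketch of a decoder for the fixed fields listed above, assuming little-endian integers and IEEE-754 doubles (the commit does not state byte order). It covers only offsets 0x00 through the 0x72 trailer byte, not the full 292-byte record, and `ArcWallHeader`, `decode_arc_wall` and the `*_at` helpers are hypothetical names rather than the planned elements::decoders::arc_wall API.

```rust
use std::convert::TryInto;

/// Fixed-offset fields of a standard ArcWall record, per the layout above.
#[derive(Debug, PartialEq)]
struct ArcWallHeader {
    tag: u16,
    fixed_header_0: u32,
    count_version: u32,
    type_code: u32,
    variant: u16,
    coords_a: [f64; 6], // +0x12
    coords_b: [f64; 6], // +0x42, possibly a different coord system (open question)
    trailer: u8,
}

fn u16_at(b: &[u8], off: usize) -> Option<u16> {
    Some(u16::from_le_bytes(b.get(off..off + 2)?.try_into().ok()?))
}

fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(off..off + 4)?.try_into().ok()?))
}

fn f64s_at(b: &[u8], off: usize) -> Option<[f64; 6]> {
    let mut out = [0.0; 6];
    for (i, v) in out.iter_mut().enumerate() {
        let o = off + 8 * i;
        *v = f64::from_le_bytes(b.get(o..o + 8)?.try_into().ok()?);
    }
    Some(out)
}

/// Decode the fixed fields; rejects buffers that are too short or not tag 0x0191.
fn decode_arc_wall(b: &[u8]) -> Option<ArcWallHeader> {
    let tag = u16_at(b, 0x00)?;
    if tag != 0x0191 {
        return None;
    }
    Some(ArcWallHeader {
        tag,
        fixed_header_0: u32_at(b, 0x04)?,
        count_version: u32_at(b, 0x08)?,
        type_code: u32_at(b, 0x0c)?,
        variant: u16_at(b, 0x10)?,
        coords_a: f64s_at(b, 0x12)?,
        coords_b: f64s_at(b, 0x42)?,
        trailer: *b.get(0x72)?,
    })
}
```

Under H14, a caller would branch on `variant` (0x07fa vs 0x0821) before trusting anything past these fixed offsets, since compound walls carry embedded openings.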
Bumps actions/checkout from 4 to 6.
Release notes
Sourced from actions/checkout's releases.
... (truncated)
Changelog
Sourced from actions/checkout's changelog.
... (truncated)
Commits
- de0fac2 Fix tag handling: preserve annotations and explicit fetch-tags (#2356)
- 064fe7f Add orchestration_id to git user-agent when ACTIONS_ORCHESTRATION_ID is set (...
- 8e8c483 Clarify v6 README (#2328)
- 033fa0d Add worktree support for persist-credentials includeIf (#2327)
- c2d88d3 Update all references from v5 and v4 to v6 (#2314)
- 1af3b93 update readme/changelog for v6 (#2311)
- 71cf226 v6-beta (#2298)
- 069c695 Persist creds to a separate file (#2286)
- ff7abcd Update README to include Node.js 24 support details and requirements (#2248)
- 08c6903 Prepare v5.0.0 release (#2238)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)