fix(webview/whatsapp): IDB walk + DOM scrape + active-chat plumbing (#1017) #1034
…inyhumansai#1017) `whatsapp_scanner/idb.rs` sent `"indexName": ""` to CDP `IndexedDB.requestData`. The CDP spec says empty string means "use the primary-key index", but the C++ backend in CEF 146 (Chrome 146.0.7680.165) rejects this with `{"code":-32000,"message":"Could not get index"}` and refuses the call entirely. All four IDB walks (`message`, `chat`, `contact`, `group-metadata`) failed every tick, the WhatsApp scanner emitted zero memory docs, and `memory_recall_memories {namespace:"whatsapp-web:<acct>"}` stayed empty. Slack and Telegram already shipped this exact fix months ago — see `slack_scanner/idb.rs:210-214` and `telegram_scanner/idb.rs:210`, both of which have an explicit comment block warning future contributors not to add the empty `indexName` back. Only WhatsApp regressed. Drop the line. Add a matching comment block referencing the sibling scanners so this stays a one-time mistake. Verified: `cargo test --lib whatsapp_scanner` is green (18 passed, including a new `requestdata_params_omit_index_name` regression test that asserts the JSON payload omits the field). Runtime verification (memory_recall_memories returning non-empty after a 30s scan tick) deferred to packaged-build smoke per `feedback_validation_test_target.md`.
…xedDB.requestData (tinyhumansai#1017) Adds `requestdata_params_omit_index_name` to `whatsapp_scanner/idb_tests.rs`. Builds the same JSON payload `read_store` sends to CDP and asserts `params.get("indexName").is_none()`. This is a regression test for the bug fixed in the previous commit: re-introducing `"indexName": ""` would silently break IDB ingestion with no compile-time signal, since the CDP call type is `Value` and the failure surfaces only at runtime as a `Could not get index` warning that's easy to miss in dev:app log noise. The test message cites `slack_scanner/idb.rs:210-214` (where the same fix was first documented) so anyone tripping the test gets the historical context immediately.
11-row acceptance-criteria audit run against `pnpm dev:app` on `main` (b11b8f3+) before the fix in this PR, then logged with post-fix expectations. Records: 7 pass / 1 partial (video forces download) / 3 fail (memory IDB ingest, calls don't connect, DOM ingest = 0; the latter is likely gated on the IDB fix). Documents the one-line `indexName` fix that this PR ships and the four out-of-scope items deferred to separate child issues:
- Bug 2 (DOM = 0): gated on Bug 1 verification
- Bug 3 (video codec): CEF build/packaging concern, not a code change in this repo
- Bug 4 (calls): needs cross-browser control test before pinning on OpenHuman
- Bug 5 (voice msg empty body): auto-resolves once Bug 1 IDB walks succeed
Sign-off block included for the runtime-smoke verification on a packaged build.
tinyhumansai#1017)
Bug 2 in tinyhumansai#1017's matrix: post-Bug-1 the IDB walk worked but the DOM scan still emitted zero rows. Live CDP probe (2026-04-30) showed three drift points in WhatsApp Web's HTML since `dom_snapshot.rs` was last touched:
1. **`data-id` format**: message rows used to expose `"<fromMe>_<chatId>_<msgId>"`. Current builds publish only the bare msgId hex (e.g. `"AC2E44BDA…"`, 32 hex chars). The strict `splitn(3, '_')` matcher rejected every row → `parse_rows` returned empty. `split_data_id` now accepts both shapes; `from_me` and `chat_id` come back empty for the bare format and the merge in `mod.rs` reverse-looks them up by msgId-tail / active-chat header.
2. **`span.selectable-text` class**: the body text is now rendered without that class. The fallback `span[dir="ltr|rtl"]` matcher in `find_body` already covered this, but the doc and module-level comment were stale.
3. **Active chat name extraction**: modern WhatsApp Web omits chat JID from the URL, from `data-id`, and from any DOM attribute we could find. The only DOM signal that carries it is `header[data-testid="conversation-header"]`'s first non-icon `<span>`. New `parse_active_chat_name` walks the header subtree, skipping Material/`wds-icon` ligature spans (`wds-ic-search`, `wds-ic-disappearing-messages`, etc.) so the chat title wins. Returned alongside rows + hash from `capture_messages` so the caller can reverse-lookup chat JID via the IDB-side chats map.
New tests: `split_data_id_accepts_bare_msg_id`, `split_data_id_accepts_long_alnum_msg_id`, plus an extended reject case for non-message hooks. All pass (10/10 in `dom_snapshot::tests`).
Verified at runtime: `[wa][<acct>] full scan ok messages=20000 chats=2249 dom=80` post-fix (was `dom=0` pre-fix). Memory ingest is still gated on the chats-map reverse lookup (see follow-up issue tracking the IDB chat_names gaps); this commit is the DOM-side enabler.
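The dual-shape `data-id` parsing described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code that landed: the minimum length for a bare id (`MIN_BARE_LEN`) and the exact reject rules are assumptions, and only the `(false, "", msg_id)` return shape for bare ids comes from the PR text.

```rust
/// Assumed lower bound for a bare msgId; the PR only says current ids are
/// 32 hex chars and that "long alnum" ids are also accepted.
const MIN_BARE_LEN: usize = 16;

/// Sketch of a `data-id` splitter that tolerates both shapes.
/// Returns `(from_me, chat_id, msg_id)`, or `None` for non-message hooks.
fn split_data_id(data_id: &str) -> Option<(bool, String, String)> {
    // Legacy compound shape: "<fromMe>_<chatId>_<msgId>".
    let parts: Vec<&str> = data_id.splitn(3, '_').collect();
    if let [flag, chat_id, msg_id] = parts.as_slice() {
        let from_me = match *flag {
            "true" => true,
            "false" => false,
            _ => return None,
        };
        if chat_id.is_empty() || msg_id.is_empty() {
            return None;
        }
        return Some((from_me, chat_id.to_string(), msg_id.to_string()));
    }
    // Bare shape: the msgId alone. `from_me` and `chat_id` are unknown here,
    // so they come back as `false` and an empty string for the caller to fill.
    let looks_like_msg_id = data_id.len() >= MIN_BARE_LEN
        && data_id.chars().all(|c| c.is_ascii_alphanumeric());
    if looks_like_msg_id {
        return Some((false, String::new(), data_id.to_string()));
    }
    None
}
```

With this shape, a compound id yields all three fields, a bare id yields only `msg_id`, and anything containing other punctuation (for example a hyphenated UI hook) is rejected.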
…ge fallback (tinyhumansai#1017)
Once Bug 2 unblocked the DOM scan, the merge step still produced `patched=0 appended=N` every tick. Two reasons, both addressed here:
1. **IDB id != DOM data-id**: IDB stores message id as the compound `_serialized` (`"false_<chat_jid>_<msgId>"`). DOM data-id is now bare msgId hex (e.g. `"AC2E44BDA…"`). The `by_msg_id` lookup in `emit_snapshot` first tries the full IDB id, then falls back to its trailing segment after the last underscore. That segment is the bare msgId for legacy compound ids and a no-op for already-bare ids, so both paths converge.
2. **DOM-only rows have no chatId**: the bare-msgId DOM rows do not carry chat context, so when the merge appends them as `bodySource=dom-only`, `chatId` was `Null`. `emit_grouped_whatsapp` rejects rows whose `chatId` is empty, so every DOM-only body got dropped on the floor. Added an active-chat-jid resolution step ahead of the merge:
   - `ScanSnapshot` gains an `active_chat_name` field, populated from the conversation header in `dom_snapshot::capture_messages`.
   - `emit_snapshot` reverse-looks-up the name in `snap.chats` (which the IDB walk populates from chat / contact / group-metadata stores) with exact then case-insensitive then substring fallback. Substring-match only wins when there is exactly one candidate so we do not cross-attribute on common tokens.
   - DOM-only appended rows now stamp the resolved jid into `chatId` when no DOM-side chatId exists.
   - One `info!` log per scan tick records the resolution outcome so smoke can tell at a glance whether the lookup is finding a hit.
The plumbing is defensive: if the reverse-lookup returns `None` (un-saved 1:1 contact, group whose subject did not reach `chats`, broadcast list which we do not yet walk), DOM-only rows still flow through with `chatId=Null` and get dropped at `emit_grouped_whatsapp` exactly as before. No regression for chats whose IDB entry already had the right name.
Bug 2 verified at runtime; the chats-map reverse lookup is gated on a follow-up that closes the `chat_names` gaps in `idb.rs` (group-metadata id normalize, broadcast store walk, message-envelope pushName fallback for un-saved contacts).
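The msgId-tail fallback from this commit can be illustrated with a small sketch. The helper names `msg_id_tail` and `find_by_msg_id` are hypothetical, and the map here is a plain `HashMap` standing in for whatever structure `emit_snapshot` actually uses.

```rust
use std::collections::HashMap;

/// Returns the segment after the last underscore, or the whole id when the
/// id contains no underscore (already-bare ids pass through unchanged).
fn msg_id_tail(id: &str) -> &str {
    id.rsplit('_').next().unwrap_or(id)
}

/// Hypothetical lookup: try the full IDB id first, then its trailing segment.
fn find_by_msg_id<'a, V>(by_msg_id: &'a HashMap<String, V>, idb_id: &str) -> Option<&'a V> {
    by_msg_id
        .get(idb_id)
        .or_else(|| by_msg_id.get(msg_id_tail(idb_id)))
}
```

Trying the full id first keeps legacy compound DOM ids matching exactly, while the tail lookup covers the case where the DOM side only knows the bare msgId.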
…bing state (tinyhumansai#1017)
Original audit doc only covered Bug 1 (the IDB indexName fix). After landing the DOM-side fixes (Bug 2: bare-msgId data-id, conversation-header active chat extraction with icon-ligature skip) and the partial Bug 6 plumbing (active_chat_name reverse lookup, msgId-tail merge fallback, DOM-only chatId stamp), the matrix needed:
- A new Bug 2 row in the "Fixes shipped" table with root cause, fix shape, and runtime verification.
- A Bug 6 (partial) row that calls out the plumbing this PR landed and the IDB chat_names gaps that block end-to-end memory ingest.
- An updated "Out of scope" section that reorders the deferred items, replaces the now-shipped Bug 2 entry with the Bug 6+7 follow-up tracker, and points at the draft issue body in `.claude/scratch/`.
- A refreshed sign-off recording the runtime status + remaining action items.
The doc deliberately doesn't include real chat / contact / group names; those were redacted from the smoke transcript when drafting the follow-up issue and are treated the same way here.
Walkthrough
Updates WhatsApp scanner's DOM message capture to return active conversation display names, adds support for both legacy compound and new bare message ID formats in DOM parsing, removes empty …
Sequence Diagram
sequenceDiagram
participant DOM as DOM Snapshot
participant Parser as Message Parser
participant Snapshot as ScanSnapshot
participant Resolver as Chat Resolver
participant IDB as IDB Store
participant Merger as Message Merger
DOM->>Parser: capture_messages()
Parser->>Parser: Parse message rows<br/>(bare msgId or compound)
Parser->>Parser: Extract active_chat_name<br/>from header[data-testid]
Parser-->>Snapshot: Return (messages, msgCount, active_chat_name)
Snapshot->>Resolver: Resolve active_chat_jid<br/>from chats map
Resolver->>Resolver: Match name via<br/>exact/case-insensitive/substring
Resolver-->>Snapshot: active_chat_jid
Snapshot->>IDB: read_store (no empty indexName)
IDB-->>Snapshot: IDB records
Snapshot->>Merger: Merge DOM + IDB messages
Merger->>Merger: Extract bare msgId tail<br/>from DOM data-id
Merger->>Merger: Lookup in IDB using<br/>alternate key
Merger->>Merger: Stamp chatId using<br/>active_chat_jid
Merger-->>Snapshot: Reconciled messages
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src-tauri/src/whatsapp_scanner/dom_snapshot.rs`:
- Around line 274-290: The current heuristic in looks_like_icon_ligature is too
broad and rejects real chat titles; change it to only detect icons by explicit
markers (keep the existing checks for "wds-ic-" and "wds-icon" prefixes) and
remove the generic "single lowercase token" rule; instead, rely on DOM context
in parse_active_chat_name (e.g., check element.class names like
"material-icons", "wds-ic", "wds-icon", or parent wrapper classes/attributes
that indicate an icon) and only call looks_like_icon_ligature when those
explicit icon markers are present so normal lowercase titles (e.g., "mom",
"family") are not filtered out.
In `@app/src-tauri/src/whatsapp_scanner/mod.rs`:
- Around line 354-355: The fast-tick branch is discarding the active_chat_name
from the captured tuple (let (rows, hash, _active_chat_name) = captured?) so DOM
rows forwarded on the 2s path lack a chatId and are skipped by
emit_grouped_whatsapp(); update the handling where captured is unpacked to
preserve and apply active_chat_name (use the existing variable name
active_chat_name) — e.g., include active_chat_name when converting/packaging
dom_messages or attach it to each DomMessage JSON before forwarding to
emit_grouped_whatsapp() so fast ticks carry the same chat stamping as the
full-scan path.
- Around line 493-535: The active_chat_jid resolution fails because the code
assumes each chat value is an object with a "name" field (chat.get("name"))
while snap.chats currently stores jid → Value::String; update the lookup in the
active_chat_jid closure to handle both shapes by extracting the display name via
either chat.get("name").and_then(|v| v.as_str()) or chat.as_str() (falling back
from object → string), and apply the same dual-shape extraction wherever chat
names are checked later; alternatively, normalize snap.chats in scan_once() to
store jid → {"name": ...} so active_chat_jid and subsequent chat-name lookups
(symbols: active_chat_jid, snap.chats, scan_once) see consistent {"name": ...}
objects.
In `@docs/qa/WHATSAPP-PARITY.md`:
- Line 73: In the QA table row the inline selector `span[dir="ltr|rtl"]` is
being treated as a Markdown table delimiter; update that cell to escape the pipe
(e.g. change `span[dir="ltr|rtl"]` to `span[dir="ltr\|rtl"]`) so the row renders
with the correct number of columns—look for the exact string
`span[dir="ltr|rtl"]` in the row and replace it with the escaped version.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: af2e6ce6-c237-40bf-981b-f3582ed7de44
📒 Files selected for processing (5)
- app/src-tauri/src/whatsapp_scanner/dom_snapshot.rs
- app/src-tauri/src/whatsapp_scanner/idb.rs
- app/src-tauri/src/whatsapp_scanner/idb_tests.rs
- app/src-tauri/src/whatsapp_scanner/mod.rs
- docs/qa/WHATSAPP-PARITY.md
/// Returns true when `s` looks like a Material/WhatsApp icon ligature name
/// (e.g. `wds-ic-search`, `wds-ic-disappearing-messages`, `material-icons`,
/// `arrow_forward`). These appear as the first SPAN inside icon wrappers
/// and would otherwise win the chat-title race in `parse_active_chat_name`.
///
/// Heuristic: starts with `wds-ic-` / `wds-icon` (WhatsApp Design System
/// icon prefix), or is a single token with no whitespace whose chars are
/// all `[a-z0-9_-]` (Material Icon ligature shape).
fn looks_like_icon_ligature(s: &str) -> bool {
    if s.starts_with("wds-ic-") || s.starts_with("wds-icon") {
        return true;
    }
    !s.is_empty()
        && !s.contains(char::is_whitespace)
        && s.chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-')
}
The icon heuristic is broad enough to reject real chat titles.
`looks_like_icon_ligature()` currently treats any single lowercase token as an icon. That means perfectly valid titles like `mom`, `family`, `anushka`, or number-like handles will be skipped before `parse_active_chat_name()` can return them. Please key this off actual icon markers (`wds-ic-`, `wds-icon`, class/parent context, etc.) instead of generic lowercase text.
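One way the narrowed check could look is sketched below. It keeps the explicit `wds-ic-` / `wds-icon` text prefixes and moves the Material-style detection to the element's class attribute. The `has_icon_class` and `is_icon_span` helpers, and the idea that the class string is available at the call site, are assumptions for illustration only.

```rust
/// Text-only check: true only for explicit WhatsApp Design System icon names.
fn looks_like_icon_ligature(s: &str) -> bool {
    s.starts_with("wds-ic-") || s.starts_with("wds-icon")
}

/// Hypothetical DOM-context check on the span's (or its wrapper's) class
/// attribute. The `wds-ic` prefix covers both `wds-ic-*` and `wds-icon*`.
fn has_icon_class(class_attr: &str) -> bool {
    class_attr
        .split_whitespace()
        .any(|c| c == "material-icons" || c.starts_with("wds-ic"))
}

/// A span is skipped as an icon only when an explicit marker is present,
/// so plain lowercase titles such as "mom" are kept.
fn is_icon_span(text: &str, class_attr: &str) -> bool {
    has_icon_class(class_attr) || looks_like_icon_ligature(text)
}
```

Under this sketch a ligature such as `arrow_forward` is filtered only when its element carries an icon class, which trades some recall on unmarked icons for not dropping real chat titles.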
fn requestdata_params_omit_index_name() {
    // Regression guard for Bug 1: passing `indexName: ""` to
    // `IndexedDB.requestData` makes CEF 146 reject the call with
    // "Could not get index". The field must be omitted entirely.
    // Same constraint observed in slack_scanner/idb.rs:210-214 and
    // telegram_scanner/idb.rs:210.
    let params = json!({
        "securityOrigin": "https://web.whatsapp.com",
        "databaseName": "model-storage",
        "objectStoreName": "message",
        "skipCount": 0i64,
        "pageSize": 500i64,
    });
    assert!(
        params.get("indexName").is_none(),
        "indexName must be omitted entirely - passing empty string is rejected by CEF 146 with 'Could not get index' (see slack_scanner/idb.rs:210-214)"
    );
This regression test never exercises the production request builder.
Right now it only asserts that a hand-written `json!` literal omits `indexName`. If `read_store()` starts sending `"indexName": ""` again, this test will still stay green. Please build the params through the same helper/path that production uses so the regression is actually covered.
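A sketch of what sharing the builder might look like: a single helper constructs the request parameters, and both the production call and the regression test go through it. The helper name `request_data_params` and the `Param` enum are hypothetical; the enum is a dependency-free stand-in for `serde_json::Value`, which the real code presumably uses.

```rust
use std::collections::BTreeMap;

/// Stand-in for a JSON value, just enough for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Param {
    Str(String),
    Int(i64),
}

/// Hypothetical shared builder for `IndexedDB.requestData` params.
/// Deliberately never inserts `indexName`: per the PR, CEF 146 rejects an
/// empty-string value with "Could not get index".
fn request_data_params(
    security_origin: &str,
    database_name: &str,
    object_store_name: &str,
    skip_count: i64,
    page_size: i64,
) -> BTreeMap<&'static str, Param> {
    let mut params = BTreeMap::new();
    params.insert("securityOrigin", Param::Str(security_origin.to_string()));
    params.insert("databaseName", Param::Str(database_name.to_string()));
    params.insert("objectStoreName", Param::Str(object_store_name.to_string()));
    params.insert("skipCount", Param::Int(skip_count));
    params.insert("pageSize", Param::Int(page_size));
    params
}
```

If `read_store()` and the test both call a helper like this, re-adding `indexName` in production code would make the test's `get("indexName").is_none()` assertion fail.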
let (rows, hash, _active_chat_name) = captured?;
let dom_messages: Vec<Value> = rows.iter().map(dom_snapshot::DomMessage::to_json).collect();
Fast DOM ticks are still dropping the only chat-resolution signal.
Discarding `active_chat_name` here means the 2s path keeps forwarding bare-ID DOM rows without a `chatId`, and `emit_grouped_whatsapp()` will continue to skip them. If fast ticks are meant to stay ingest-capable/live, this result needs the same active-chat stamping as the full-scan path.
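The stamping step this comment asks for could be sketched as below. The `DomRow` struct and `stamp_active_chat` function are hypothetical, and the sketch assumes the fast tick already has a resolved JID available (for example from a chats map cached by the last full scan), which the PR does not confirm.

```rust
/// Hypothetical, simplified view of a DOM-scraped message row.
#[derive(Debug, Clone, PartialEq)]
struct DomRow {
    msg_id: String,
    chat_id: Option<String>,
}

/// Fills in `chat_id` on rows that lack one, using the resolved active-chat
/// JID. Rows that already carry a chat id are left untouched, and nothing
/// happens when no JID could be resolved.
fn stamp_active_chat(rows: &mut [DomRow], active_chat_jid: Option<&str>) {
    let Some(jid) = active_chat_jid else {
        return;
    };
    for row in rows.iter_mut() {
        if row.chat_id.as_deref().map_or(true, str::is_empty) {
            row.chat_id = Some(jid.to_string());
        }
    }
}
```

Keeping the no-JID case a no-op mirrors the defensive behaviour described for the full-scan path: unresolved rows keep flowing with no chat id, exactly as before.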
// Resolve the active chat's JID from its display name (parsed from the
// conversation header). Modern WhatsApp Web doesn't put the chat JID
// anywhere on individual message rows or in the URL, so this is the
// only signal we have. The IDB-side `chats` map has `name → jid` (we
// store it as `jid → {name, …}`, so iterate). Match prefers exact
// case-sensitive equality and falls back to case-insensitive; ignore
// ambiguous matches (multiple chats with the same display name) so we
// don't mis-attribute messages.
let active_chat_jid: Option<String> = snap.active_chat_name.as_deref().and_then(|name| {
    let name_lc = name.to_ascii_lowercase();
    let mut exact: Vec<&str> = Vec::new();
    let mut ci: Vec<&str> = Vec::new();
    let mut substring: Vec<&str> = Vec::new();
    for (jid, chat) in snap.chats.iter() {
        let chat_name = chat.get("name").and_then(|v| v.as_str()).unwrap_or("");
        if chat_name == name {
            exact.push(jid);
        } else if !chat_name.is_empty() && chat_name.to_ascii_lowercase() == name_lc {
            ci.push(jid);
        } else if !chat_name.is_empty()
            && (chat_name.to_ascii_lowercase().contains(&name_lc)
                || name_lc.contains(&chat_name.to_ascii_lowercase()))
        {
            substring.push(jid);
        }
    }
    // Prefer exact > case-insensitive > substring. Substring only wins
    // when there's exactly one candidate (avoids cross-attribution when
    // many chats share a token like a common first name).
    match (exact.len(), ci.len(), substring.len()) {
        (1, _, _) => Some(exact[0].to_string()),
        (0, 1, _) => Some(ci[0].to_string()),
        (0, 0, 1) => Some(substring[0].to_string()),
        _ => None,
    }
});
log::info!(
    "[wa][{}] active chat resolution: name={:?} → jid={:?} chats_in_map={}",
    account_id,
    snap.active_chat_name,
    active_chat_jid,
    snap.chats.len()
);
active_chat_jid never resolves with the current snap.chats representation.
This block assumes each chat entry is an object with a `name` field, but `scan_once()` still stores `snap.chats` as `jid -> Value::String(display_name)`. On a string value, `chat.get("name")` is always `None`, so the resolver never finds a candidate and the later DOM-only `chatId` stamp still collapses to null. Please either normalize `snap.chats` to `{"name": ...}` at ingestion time or read `chat.as_str()` here (and in the later chat-name lookups).
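The dual-shape extraction suggested here can be sketched without pulling in `serde_json`. The `ChatValue` enum below is a hypothetical stand-in for the two `Value` shapes in question; with the real type, the equivalent expression would be roughly `chat.get("name").and_then(|v| v.as_str()).or_else(|| chat.as_str())`, as the comment proposes.

```rust
use std::collections::HashMap;

/// Stand-in for the two shapes a `snap.chats` entry may take:
/// a bare display-name string, or an object that may carry a `name` field.
enum ChatValue {
    Name(String),
    Object(HashMap<String, String>),
}

/// Extracts the display name from either shape, or `None` when an object
/// entry has no `name` field.
fn chat_display_name(chat: &ChatValue) -> Option<&str> {
    match chat {
        ChatValue::Object(fields) => fields.get("name").map(String::as_str),
        ChatValue::Name(name) => Some(name.as_str()),
    }
}
```

Normalizing the map at ingestion time would make this helper unnecessary; reading both shapes is the smaller change if other callers still expect plain strings.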
| Bug | Root cause | Fix | File:line | Verified |
|-----|-----------|-----|-----------|----------|
| 1 | `whatsapp_scanner/idb.rs:159` sent `"indexName": ""` to CDP `IndexedDB.requestData`. CEF 146 backend rejects empty-string with `{"code":-32000,"message":"Could not get index"}`. CDP spec says empty string == primary-key index, but the C++ backend requires the field UNSET (omitted entirely). All 4 IDB stores (`message`, `chat`, `contact`, `group-metadata`) failed; scanner emitted zero memory docs; `whatsapp-web:<acct>` namespace stayed empty. | Drop the `"indexName": ""` line from the JSON params. Add a comment block mirroring the working pattern already documented in `slack_scanner/idb.rs:210-214` and `telegram_scanner/idb.rs:210` (both have explicit notes). Slack + Telegram had this fix already; only WhatsApp regressed. | `app/src-tauri/src/whatsapp_scanner/idb.rs:152-167` (1-line deletion + 6-line comment) | ✅ runtime: post-fix log shows `[wa][<acct>] full scan ok messages=20000 chats=2249` (was `0/0` pre-fix). Plus `cargo test --lib whatsapp_scanner` 20/20 (incl. new `requestdata_params_omit_index_name` regression test). |
| 2 | Once Bug 1 unblocked the IDB walk, `dom_snapshot::parse_rows` still returned 0; three drift points in WhatsApp Web's HTML had landed since the parser was last touched. (a) `data-id` is no longer `<fromMe>_<chatId>_<msgId>`; it's bare msgId hex (e.g. `AC2E44BDA…`, 32 hex chars). (b) `span.selectable-text` class gone; bodies live in plain `span[dir="ltr|rtl"]` (existing fallback already covers this; only the doc was stale). (c) Active chat JID is no longer in URL, on `data-id`, or on any DOM attribute we could find; only the conversation header carries it. | (a) `split_data_id` accepts both legacy compound and bare-msgId-hex shapes. Bare format returns `(false, "", msg_id)` and the merge in `mod.rs` recovers the missing fields by msgId-tail and active-chat reverse-lookup. (b) Module-level doc comment refreshed to mention both `selectable-text` and `dir` matchers. (c) New `parse_active_chat_name` walks `header[data-testid="conversation-header"]` for the first non-icon `<span>`'s text, skipping `wds-icon` / Material-style ligatures so the chat title wins. | `app/src-tauri/src/whatsapp_scanner/dom_snapshot.rs` (split_data_id rewrite + parse_active_chat_name + looks_like_icon_ligature + 4 new tests) | ✅ runtime: post-fix log shows `[wa][<acct>] full scan ok … dom=N` with N>0 (was `dom=0` pre-fix); active chat name extracted cleanly for 1:1 (`"Anushka"`-shape), group (`"<group title>"`-shape), broadcast (`"<broadcast title>"`-shape). |
Escape the | inside the Bug 2 table row.
`span[dir="ltr|rtl"]` is being parsed as an extra table delimiter here, so the row renders with too many cells and the trailing columns get misaligned/truncated. Escaping that pipe (for example `ltr\|rtl`) will keep the QA matrix intact.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 73-73: Table column count
Expected: 5; Actual: 6; Too many cells, extra data will be missing
(MD056, table-column-count)
Summary
- … `main`: the IDB walker rejected every `IndexedDB.requestData` call with `"Could not get index"` because `whatsapp_scanner/idb.rs:159` was sending an empty-string `indexName` that current CEF builds (146.x) reject. Slack and Telegram already shipped this exact fix months ago; only WhatsApp regressed.
- `data-id` is no longer the legacy `<fromMe>_<chatId>_<msgId>` triple; it's now bare msgId hex. `span.selectable-text` is gone (the existing `span[dir]` fallback already covers this; only the doc was stale). And the active chat's JID has stopped appearing on the URL, on `data-id`, and on any DOM attribute we could find; only the conversation header carries it.
- … `chatId` recovery (active-chat-name extraction from `header[data-testid="conversation-header"]`, chats-map reverse lookup, msgId-tail fallback, DOM-only `chatId` stamp).
- … `chat_names` gaps (group-metadata id normalize, broadcast store walk, message-envelope `pushName` fallback for un-saved contacts); see "Out of scope" below for the draft. The plumbing in this PR is defensive: when reverse-lookup misses, rows drop with no chatId exactly as before, no regression.
Problem
Issue #1017 asks for end-to-end WhatsApp Web parity. Static audit + manual smoke against `pnpm dev:app` exposed a chain of three blockers:
- Bug 1: IDB walk dead. Every full-tick scan logged `[wa][idb] read message failed: cdp error: {"code":-32000,"message":"Could not get index"}` for all four target stores (`message`, `chat`, `contact`, `group-metadata`), then `full scan ok messages=0 chats=0 dom=0`. RPC query `openhuman.memory_recall_memories {namespace:"whatsapp-web:<acct>"}` returned an empty array. The CDP spec says empty `indexName` means "primary key index", but the C++ backend in CEF 146 (Chrome 146.0.7680.165) rejects this; the field has to be omitted entirely. The same fix landed in `slack_scanner/idb.rs:210-214` and `telegram_scanner/idb.rs:210` previously, with explicit comments documenting the trap. WhatsApp drifted because the regression test wasn't there to catch it.
- Bug 2: DOM scrape returns zero. Live CDP probe (2026-04-30) revealed three drift points in WhatsApp Web's HTML since `dom_snapshot.rs` was last touched: (a) `data-id` format changed from `"<fromMe>_<chatId>_<msgId>"` to bare msgId hex (`"AC2E44BDA…"`, 32 hex chars). The strict `splitn(3, '_')` matcher rejected every row. (b) `span.selectable-text` class is gone; bodies live in plain `span[dir="ltr|rtl"]` (the existing fallback matcher handled this; only the module doc was stale). (c) Active chat JID is no longer in URL, on `data-id`, or on any DOM attribute we could find; only `header[data-testid="conversation-header"]`'s first non-icon `<span>` carries the chat title.
- Bug 6 (partial): DOM↔IDB `chatId` correlation. Once Bugs 1 + 2 unblocked the data flow, the merge step still produced `patched=0 appended=N` because the DOM bare-msgId doesn't match the IDB compound `_serialized` directly (the bare msgId is the trailing segment after the last underscore, so close but not an exact match) and DOM-only rows have no `chatId` to stamp. This PR plumbs both: a tail-segment fallback in the by-msg-id lookup and an active-chat-jid resolver from the conversation header reverse-looked-up against `snap.chats`. The plumbing is end-to-end runtime-verified for the title extraction and the merge logic; the chats-map gap that prevents `Some(jid)` resolution for some chat types (un-saved 1:1, broadcast lists, certain group ids) is tracked as a follow-up.
Solution
Six GPG-signed micro-commits, ordered trivial → bounded:
1. `fix(webview/whatsapp): omit empty indexName in IndexedDB.requestData (#1017)`: drop the line. Mirror the comment from the working sibling scanners. Slack and Telegram had this fix already; only WhatsApp regressed.
2. `test(webview/whatsapp): lock the indexName-omission contract for IndexedDB.requestData (#1017)`: regression test asserts the JSON payload omits `indexName` (so the trap can never silently come back).
3. `docs(qa): add WhatsApp Web parity audit matrix (#1017)`: initial smoke matrix.
4. `fix(webview/whatsapp): adapt DOM scrape to current row + header markup (#1017)`: accept both legacy compound and bare-msgId `data-id` shapes; new `parse_active_chat_name` walks the conversation header for the first non-icon `<span>` (skipping `wds-icon` / Material-style ligatures); module-level doc refreshed.
5. `feat(webview/whatsapp): plumb active chat resolution + msgId-tail merge fallback (#1017)`: `ScanSnapshot.active_chat_name`, exact → case-insensitive → substring chats-map reverse lookup, DOM-only `chatId` stamp, by-msg-id lookup that falls back to the trailing segment of the IDB compound id, one `info!` log per tick recording the resolution outcome.
6. `docs(qa): refresh WhatsApp parity matrix with post-Bug-2 + Bug-6-plumbing state (#1017)`: verdicts table updated; out-of-scope items reordered to point at the follow-up.
Runtime verification (sanitised)
Pre-fix:
Post-Bug-1:
Post-Bug-2 + Bug-6-plumbing:
The `active chat resolution` log shows the plumbing is in place; the `jid=…` value depends on whether the IDB chats map already has a name entry for the active chat, which is the gap the follow-up closes.
Out of scope (file as separate issues if not already tracked)
- … `chat_names` gaps block end-to-end memory ingest. Three sub-causes documented in `.claude/scratch/whatsapp-bug-6-7-followup.md` (group-metadata id normalize, broadcast store walk, message-envelope `pushName` fallback for un-saved 1:1 contacts). Estimated ~130 LOC across `whatsapp_scanner/idb.rs` + tests. To file immediately after this PR opens.
- … `<video>` element is `video/mp4` (H.264); CEF build lacks proprietary codecs so playback falls back to a download dialog. Build/packaging concern, not a code fix in this repo.
- … `web.whatsapp.com`) before pinning on OpenHuman vs WhatsApp Web platform limits.
Submission Checklist
- … `cargo test --lib whatsapp_scanner` is green (20 passed, including new `requestdata_params_omit_index_name`, `split_data_id_accepts_bare_msg_id`, and `split_data_id_accepts_long_alnum_msg_id` regression tests).
- … `pnpm dev:app` on this branch tip (macOS arm64) walked all 11 acceptance criteria from [Feature] webview: WhatsApp — full end-to-end parity with native app #1017, exercised IDB walk + DOM scrape + chat resolution end-to-end, and captured the diagnostic output documented in `docs/qa/WHATSAPP-PARITY.md`.
- … `whatsapp_scanner/idb.rs` (CEF 146 indexName trap with cross-references to slack/telegram counterparts), `dom_snapshot.rs` module-level doc + `split_data_id` doc + `parse_active_chat_name` invariants + `looks_like_icon_ligature` heuristic, `mod.rs` `ScanSnapshot.active_chat_name` field doc + active-chat-jid resolver explanation.
Impact
- … `memory_doc_ingest` path is upsert-shaped already, so re-running over old IDB messages is safe.
- … `info!` log per scan tick (~30s) plus an O(N) walk over the chats map (max ~2.5k entries observed in the wild): negligible.
- … `DOMSnapshot.captureSnapshot` call we were already making; no new injected scripts, no expanded permission grants.
Related
- … `chat_names` gaps issue (draft body redacted of personal names + numbers; ready for `gh issue create`).