Email fetch with --download-attachments silently drops attachments when multiple exist #2078

@chubes4

Summary

When an IMAP message has multiple attachments, wp datamachine email fetch --download-attachments only writes a subset to disk. The IMAP message itself correctly reports attachment_count: 2 in the metadata, but only one file actually lands in wp-content/uploads/datamachine-files/email-attachments/.

Reproducer

On extrachill.com production:

wp --allow-root --path=/var/www/extrachill.com datamachine email read 87801 --format=json

Metadata reports attachment_count: 2 and has_attachments: true.

rm wp-content/uploads/datamachine-files/email-attachments/*
wp --allow-root --path=/var/www/extrachill.com datamachine email fetch \
  --search='FROM "chrisgardner" SUBJECT "dani rucker"' \
  --max=5 --download-attachments
ls wp-content/uploads/datamachine-files/email-attachments/

Result: only one file (corey-campbell-dani-rucker-1.transcript.txt) is written. The second attachment is silently missing — no warning, no log entry.

Expected

Both attachments are saved to disk, with deterministic distinct filenames. If a filename collision is unavoidable (e.g. both attachments named the same), the second should be suffixed (e.g. -2) rather than silently dropped or silently overwritten.
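The suffix scheme described above could look like the following shell sketch. It only illustrates the intended naming behaviour; the plugin's writer is PHP, and `unique_path` is a hypothetical helper name, not an existing function in the plugin.

```shell
#!/bin/sh
# unique_path DIR NAME
# Hypothetical helper: prints a path under DIR that does not exist yet.
# If DIR/NAME is taken, it appends -2, -3, ... before the last extension.
unique_path() {
    dir=$1
    name=$2
    candidate="$dir/$name"
    n=2
    # Split NAME at its last dot; names without an extension (or dotfiles
    # such as ".htaccess") are suffixed at the end instead.
    case "$name" in
        ?*.*) stem=${name%.*}; ext=".${name##*.}" ;;
        *)    stem=$name;      ext="" ;;
    esac
    while [ -e "$candidate" ]; do
        candidate="$dir/$stem-$n$ext"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```

With this scheme, two attachments both named report.txt would land as report.txt and report-2.txt. The check-then-write sequence is not atomic, so a real implementation would also need to guard against concurrent fetches.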

Observed

  • 1 of 2 attachments written
  • No warning surfaced by the CLI command
  • No ERROR or WARNING row in wp datamachine logs read
  • The downloaded filename (...-1.transcript.txt) implies the handler is aware of multi-attachment semantics: the -1 suffix suggests a $n++ counter that should increment past 1

Hypotheses (most likely first)

  1. Filename collision with silent overwrite. The two source attachments may share the same MIME-derived or message-id-derived name, and the writer overwrites instead of suffixing. Check the attachment-write loop for any file_put_contents() without first checking file_exists().
  2. Loop break on first success. The IMAP body part iteration may be returning after the first attachment is written instead of continuing through all parts.
  3. MIME-part filter. The handler may be filtering on a specific content-type or content-disposition and one of the attachments is being filtered out (e.g. inline vs. attachment disposition).

Where to look

inc/Core/Email/ or wherever the email fetch --download-attachments path lives. Search for the attachment write loop and confirm:

  • It iterates all MIME parts of type attachment, not just the first
  • It writes each part to a unique path (suffix-on-collision or hash-based naming)
  • It logs each attachment write so missing ones are visible after-the-fact
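One way to start that search is to grep the plugin source for file-writing calls. This is a sketch under assumptions: the function list is a guess at what the writer might use (`file_put_contents`, `fwrite`, and `imap_savebody` are real PHP functions, but the handler may use none of them), and `find_attachment_writes` is a hypothetical helper name. It relies on `grep -r` and `--include`, which GNU and BSD grep support.

```shell
#!/bin/sh
# find_attachment_writes DIR
# Lists lines in PHP files under DIR that call a file-writing function.
# The pattern is a guess; extend it if the handler uses another API.
find_attachment_writes() {
    grep -rnE --include='*.php' 'put_contents|fwrite|imap_savebody' "$1"
}
```

Run from the plugin root, for example as `find_attachment_writes inc/`. For each hit, check whether the call sits inside a loop over all MIME parts and whether the target path is tested with file_exists() before the write.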

Impact

  • Data loss on ingest. Workflows that depend on multi-attachment email parsing (e.g. interview transcripts that come as N audio files, photo sets, multi-page scans) lose data without warning.
  • Discovered while ingesting an interview transcript email from Chris Gardner — email reported 2 attachments, only 1 landed.

Acceptance criteria

  • Reproducer above results in 2 distinct files on disk
  • Filename collisions are resolved with a suffix scheme, never silent overwrite
  • A WARNING is logged if any attachment is skipped for a non-error reason (e.g. excluded MIME type)
  • CLI output prints the count of attachments written and the count expected, so silent loss is impossible to miss
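Until the CLI prints these counts itself, the same check can be scripted around the reproducer. A minimal sketch, assuming the expected count is read by hand from attachment_count in the metadata, the attachments directory was emptied before the fetch, and only one message matches the search; `check_attachment_count` is a hypothetical helper, not part of the plugin.

```shell
#!/bin/sh
# check_attachment_count EXPECTED DIR
# Compares the number of regular files directly inside DIR with EXPECTED.
# Prints both counts; returns 0 on a match and non-zero otherwise.
check_attachment_count() {
    expected=$1
    dir=$2
    written=$(find "$dir" -maxdepth 1 -type f | wc -l)
    written=$((written))   # normalise whitespace that some wc builds pad with
    echo "attachments written: $written, expected: $expected"
    [ "$written" -eq "$expected" ]
}
```

For the reproducer above, `check_attachment_count 2 wp-content/uploads/datamachine-files/email-attachments` should fail before the fix and pass after it.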
