Use TemporaryDirectory while hashing gds files to avoid multi-process race conditions #682

Open
gronniger wants to merge 1 commit into gdsfactory:main from gronniger:fix_tempdir_gds_hashing

Conversation

@gronniger gronniger commented Mar 6, 2026

Currently, when components are hashed (e.g. in gmeep.write_sparameters_meep()) under MPI or other parallel execution, many processes try to write and delete the same GDS file repeatedly, which is likely to fail.

With this PR, each process creates a TemporaryDirectory at a randomized location, which ensures that it can successfully finish hashing.

Summary by Sourcery

Bug Fixes:

  • Prevent race conditions when multiple processes hash the same component by writing GDS files into process-specific temporary directories instead of a shared path.

@gronniger gronniger requested a review from joamatab as a code owner March 6, 2026 13:58
sourcery-ai bot commented Mar 6, 2026

Reviewer's Guide

This PR changes the component hashing logic to write GDS files into a per-process TemporaryDirectory, so that concurrent processes no longer read, write, or delete the same file. This eliminates race conditions in parallel and MPI use cases.

Sequence diagram for per-process TemporaryDirectory-based component hashing

sequenceDiagram
    actor Process
    participant get_component_hash
    participant TemporaryDirectory
    participant GdsComponent as component
    participant FileSystem

    Process->>get_component_hash: call get_component_hash(component)
    get_component_hash->>TemporaryDirectory: create TemporaryDirectory()
    activate TemporaryDirectory
    get_component_hash->>GdsComponent: write_gds(gdsdir=tmpdir, no_empty_cells=True, with_metadata=False)
    GdsComponent->>FileSystem: create GDS file in tmpdir
    FileSystem-->>GdsComponent: return gdspath
    GdsComponent-->>get_component_hash: gdspath
    get_component_hash->>FileSystem: read_bytes(gdspath)
    FileSystem-->>get_component_hash: GDS bytes
    get_component_hash->>get_component_hash: hashlib.md5(bytes).hexdigest()
    TemporaryDirectory-->>FileSystem: cleanup tmpdir and GDS file
    deactivate TemporaryDirectory
    get_component_hash-->>Process: return component_hash

Flow diagram for updated component hashing with TemporaryDirectory

flowchart TD
    A[Start get_component_hash] --> B[Create TemporaryDirectory tmpdir]
    B --> C["Call component.write_gds(gdsdir=tmpdir, no_empty_cells=True, with_metadata=False)"]
    C --> D[Receive gdspath in tmpdir]
    D --> E[Read GDS bytes from gdspath]
    E --> F["Compute h = hashlib.md5(bytes).hexdigest()"]
    F --> G["Exit TemporaryDirectory context: tmpdir and GDS file removed"]
    G --> H[Return h]
    H --> I[End]

File-Level Changes

Change: Hash component GDS files using a per-call TemporaryDirectory to avoid multi-process file conflicts.
Details:
  • Wrap GDS export and hashing logic in a TemporaryDirectory context manager.
  • Pass the TemporaryDirectory path to component.write_gds via the gdsdir argument while preserving existing options.
  • Compute the MD5 hash from the temporary GDS file bytes and rely on the TemporaryDirectory cleanup instead of manual file deletion.
Files: gplugins/common/utils/get_sparameters_path.py
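
For readers without the diff at hand, the change summarized above can be sketched roughly as follows. This is a minimal illustration, not the actual code in gplugins/common/utils/get_sparameters_path.py: `FakeComponent` is a hypothetical stand-in so the snippet runs without gdsfactory, and only the `write_gds` keyword arguments are taken from the diagrams above.

```python
import hashlib
import tempfile
from pathlib import Path


class FakeComponent:
    """Hypothetical stand-in for a component (not the real gdsfactory API)."""

    def __init__(self, name: str, payload: bytes) -> None:
        self.name = name
        self.payload = payload

    def write_gds(self, gdsdir, **kwargs) -> Path:
        # A real component would serialize its layout; here we just dump bytes.
        gdspath = Path(gdsdir) / f"{self.name}.gds"
        gdspath.write_bytes(self.payload)
        return gdspath


def get_component_hash(component) -> str:
    """Hash the exported GDS bytes inside a directory private to this call."""
    with tempfile.TemporaryDirectory() as tmpdir:
        gdspath = component.write_gds(
            gdsdir=tmpdir, no_empty_cells=True, with_metadata=False
        )
        # The digest depends only on the file bytes, not on the tmpdir path.
        # The directory and the file are removed when the with-block exits.
        return hashlib.md5(Path(gdspath).read_bytes()).hexdigest()
```

Because every call gets its own directory, two processes hashing a component with the same name never touch the same file, and no manual deletion step is needed.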

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high-level feedback:

  • Using TemporaryDirectory solves the race, but creating a new directory (and writing a fresh GDS) for every hash may be expensive in hot paths; consider whether you can reuse a per-process temp directory or an in‑memory representation if this hashing is called very frequently.
  • It might be worth clarifying (e.g., in a code comment near get_component_hash) that the hash depends solely on the GDS bytes and not the temp directory path, to make it clear to future readers that changing the output location does not affect determinism.
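
One possible reading of the first suggestion (reusing a per-process temp directory) is sketched below. Everything here is an assumption for illustration: `process_tmpdir`, `get_component_hash_reusing_dir`, and `DemoComponent` are hypothetical names and are not part of gplugins or gdsfactory. The directory is keyed by PID so that a forked child does not share, or clean up, its parent's directory.

```python
import atexit
import hashlib
import os
import shutil
import tempfile
from pathlib import Path

_DIRS = {}  # maps PID -> temp directory path created by that process


def _cleanup(pid: int, path: str) -> None:
    # Only the process that created the directory may remove it.
    if os.getpid() == pid:
        shutil.rmtree(path, ignore_errors=True)


def process_tmpdir() -> Path:
    """Return a temp directory owned by the current process, created lazily."""
    pid = os.getpid()
    if pid not in _DIRS:
        path = tempfile.mkdtemp(prefix="gdshash_")
        _DIRS[pid] = path
        atexit.register(_cleanup, pid, path)
    return Path(_DIRS[pid])


def get_component_hash_reusing_dir(component) -> str:
    """Hash GDS bytes, reusing one directory per process instead of one per call."""
    gdspath = Path(
        component.write_gds(
            gdsdir=process_tmpdir(), no_empty_cells=True, with_metadata=False
        )
    )
    try:
        return hashlib.md5(gdspath.read_bytes()).hexdigest()
    finally:
        gdspath.unlink(missing_ok=True)  # keep the shared directory empty


class DemoComponent:
    """Hypothetical stand-in so the sketch runs without gdsfactory."""

    def __init__(self, name: str, payload: bytes) -> None:
        self.name, self.payload = name, payload

    def write_gds(self, gdsdir, **kwargs) -> Path:
        gdspath = Path(gdsdir) / f"{self.name}.gds"
        gdspath.write_bytes(self.payload)
        return gdspath
```

This trades one directory creation per call for a manual file deletion, and it is not safe if several threads in the same process hash components with the same name at once. Whether the saving matters depends on how often the hash is computed.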