Use TemporaryDirectory while hashing gds files to avoid multi-process race conditions#682
Open
gronniger wants to merge 1 commit intogdsfactory:mainfrom
Open
Use TemporaryDirectory while hashing gds files to avoid multi-process race conditions#682gronniger wants to merge 1 commit intogdsfactory:mainfrom
gronniger wants to merge 1 commit intogdsfactory:mainfrom
Conversation
Contributor
Reviewer's guide (collapsed on small PRs)Reviewer's GuideThis PR changes the component hashing logic to write GDS files into a per-process TemporaryDirectory to avoid concurrent processes reading/writing/deleting the same file, eliminating race conditions in parallel/ MPI use cases. Sequence diagram for per-process TemporaryDirectory-based component hashingsequenceDiagram
actor Process
participant get_component_hash
participant TemporaryDirectory
participant GdsComponent as component
participant FileSystem
Process->>get_component_hash: call get_component_hash(component)
get_component_hash->>TemporaryDirectory: create TemporaryDirectory()
activate TemporaryDirectory
get_component_hash->>GdsComponent: write_gds(gdsdir=tmpdir, no_empty_cells=True, with_metadata=False)
GdsComponent->>FileSystem: create GDS file in tmpdir
FileSystem-->>GdsComponent: return gdspath
GdsComponent-->>get_component_hash: gdspath
get_component_hash->>FileSystem: read_bytes(gdspath)
FileSystem-->>get_component_hash: GDS bytes
get_component_hash->>get_component_hash: hashlib.md5(bytes).hexdigest()
TemporaryDirectory-->>FileSystem: cleanup tmpdir and GDS file
deactivate TemporaryDirectory
get_component_hash-->>Process: return component_hash
Flow diagram for updated component hashing with TemporaryDirectoryflowchart TD
A[Start get_component_hash] --> B[Create TemporaryDirectory tmpdir]
B --> C[Call component.write_gds
gdsdir=tmpdir
no_empty_cells=True
with_metadata=False]
C --> D[Receive gdspath in tmpdir]
D --> E[Read GDS bytes from gdspath]
E --> F["Compute h = hashlib.md5(bytes).hexdigest()"]
F --> G[Exit TemporaryDirectory context
tmpdir and GDS file removed]
G --> H[Return h]
H --> I[End]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Using TemporaryDirectory solves the race, but creating a new directory (and writing a fresh GDS) for every hash may be expensive in hot paths; consider whether you can reuse a per-process temp directory or an in‑memory representation if this hashing is called very frequently.
- It might be worth clarifying (e.g., in a code comment near get_component_hash) that the hash depends solely on the GDS bytes and not the temp directory path, to make it clear to future readers that changing the output location does not affect determinism.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Using TemporaryDirectory solves the race, but creating a new directory (and writing a fresh GDS) for every hash may be expensive in hot paths; consider whether you can reuse a per-process temp directory or an in‑memory representation if this hashing is called very frequently.
- It might be worth clarifying (e.g., in a code comment near get_component_hash) that the hash depends solely on the GDS bytes and not the temp directory path, to make it clear to future readers that changing the output location does not affect determinism.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Currently, when hashing components e.g. in
gmeep.write_sparameters_meep()and using MPI or other parallel processes, many processes try to write/delete the same gds multiple times, which likely fails.With this PR, a
TemporaryDirectoryis created at a randomized location for each process, which makes sure, it can sucessfully finish hashing.Summary by Sourcery
Bug Fixes: