Fix sidecar metadata lookup to use stable image path keys#54
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughRefactors metadata keying from filename stems to full path-based keys across the app: SidecarManager now resolves stable path keys with migration fallbacks; AppController, ThumbnailModel, folder_stats, and tests updated to use path-aware metadata lookups; logging_setup got resilient app-data dir selection. Changes
Sequence Diagram(s)sequenceDiagram
participant UI as "ThumbnailModel / UI"
participant Controller as "AppController"
participant Sidecar as "SidecarManager"
participant Disk as "Sidecar JSON Storage"
UI->>Controller: request metadata for image (Path)
Controller->>Sidecar: get_metadata(image_path, create=False)
Sidecar->>Sidecar: _lookup_keys(image_path) -> (stable_key, candidates)
alt existing stable_key in memory
Sidecar-->>Controller: metadata dict
else try candidate keys / migrate
Sidecar->>Disk: read entries by candidate key(s)
Disk-->>Sidecar: entry (legacy key)
Sidecar->>Disk: migrate/save under stable_key
Sidecar-->>Controller: migrated metadata dict
end
Controller-->>UI: metadata dict (path-keyed)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
faststack/thumbnail_view/folder_stats.py (1)
227-227: Minor: Docstring is slightly outdated.The docstring says
entries: Dict of {stem: metadata}but entries can now also contain filename keys (e.g.,"a.jpg"). Consider updating to reflect both key formats.📝 Suggested docstring update
- entries: Dict of {stem: metadata} from faststack.json + entries: Dict of {stem_or_filename: metadata} from faststack.json🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@faststack/thumbnail_view/folder_stats.py` at line 227, Update the docstring for the "entries" parameter in faststack/thumbnail_view/folder_stats.py to reflect that keys may be either stems or full filenames (e.g., "a.jpg") rather than only stems; locate the docstring that currently reads "entries: Dict of {stem: metadata}" (the "entries" parameter description) and change it to describe both accepted key formats and an example to make the behavior clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@faststack/app.py`:
- Around line 2161-2165: The bulk metadata map in _get_bulk_metadata_map() uses
normalize_path_key(img.path) as the key but sidecar stores entries under
SidecarManager.metadata_key_for_path(path); change the map key to use
SidecarManager.metadata_key_for_path(img.path) (or the instance helper on
self.sidecar if available) when assigning bulk_map[...] so downstream lookups
use the same canonical sidecar keys and metadata lookups no longer miss entries.
In `@faststack/io/sidecar.py`:
- Around line 204-205: The guard is missing for empty image_ref which causes
_lookup_keys("") to return "" and get_metadata(create=True) to persist
self.data.entries[""]; update the code paths in _lookup_keys and callers
(notably where get_metadata(create=True) is invoked) to explicitly treat empty
or falsy image_ref as absent: if value is empty, return None or an empty list
from _lookup_keys and skip calling get_metadata(create=True) (or call with
create=False) so no metadata entry with an empty-string key is created; apply
the same check to the other occurrence around lines 175-178 to prevent
persisting self.data.entries[""].
- Around line 185-187: Normalize path case when deriving sidecar keys to avoid
duplicate entries on case-insensitive filesystems: after computing base_dir and
abs_path (the variables used when deriving the key) convert their canonical
string forms to a stable case (use .casefold() or .lower() on the normalized
absolute path string) before computing the relative/key value; apply the same
normalization to the other occurrence where abs_path/base_dir are used (the
second block around the variables at the other occurrence noted) so both key
derivations use case-folded, separator-normalized absolute paths.
In `@faststack/logging_setup.py`:
- Around line 15-18: The probe uses a fixed filename (probe = path /
".write_test") which can clobber files and race under concurrent startups;
replace that with creation of a unique temp file in the target directory (use
tempfile.NamedTemporaryFile or tempfile.mkstemp with dir=path) and write to that
temp file instead of probe.open, then remove it (matching the current
probe.unlink step). Update references to probe.open and probe.unlink to operate
on the generated temp path/filename and ensure the temp file is closed before
unlinking.
- Around line 57-59: The current fallback returns an arbitrary path
(candidates[0] or Path.cwd()/ "var"/"appdata") which may be unwritable and cause
setup_logging() to fail when it calls mkdir(); update the fallback to guarantee
a writable location (e.g., use the system temp directory via
tempfile.gettempdir() wrapped in Path and a subdir) or alternatively catch
OSError around the mkdir()/file handler creation in setup_logging() and fall
back to a safe writable location; locate the code that computes the candidates
list and the return expression (the function building the log directory and the
setup_logging() function) and implement either the writable-tempdir fallback or
the mkdir/error-handling fallback so logging initialization cannot crash the
app.
In `@faststack/thumbnail_view/model.py`:
- Around line 551-553: The metadata lookup in thumbnail_view/model.py uses only
metadata_map.get(normalize_path_key(img.path)) (and _get_metadata fallback)
which is incompatible with folder_stats which uses entries.get(filename) and
entries.get(stem); update the lookup to try normalized-path first then fall back
to filename and stem keys (or vice versa depending on precedence) so metadata
produced with filename/stem keys is not lost; apply the same change to the other
occurrence around lines 613-614 and consider extracting a small helper (e.g.,
resolve_metadata_key) to centralize lookups using normalize_path_key, filename,
and stem against metadata_map and to preserve current _get_metadata behavior.
---
Nitpick comments:
In `@faststack/thumbnail_view/folder_stats.py`:
- Line 227: Update the docstring for the "entries" parameter in
faststack/thumbnail_view/folder_stats.py to reflect that keys may be either
stems or full filenames (e.g., "a.jpg") rather than only stems; locate the
docstring that currently reads "entries: Dict of {stem: metadata}" (the
"entries" parameter description) and change it to describe both accepted key
formats and an example to make the behavior clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8cec2da3-b271-484f-bff0-2ac78612c5cc
📒 Files selected for processing (8)
faststack/app.pyfaststack/io/sidecar.pyfaststack/logging_setup.pyfaststack/tests/test_jump_to_last_uploaded.pyfaststack/tests/test_sidecar.pyfaststack/tests/thumbnail_view/test_folder_stats.pyfaststack/thumbnail_view/folder_stats.pyfaststack/thumbnail_view/model.py
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c34d0bdd6b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if os.path.sep in key or "/" in key or "\\" in key or "." in Path(key).name: | ||
| return self.metadata_key_for_path(Path(key)) |
There was a problem hiding this comment.
Preserve dotted stable keys during key normalization
The new _stable_key_from_key logic treats any key containing . as a filename and re-normalizes it via metadata_key_for_path, which strips everything after the last dot. That breaks valid stable keys for files whose stem itself contains dots (for example, metadata stored under my.photo for my.photo.jpg): a later lookup for my.jpg computes stable key my, then this loop incorrectly matches my.photo and migrates/deletes it, causing metadata to be returned for the wrong image and corrupting sidecar entries.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Confirmed. This can corrupt sidecar entries for dotted stems by treating valid stable keys as filenames. I’ll preserve
canonical dotted keys and only re-normalize actual path-like legacy keys.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
faststack/io/sidecar.py (2)
228-232: Filesystem existence check on every lookup may add latency.
candidate_path.exists()performs I/O on every call to_lookup_keysfor plain string refs that don't contain path separators or known extensions. This could add noticeable latency if called frequently with non-existent stems.Since this is part of the fallback/migration logic, the impact may be acceptable. Consider documenting this behavior or caching the check if profiling shows issues.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@faststack/io/sidecar.py` around lines 228 - 232, _lookup_keys currently calls candidate_path.exists() on every plain-string lookup which causes filesystem I/O and latency; either document this fallback behavior or add a cache to avoid repeated exists() checks. Modify _lookup_keys to consult a simple in-memory LRU/TTL cache keyed by the candidate_path (or the original value) before calling candidate_path.exists(), or add a comment in the function explaining this I/O-heavy fallback is intentional for migration. Ensure you reference candidate_path.exists(), metadata_key_for_path and _metadata_filename_key when updating the code or docs so reviewers can find the changed behavior.
171-179: Linear scan over all entries may impact performance for large sidecars.The second migration pass iterates over all entries (O(n)) to find legacy keys. For large sidecars with thousands of entries, this could slow down metadata lookups. Consider adding an early exit condition or caching migrated keys to avoid repeated scans.
Since this path only triggers when the stable key isn't found AND no candidate keys match, it should be rare after initial migration. If performance becomes an issue, consider building an inverted index during
load()or tracking migrated keys to skip this scan.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@faststack/io/sidecar.py` around lines 171 - 179, The linear scan in the fallback migration loop (iterating self.data.entries to find legacy keys when stable_key is missing) can degrade performance for large sidecars; modify the migration logic in the method containing _stable_key_from_key and the loop so it avoids repeated O(n) scans by either (A) adding an early-exit and marking this stable_key as "not found" in a small migrated_keys cache to skip future scans, or (B) building an inverted index (map legacy_key_prefix -> stable_key) during load() and using that index to look up candidate entries instead of scanning all entries; update the code that currently assigns self.data.entries[stable_key] and deletes existing_key to consult/update the cache/index so subsequent lookups are O(1).faststack/logging_setup.py (1)
60-61: Silent fallback to temp directory may lose user data without warning.When
get_app_data_dir()falls back totempfile.gettempdir(), the config file (stored viaAppConfiginfaststack/config.py) will be written to a temporary location. Changes will be lost on restart without any visible warning to the user.Consider logging a warning when the temp fallback is used, so users and developers are aware that persistent storage is unavailable.
Proposed fix
# Final fallback: system temp is the most reliable writable location. + log.warning( + "No writable app-data directory found; falling back to temp directory. " + "Configuration and logs may not persist across restarts." + ) return Path(tempfile.gettempdir()) / "faststack"Note: This requires adding
log = logging.getLogger(__name__)near the top of the file.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@faststack/logging_setup.py` around lines 60 - 61, The get_app_data_dir() function currently falls back silently to Path(tempfile.gettempdir()) / "faststack", which can cause AppConfig writes to be lost; modify get_app_data_dir to obtain a module logger (log = logging.getLogger(__name__)) and emit a warning-level log message when the tempdir fallback is chosen (include the chosen path in the message) so callers like AppConfig in faststack/config.py are alerted that persistent storage is unavailable.faststack/tests/test_sidecar.py (1)
251-268: Consider adding a test forPath("")input.The learnings mention that
Path("")should also be guarded vianot image_ref.namein_lookup_keys. Adding a parallel test forPath("")would verify this edge case is covered.Proposed test to add
def test_empty_path_create_false_returns_none(mock_sidecar_dir): """Path('') should also be treated as empty.""" d = mock_sidecar_dir() sm = SidecarManager(d, None) assert sm.get_metadata(Path(""), create=False) is None assert "" not in sm.data.entries def test_empty_path_create_true_raises(mock_sidecar_dir): """Path('') should fail fast when asked to create metadata.""" d = mock_sidecar_dir() sm = SidecarManager(d, None) with pytest.raises(ValueError, match="image_ref must not be empty"): sm.get_metadata(Path(""), create=True) assert "" not in sm.data.entries🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@faststack/tests/test_sidecar.py` around lines 251 - 268, Add tests that verify Path("") is treated the same as an empty string: extend the existing test pair for empty-string refs by adding two new tests that call SidecarManager.get_metadata with Path("") (one with create=False expecting None and one with create=True expecting ValueError with "image_ref must not be empty") and assert "" is not present in sm.data.entries; this ensures the _lookup_keys logic (use of not image_ref.name) handles Path inputs as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@faststack/logging_setup.py`:
- Around line 71-75: The fallback mkdir call can still raise and crash startup;
wrap both the initial log_dir.mkdir(...) and the fallback tempfile.gettempdir()
/ "faststack" / "logs" mkdir(...) in try/except so any OSError results in
setting log_dir (or log_file) to None instead of letting the exception
propagate, and then guard creation of the RotatingFileHandler (and any
file-based handlers) with an if log_file (or if log_dir) check so file logging
is skipped safely when directories cannot be created.
---
Nitpick comments:
In `@faststack/io/sidecar.py`:
- Around line 228-232: _lookup_keys currently calls candidate_path.exists() on
every plain-string lookup which causes filesystem I/O and latency; either
document this fallback behavior or add a cache to avoid repeated exists()
checks. Modify _lookup_keys to consult a simple in-memory LRU/TTL cache keyed by
the candidate_path (or the original value) before calling
candidate_path.exists(), or add a comment in the function explaining this
I/O-heavy fallback is intentional for migration. Ensure you reference
candidate_path.exists(), metadata_key_for_path and _metadata_filename_key when
updating the code or docs so reviewers can find the changed behavior.
- Around line 171-179: The linear scan in the fallback migration loop (iterating
self.data.entries to find legacy keys when stable_key is missing) can degrade
performance for large sidecars; modify the migration logic in the method
containing _stable_key_from_key and the loop so it avoids repeated O(n) scans by
either (A) adding an early-exit and marking this stable_key as "not found" in a
small migrated_keys cache to skip future scans, or (B) building an inverted
index (map legacy_key_prefix -> stable_key) during load() and using that index
to look up candidate entries instead of scanning all entries; update the code
that currently assigns self.data.entries[stable_key] and deletes existing_key to
consult/update the cache/index so subsequent lookups are O(1).
In `@faststack/logging_setup.py`:
- Around line 60-61: The get_app_data_dir() function currently falls back
silently to Path(tempfile.gettempdir()) / "faststack", which can cause AppConfig
writes to be lost; modify get_app_data_dir to obtain a module logger (log =
logging.getLogger(__name__)) and emit a warning-level log message when the
tempdir fallback is chosen (include the chosen path in the message) so callers
like AppConfig in faststack/config.py are alerted that persistent storage is
unavailable.
In `@faststack/tests/test_sidecar.py`:
- Around line 251-268: Add tests that verify Path("") is treated the same as an
empty string: extend the existing test pair for empty-string refs by adding two
new tests that call SidecarManager.get_metadata with Path("") (one with
create=False expecting None and one with create=True expecting ValueError with
"image_ref must not be empty") and assert "" is not present in sm.data.entries;
this ensures the _lookup_keys logic (use of not image_ref.name) handles Path
inputs as well.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4dbd6823-6138-4ff8-9915-f462e38ca02b
📒 Files selected for processing (3)
faststack/io/sidecar.pyfaststack/logging_setup.pyfaststack/tests/test_sidecar.py
|
Addressed the remaining review feedback on PR #54, including the dotted-key sidecar fix, safer logging fallback, and |
|
Now merged into main. |
What
This PR separates the sidecar metadata path-key fix into its own standalone change.
Why
Keeping this separate makes the review smaller, clearer, and avoids mixing it with the unrelated Windows startup/turbojpeg work.
Impact
Testing
Summary by CodeRabbit
Bug Fixes
Tests
Chores