Add is_valid to AttrsDict #46
…k if AttrsDict is valid for a given timestamp
|
hey ian, i think on() is stupidly slow and can easily be sped up. maybe that would be more useful? |
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff:

| | main | #46 | +/- |
|---|---|---|---|
| Coverage | 91.77% | 91.18% | -0.60% |
| Files | 6 | 6 | |
| Lines | 547 | 601 | +54 |
| Hits | 502 | 548 | +46 |
| Misses | 45 | 53 | +8 |
|
Pull request overview
Adds validity/file provenance tracking to AttrsDict instances returned by TextDB.on(...) and introduces AttrsDict.is_valid(...) to let callers check whether a different timestamp would resolve to the same underlying file selection (to avoid redundant on() calls).
Changes:
- Pass validity file path + resolved file list into the `AttrsDict` created by `TextDB.on(...)`.
- Extend `AttrsDict` to carry `__validity_files__` / `__files__` through recursion, mapping, merging, and pickling.
- Add `AttrsDict.is_valid(...)` and corresponding tests.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_textdb.py | Adds assertions covering the new AttrsDict.is_valid(...) behavior. |
| src/dbetto/textdb.py | Constructs the AttrsDict returned by on() with validity/file provenance. |
| src/dbetto/attrsdict.py | Adds provenance fields + is_valid, propagates metadata in operations, updates pickling state. |
From profiling it, reading/parsing the yaml files is what's taking most of the time. I think to speed that part up, maybe switching to json would help (https://stackoverflow.com/questions/46774298/why-is-parsing-yaml-slower-than-an-equivalent-larger-amount-of-json); I think for the very large files like pargen files, human readability matters less and speed matters more. I tested another yaml package (pyyaml12) and found it was about the same as cyaml.
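For anyone wanting to reproduce that comparison, a rough timing sketch might look like the following. It is not the profiling code used above: `make_payload` and `time_parsers` are hypothetical helpers, the synthetic payload is only a stand-in for a large pargen-style file, and the YAML branch runs only if PyYAML is installed (using its C loader when available).

```python
import json
import timeit

try:
    import yaml  # PyYAML; optional, the YAML timing is skipped without it
except ImportError:
    yaml = None


def make_payload(n_keys: int = 2000) -> dict:
    """Build a synthetic nested dict (hypothetical stand-in for a large config file)."""
    return {
        f"ch{i:05d}": {"gain": i * 0.5, "enabled": i % 2 == 0, "tags": ["a", "b"]}
        for i in range(n_keys)
    }


def time_parsers(payload: dict, repeat: int = 3) -> dict:
    """Return the best-of-`repeat` parse time in seconds for each available format."""
    json_text = json.dumps(payload)
    timings = {
        "json": min(timeit.repeat(lambda: json.loads(json_text), number=1, repeat=repeat))
    }
    if yaml is not None:
        yaml_text = yaml.safe_dump(payload)
        # Prefer the libyaml-backed loader when PyYAML was built with it.
        loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
        timings["yaml"] = min(
            timeit.repeat(lambda: yaml.load(yaml_text, Loader=loader), number=1, repeat=repeat)
        )
    return timings


if __name__ == "__main__":
    for fmt, seconds in time_parsers(make_payload()).items():
        print(f"{fmt}: {seconds:.4f} s")
```

The numbers depend heavily on file shape and on whether the C loader is present, so this is only a starting point for a comparison on the real files.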
Edit: Actually I misunderstood how |
|
Opened a second pull request to try to actually speed up |
|
Is this still needed if we merge #48? |
|
This is not needed if we merge #48 |
Track validity file and yaml/json files used to create `AttrsDict` when calling `on`. Added `is_valid` to check if a given timestamp would produce the same `AttrsDict` so that the user can avoid unnecessary calls to `on`.
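The idea can be illustrated with a small self-contained sketch. It does not use dbetto and is not the implementation in this PR: `ValidityTable`, `Snapshot`, and the module-level `on` function are hypothetical stand-ins, and it assumes fixed-width timestamps that sort lexicographically.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class ValidityTable:
    """Hypothetical stand-in for a validity file: entries sorted by start timestamp."""

    starts: list[str]
    files: list[tuple[str, ...]]

    def resolve(self, timestamp: str) -> tuple[str, ...]:
        # Select the last entry whose start is <= timestamp.
        idx = bisect_right(self.starts, timestamp) - 1
        if idx < 0:
            raise ValueError(f"no entry valid at {timestamp}")
        return self.files[idx]


@dataclass
class Snapshot:
    """Hypothetical stand-in for the object returned by on(); remembers its provenance."""

    table: ValidityTable
    files: tuple[str, ...]

    def is_valid(self, timestamp: str) -> bool:
        # Valid if this timestamp would resolve to the same file selection.
        try:
            return self.table.resolve(timestamp) == self.files
        except ValueError:
            return False


def on(table: ValidityTable, timestamp: str) -> Snapshot:
    """Resolve the file list for a timestamp and record it on the result."""
    return Snapshot(table, table.resolve(timestamp))


table = ValidityTable(
    starts=["20230101T000000Z", "20230601T000000Z"],
    files=[("config_v1.yaml",), ("config_v1.yaml", "config_v2.yaml")],
)
snap = on(table, "20230215T000000Z")
print(snap.is_valid("20230301T000000Z"))  # True: same validity entry
print(snap.is_valid("20230701T000000Z"))  # False: a later entry applies
```

A caller looping over many timestamps could keep the last result and call `on` again only when `is_valid` returns `False`, which is the saving described above.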