Add is_valid to AttrsDict by iguinn · Pull Request #46 · gipert/dbetto

iguinn · 2026-02-13T22:51:10Z

Track validity file and yaml/json files used to create AttrsDict when calling on. Added is_valid to check if a given timestamp would produce the same AttrsDict so that the user can avoid unnecessary calls to on.

…k if AttrsDict is valid for a given timestamp

gipert · 2026-02-14T09:18:17Z

hey ian, i think on() is stupidly slow and can easily be sped up. maybe more useful?

codecov · 2026-02-14T09:19:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.18%. Comparing base (09ec900) to head (60f721c).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #46      +/-   ##
==========================================
- Coverage   91.77%   91.18%   -0.60%     
==========================================
  Files           6        6              
  Lines         547      601      +54     
==========================================
+ Hits          502      548      +46     
- Misses         45       53       +8


Copilot

Pull request overview

Adds validity/file provenance tracking to AttrsDict instances returned by TextDB.on(...) and introduces AttrsDict.is_valid(...) to let callers check whether a different timestamp would resolve to the same underlying file selection (to avoid redundant on() calls).

Changes:

- Pass validity file path + resolved file list into the AttrsDict created by TextDB.on(...).
- Extend AttrsDict to carry __validity_files__ / __files__ through recursion, mapping, merging, and pickling.
- Add AttrsDict.is_valid(...) and corresponding tests.
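
The provenance idea summarized above can be sketched in a few lines. This is only an illustration, not dbetto's implementation: `ToyValidity`, `TrackedDict`, the free function `on`, and the `(timestamp, files)` layout are hypothetical stand-ins, and the real `AttrsDict.is_valid(...)` signature may differ.

```python
from bisect import bisect_right


class ToyValidity:
    """Hypothetical stand-in for a validity file: (valid_from, files) entries."""

    def __init__(self, entries):
        # Sort by start timestamp; these timestamp strings compare lexicographically.
        self.entries = sorted((ts, tuple(files)) for ts, files in entries)
        self.starts = [ts for ts, _ in self.entries]

    def resolve(self, timestamp):
        """Return the tuple of files in effect at `timestamp`."""
        idx = bisect_right(self.starts, timestamp) - 1
        if idx < 0:
            raise KeyError(f"no entry valid at {timestamp}")
        return self.entries[idx][1]


class TrackedDict(dict):
    """Dict that remembers which files it was built from (assumed design)."""

    def __init__(self, data, validity, files):
        super().__init__(data)
        self.validity = validity
        self.files = tuple(files)

    def is_valid(self, timestamp):
        # Valid if `timestamp` resolves to the same file selection.
        try:
            return self.validity.resolve(timestamp) == self.files
        except KeyError:
            return False


def on(validity, timestamp, read_file):
    """Toy analogue of TextDB.on: resolve files, then read and merge them."""
    files = validity.resolve(timestamp)
    merged = {}
    for name in files:
        merged.update(read_file(name))
    return TrackedDict(merged, validity, files)


validity = ToyValidity([
    ("20230101T000000Z", ["base.yaml"]),
    ("20230601T000000Z", ["base.yaml", "update.yaml"]),
])
contents = {"base.yaml": {"gain": 1.0}, "update.yaml": {"gain": 1.1}}

cfg = on(validity, "20230201T000000Z", contents.__getitem__)
same_files = cfg.is_valid("20230315T000000Z")  # same selection, no new on() needed
new_files = cfg.is_valid("20230701T000000Z")   # update.yaml now applies
```

The check only compares file selections, so it never reads a file; that is what would make it cheaper than calling `on()` again.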

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `tests/test_textdb.py` | Adds assertions covering the new `AttrsDict.is_valid(...)` behavior. |
| `src/dbetto/textdb.py` | Constructs the `AttrsDict` returned by `on()` with validity/file provenance. |
| `src/dbetto/attrsdict.py` | Adds provenance fields + `is_valid`, propagates metadata in operations, updates pickling state. |


iguinn · 2026-02-15T19:59:08Z

> hey ian, i think on() is stupidly slow and can easily be sped up. maybe more useful?

From profiling it, reading/parsing the yaml files is what takes most of the time. To speed that part up, switching to json might help (https://stackoverflow.com/questions/46774298/why-is-parsing-yaml-slower-than-an-equivalent-larger-amount-of-json); for the very large files like pargen files, human readability matters less and speed matters more. I also tested another yaml package (pyyaml12) and found it was about the same as cyaml.

Otherwise, reducing the number of reads is the only way to speed it up. That was the idea behind this approach, but I agree there could be better ones: caching the files read by on, so that you don't need to reread them when you call on again, would be both faster and easier to use than this is_valid approach. I can think of a few ways you could do this:
1) Every time a file is read cache it; I think this would be the fastest since you would never have to reread a file again, but it could run into memory issues (e.g. pargen fully replaces the file contents each run, and would get quite large if you were to scan through the full dataset)
2) Every time you call on, cache only the files that were used, and clear any that weren't. This would work quite well for usage patterns that are scanning through metadata sequentially and would avoid the memory problems since you are typically replacing the large files meaning they would get cleared. There might be some usage patterns where this is slow, though (e.g., if someone is alternating between cal and phy and they have different file histories or something)
3) Set some other limit on cached files that causes them to clear at some point...Maybe if it isn't used for 5 calls of on you clear it, or something like that...This would be a little more annoying to implement though
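
The three options differ only in when a cached file is dropped, so they can be sketched as one cache with an idle limit. This is a rough sketch under assumptions, not code from dbetto: `IdleEvictingCache`, `fetch`, `max_idle`, and the file names are hypothetical.

```python
class IdleEvictingCache:
    """File cache that drops entries unused for `max_idle` consecutive calls.

    max_idle=None never evicts (approach 1), max_idle=1 keeps only the files
    used by the latest call (approach 2), larger values give approach 3.
    """

    def __init__(self, loader, max_idle=None):
        self.loader = loader      # callable: file name -> parsed contents
        self.max_idle = max_idle
        self._data = {}           # file name -> parsed contents
        self._last_used = {}      # file name -> index of the last call using it
        self._call = 0

    def fetch(self, files):
        """Return parsed contents for `files`, reading only cache misses."""
        self._call += 1
        out = []
        for name in files:
            if name not in self._data:
                self._data[name] = self.loader(name)
            self._last_used[name] = self._call
            out.append(self._data[name])
        self._evict()
        return out

    def _evict(self):
        if self.max_idle is None:
            return
        stale = [name for name, used in self._last_used.items()
                 if self._call - used >= self.max_idle]
        for name in stale:
            del self._data[name]
            del self._last_used[name]


reads = []


def loader(name):
    reads.append(name)            # record every real read
    return {"file": name}


cache = IdleEvictingCache(loader, max_idle=1)
cache.fetch(["a.yaml", "big_r001.yaml"])
cache.fetch(["a.yaml", "big_r002.yaml"])  # a.yaml is a hit; big_r001.yaml is dropped
```

With `max_idle=1`, a sequential scan keeps the small shared file and drops each large per-run file as soon as the next run replaces it. Alternating between two file histories would reread on every call, which is the slow pattern mentioned in option 2.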

~~I'll maybe give approach 2 a shot, shouldn't actually be that hard, and since it would keep this all under the hood we could change it later without causing problems.~~

Edit: Actually I misunderstood how on was working; approach 1 is already what's happening implicitly. I profiled again, and it's deepcopy, called in Props.add_to, that takes the bulk of the time on repeated reads (yaml parsing is the bulk of the time on the first read). I tried removing the deepcopy (side-effects be damned), and the limiting steps then seem to be Props.subst_vars and a lot of the path arithmetic in TextDB.__getitem__.
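
To illustrate the side effect that dropping the deepcopy risks, here is a toy recursive merge. It is not the actual `Props.add_to` code: `merge_into` and its `copy_values` flag are hypothetical, and `cached_file` stands in for a parsed file held in the cache.

```python
import copy


def merge_into(base, overlay, copy_values=True):
    """Recursively merge `overlay` into `base` in place (illustrative only)."""
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge_into(base[key], value, copy_values)
        else:
            # Without the copy, `base` shares nested objects with `overlay`.
            base[key] = copy.deepcopy(value) if copy_values else value
    return base


cached_file = {"pars": {"gain": 1.0}}  # pretend this lives in the file cache

safe = merge_into({}, cached_file, copy_values=True)
safe["pars"]["gain"] = 2.0             # only the copy changes
safe_view = cached_file["pars"]["gain"]

fast = merge_into({}, cached_file, copy_values=False)
fast["pars"]["gain"] = 3.0             # the cached contents change too
fast_view = cached_file["pars"]["gain"]
```

Skipping the copy saves time on repeated reads, but a caller who edits the returned dict also edits what later calls will see.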

iguinn · 2026-02-16T03:39:26Z

Opened a second pull request, #48, to try to actually speed up on. However, that pull request may be a little risky, as it reverts a few features related to wildcards and environment variables, and it is potentially less safe.

gipert · 2026-03-11T08:58:19Z

Is this still needed if we merge #48?

iguinn · 2026-04-28T22:04:27Z

This is not needed if we merge #48

Track validity for AttrsDicts created with on. Added is_valid to chec…

6ad12e9

…k if AttrsDict is valid for a given timestamp

gipert requested a review from Copilot February 14, 2026 09:17

Copilot started reviewing on behalf of gipert February 14, 2026 09:18

Copilot AI reviewed Feb 14, 2026

Comment thread src/dbetto/textdb.py

Comment thread src/dbetto/attrsdict.py Outdated

Comment thread src/dbetto/attrsdict.py Outdated

iguinn mentioned this pull request Feb 16, 2026

Speed up on #48

Merged

iguinn and others added 3 commits February 15, 2026 19:35

Respond to the bot and improve code coverage

ca9a61f

style: pre-commit fixes

cd6f19e

Merge branch 'main' into main

6a98643

iguinn and others added 4 commits February 24, 2026 12:04

Cache most recent result of call to TextDB.on

37e551f

style: pre-commit fixes

61bcaf7

Fix pickling bug

015d2ab

Test for and fix pickling bug when calling on

60f721c

gipert closed this May 6, 2026

iguinn wants to merge 8 commits into gipert:main from iguinn:main

3 participants
