Contributor

sjawhar commented Jan 2, 2026

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Overview

Adds a new fast_yaml opt-in config that uses PyYAML's C parser for code paths that don't need round-trip (e.g. to preserve comments). Makes things significantly faster.

Mostly LLM-generated, especially the extra stuff to make the C parser respect YAML 1.2 🤣. Could drop that extra stuff, since it's opt-in, switch to a different, fast parser, or just keep it?
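
For readers wondering what "making the C parser respect YAML 1.2" can involve, here is a minimal sketch of one such patch: replacing PyYAML's YAML 1.1 boolean resolver with the YAML 1.2 core-schema one. This is not the code in this PR. `make_yaml12ish_loader`, `Yaml12ishLoader` and `load_fast` are hypothetical names, and the sketch assumes PyYAML is installed (it falls back to the pure-Python `SafeLoader` when the libyaml bindings are missing).

```python
import functools
import re

BOOL_TAG = "tag:yaml.org,2002:bool"
# YAML 1.2 core-schema booleans. PyYAML's default (YAML 1.1) pattern also
# accepts yes/no/on/off.
YAML12_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")


@functools.lru_cache(maxsize=None)
def make_yaml12ish_loader():
    """Build a safe loader whose booleans follow YAML 1.2 (hypothetical helper)."""
    import yaml  # PyYAML, imported lazily because the fast path is opt-in

    # Prefer the libyaml-backed loader when PyYAML was built with it.
    base = getattr(yaml, "CSafeLoader", yaml.SafeLoader)

    # Copy the implicit resolvers, dropping the YAML 1.1 boolean entries,
    # so the base loader class is left unmodified.
    resolvers = {
        first: [(tag, rx) for tag, rx in entries if tag != BOOL_TAG]
        for first, entries in base.yaml_implicit_resolvers.items()
    }

    class Yaml12ishLoader(base):
        yaml_implicit_resolvers = resolvers

    # Register the stricter boolean pattern for scalars starting with t/T/f/F.
    Yaml12ishLoader.add_implicit_resolver(BOOL_TAG, YAML12_BOOL, list("tTfF"))
    return Yaml12ishLoader


def load_fast(text):
    """Parse YAML text with the patched loader (hypothetical helper)."""
    import yaml

    return yaml.load(text, Loader=make_yaml12ish_loader())
```

With this loader, `yes` and `Off` stay strings while `true` is still a boolean. Only booleans are handled here; other YAML 1.1 rules (octal and sexagesimal integers, for example) are left untouched, so this alone does not make the parser 1.2-compliant.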

github-project-automation bot moved this to Backlog in DVC on Jan 2, 2026

codecov bot commented Jan 2, 2026

Codecov Report

❌ Patch coverage is 81.69014% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.93%. Comparing base (2431ec6) to head (c105460).
⚠️ Report is 175 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| dvc/dvcfile.py | 23.33% | 22 Missing and 1 partial ⚠️ |
| dvc/utils/serialize/_yaml.py | 84.48% | 6 Missing and 3 partials ⚠️ |
| dvc/stage/cache.py | 57.14% | 2 Missing and 1 partial ⚠️ |
| dvc/utils/serialize/__init__.py | 66.66% | 2 Missing ⚠️ |
| tests/unit/utils/serialize/test_yaml.py | 97.77% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10942      +/-   ##
==========================================
+ Coverage   90.68%   90.93%   +0.24%     
==========================================
  Files         504      505       +1     
  Lines       39795    41220    +1425     
  Branches     3141     3257     +116     
==========================================
+ Hits        36087    37482    +1395     
- Misses       3042     3095      +53     
+ Partials      666      643      -23     

Collaborator

skshetry commented Jan 2, 2026

ruamel.yaml has a C-parser, which is 1.2 compliant. See https://yaml.dev/doc/ruamel.yaml/basicuse/#top.

You can get it using YAML(typ="safe").
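
As a rough sketch of that suggestion (not code from this PR), loading a document through ruamel.yaml's safe loader looks like this. `load_with_ruamel_safe` is a hypothetical helper name, and ruamel.yaml must be installed for it to run.

```python
def load_with_ruamel_safe(text):
    """Parse YAML text with ruamel.yaml's safe (non-round-trip) loader."""
    from ruamel.yaml import YAML  # third-party: pip install ruamel.yaml

    # typ="safe" returns plain dicts and lists; comments are not preserved,
    # unlike the default round-trip mode (typ="rt").
    yaml = YAML(typ="safe")
    return yaml.load(text)
```

For example, `load_with_ruamel_safe("schema: '2.0'\n")` returns a plain dict with the single key `schema`.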

> Could drop that extra stuff, since it's opt-in, switch to a different, fast parser, or just keep it?

Let's keep it simple and behind a feature flag. This feels like a step backward (moving from YAML 1.2 -> 1.1), and I'm fine explicitly calling it experimental. I don't expect this to ever become stable; it's just an escape hatch for when things are slow. We should document that it's YAML 1.1 and may have compatibility issues.
(If you are using this flag, you know what you are doing.)
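
To make the compatibility caveat concrete: the best-known difference is that YAML 1.1 treats yes/no/on/off as booleans, while the YAML 1.2 core schema only accepts true/false. The stdlib-only sketch below flags plain scalars whose type would change. The two patterns are transcribed from PyYAML's default resolver and the YAML 1.2 core schema, and `bool_differs` is a hypothetical helper, not part of DVC.

```python
import re

# Plain scalars PyYAML's default (YAML 1.1) resolver treats as booleans.
YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
# Plain scalars the YAML 1.2 core schema treats as booleans.
YAML12_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")


def bool_differs(scalar: str) -> bool:
    """Return True if an unquoted scalar is a bool in YAML 1.1 but a string in 1.2."""
    return bool(YAML11_BOOL.match(scalar)) and not YAML12_BOOL.match(scalar)


# Values that would silently change type when switching parsers.
suspects = [s for s in ["yes", "true", "off", "main", "NO"] if bool_differs(s)]
```

Here `suspects` ends up as `["yes", "off", "NO"]`: a file containing any of these unquoted would load differently under the YAML 1.1 fast path.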

Contributor Author

sjawhar commented Jan 2, 2026

Some benchmark numbers on a real dvc.lock file:

| Parser | Mean (ms) | Ratio |
| --- | --- | --- |
| Custom YAML12SafeLoader | 7.51 | 1.0x |
| PyYAML CSafeLoader | 7.55 | 1.0x |
| ruamel CSafeLoader | 10.74 | 1.4x |
| ruamel typ=safe | 17.04 | 2.3x |
| yamlium (after some fixes) | 46.11 | 6.1x |
| ruamel typ=rt | 134.67 | 17.9x |

I'd love to have a 1.2-compatible, fast parser as an option, and the ~30% speedup over ruamel.yaml's CSafeLoader (10.74 ms down to 7.51 ms) could be worth it for a little bit of code, no? You're the boss, though :)
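
For anyone wanting to reproduce numbers like these, a minimal timing harness might look like the following. It is a sketch, not the script used for the table above. `benchmark` is a hypothetical helper, and the loaders are passed in as plain callables so that any parser can be plugged in.

```python
import statistics
import timeit
from typing import Any, Callable


def benchmark(
    loaders: dict[str, Callable[[str], Any]],
    text: str,
    runs: int = 20,
) -> list[tuple[str, float, float]]:
    """Time each loader on `text`; return (name, mean_ms, ratio) rows, fastest first."""
    means: dict[str, float] = {}
    for name, load in loaders.items():
        # One parse per timing sample, repeated `runs` times.
        samples = timeit.repeat(lambda: load(text), number=1, repeat=runs)
        means[name] = statistics.mean(samples) * 1000  # seconds -> milliseconds

    fastest = min(means.values())
    ordered = sorted(means.items(), key=lambda item: item[1])
    # Ratio is each mean relative to the fastest loader (which gets 1.0).
    return [(name, mean, mean / fastest) for name, mean in ordered]


def format_rows(rows: list[tuple[str, float, float]]) -> str:
    """Render rows in the same Parser / Mean (ms) / Ratio layout."""
    lines = ["Parser | Mean (ms) | Ratio"]
    lines += [f"{name} | {mean:.2f} | {ratio:.1f}x" for name, mean, ratio in rows]
    return "\n".join(lines)
```

With PyYAML installed, one loader entry could be `lambda s: yaml.load(s, Loader=yaml.CSafeLoader)`, and `text` would be the contents of the dvc.lock file being measured.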

with modify_yaml(self.path, fs=self.repo.fs) as data:
    if not data:
        data.update({"schema": "2.0"})
        # order is important, meta should always be at the top
Contributor Author

TODO: restore comments

Contributor Author

sjawhar commented Jan 2, 2026

> I'd love to have a 1.2-compatible, fast parser as an option

I realized my comment here was a bit silly, since I doubt the few patches in this PR are enough to make it 1.2-compatible 😅
