feat(units): add an expression parser#173
Merged
Luthaf merged 44 commits intometatensor:mainfrom Mar 18, 2026
Merged
Conversation
92bf24c to
0fe0ef7
Compare
Contributor
|
Thanks a lot! I think it would be better if we can use the new functionality to check if the |
GardevoirX
reviewed
Mar 4, 2026
HaoZeke
added a commit
to HaoZeke/metatomic
that referenced
this pull request
Mar 4, 2026
Address PR metatensor#173 review feedback from GardevoirX: - Add s, second, ms, us, ns, ps with full-word aliases to time tokens - Add tests verifying ModelOutput rejects mismatched quantity/unit dims - Add tests for standalone micro sign (U+00B5) -> Dalton resolution - Update docs token table and doxygen with new time unit coverage - Fix stray dash in RST list-table Dimensionless row
Luthaf
reviewed
Mar 9, 2026
Replace the old string-matching unit conversion with a Shunting-Yard expression parser that supports *, /, ^, and parentheses. Unit code extracted to units.hpp/cpp per review feedback. Micro sign handling uses explicit base_units entries instead of global normalization.
Remove standalone micro sign test (no longer normalizes globally),
add micro sign microsecond tests instead. Update error message
assertions ("unknown unit" not "unknown unit token"). Add comment
explaining why models.cpp test uses valid quantity/unit strings.
Move full unit expression documentation to the Doxygen comment on unit_conversion_factor in units.hpp (renders via autofunction). Remove redundant standalone sections from misc.rst, keeping only the known-quantities table and deprecation note.
01198d7 to
eb8e28a
Compare
Remove validate_unit from CHANGELOG (not public API). Move unit_conversion_factor docstring from documentation.py to the Python function in __init__.py (documentation.py is only for C++ ops).
to_lower() now skips non-ASCII bytes, preventing macOS locale from mangling UTF-8 micro sign (0xB5 -> 'u'). Restore unit_conversion_factor in documentation.py since __init__.py imports it for Sphinx builds.
Luthaf
reviewed
Mar 10, 2026
|
|
||
|
|
||
| def unit_conversion_factor(quantity: str, from_unit: str, to_unit: str): | ||
| def unit_conversion_factor(from_unit: str, to_unit: str) -> float: |
Member
There was a problem hiding this comment.
This does not need to be in this file if we re-define a Python function anyway
Member
Author
There was a problem hiding this comment.
Still needed for the Sphinx/Type checking thing, probably could be handled better by refactoring the sphinx imports but maybe not here?
- Add cross-ref link in misc.rst docs to C++ API reference - Add [[deprecated]] attribute to 3-arg unit_conversion_factor overload - Remove units.hpp include from model.hpp, add direct includes in .cpp - Clean up AST evaluator comment (remove lumol reference) - Simplify test comment in models.cpp - Suppress deprecated warnings in register.cpp and unit tests
Use C++ [[deprecated]] attribute as requested by review. Pragma guards are isolated in wrapper functions (register.cpp, tests/units.cpp) to keep call sites clean.
Expose the 2-arg C++ op directly as metatomic.torch.unit_conversion_factor instead of a *args/**kwargs Python wrapper. This allows calling it from TorchScript models (e.g. inside AtomisticModel). The deprecated 3-arg form remains available via the C++ op torch.ops.metatomic.unit_conversion_factor for backward compat. Add TorchScript compatibility test. Update Python tests to call the 3-arg C++ op directly. Drop unnecessary comment in models.cpp.
…model.py - Add units.hpp to torch.hpp so C++ tests find the symbol - Replace torch.ops.metatomic.unit_conversion_factor_v2 calls in model.py with the module-level unit_conversion_factor (per review request)
5fd5187 to
6588fa1
Compare
Wrap long line in test_conversion_length_3arg to satisfy ruff. style: remove unused warnings import in units test Fixes ruff F401 (unused import) and I001 (unsorted imports). style: fix ruff import sorting in units test Ruff expects a blank line after imports. Update metatomic-torch/src/units.cpp Co-authored-by: Guillaume Fraux <luthaf@luthaf.fr> Update metatomic-torch/src/units.cpp Co-authored-by: Guillaume Fraux <luthaf@luthaf.fr>
1ca61a6 to
e9217ad
Compare
Empty unit strings are now rejected with a clear error message instead of silently returning 1.0, which could lead to incorrect scientific results from undetected typos or misconfigurations.
Add checks for infinity and NaN results after multiplication, division, and exponentiation operations in the unit expression evaluator. This prevents silent propagation of invalid conversion factors from extreme exponents like 'angstrom^-100'.
GardevoirX
reviewed
Mar 12, 2026
Remove outdated 'Known quantities and units' table that implied users must pick from predefined quantities. The new expression parser accepts any valid unit expression without specifying a quantity. Added documentation covering: - Supported base tokens (length, energy, time, mass, charge, etc.) - Expression syntax (*, /, ^, parentheses, whitespace) - Examples of compound expressions (kJ/mol, eV/A^3, (eV*u)^(1/2)) The parser automatically verifies dimensional compatibility between source and target units.
- Add _known-quantities-units label for cross-references from documentation.py - Fix 'Supported base tokens' title underline (was too short)
GardevoirX
reviewed
Mar 13, 2026
Contributor
GardevoirX
left a comment
There was a problem hiding this comment.
Looks much better, thanks a lot!
Luthaf
reviewed
Mar 13, 2026
Replace user-facing "token" terminology with "unit"/"base unit" across docs, docstrings, C++ header comments, and error messages. Internal parser names (TokenType, tokenize) kept as implementation details. Fix kwargs backward compatibility in register.cpp by using the old 3-arg parameter names (quantity, from_unit, to_unit?) instead of generic _0, _1, _2. Rename RST label from _known-quantities-units to _known-base-units.
9e8c172 to
250b3cd
Compare
Add tests for version_compatible branches (major/minor mismatch), lazy __getattr__ import paths, mass/charge/derived unit conversions, and 3-arg kwargs backward compatibility. Brings project coverage above 75% threshold.
250b3cd to
ef3a1df
Compare
4ff62f8 to
c7a839b
Compare
Luthaf
reviewed
Mar 16, 2026
Luthaf
reviewed
Mar 17, 2026
bc1091d to
738b6a4
Compare
738b6a4 to
10a5e2c
Compare
- check for full exception messages (some tests where passing by detecting a different error than the intended one) - check for float equality with ULPs as much as possible
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #154.
Replaced the per-quantity lookup tables with a Shunting-Yard expression parser
that works on arbitrary compound unit strings in the spirit of lumol.
Each token resolves to an SI conversion factor and a 5-element dimension vector
[L, T, M, Q, Theta]. The parser composes these through multiplication, division,
and exponentiation. Conversion factor between two expressions = ratio of their
SI factors after verifying dimension equality.
API changes
Expression syntax
Operators:
*(multiply),/(divide),^(power),()(grouping).Whitespace ignored. Case-insensitive. Numeric literals allowed in exponents.
Fractional exponents via parenthesized division:
^(1/2).Token table
Single flat
unordered_mapwith 30+ entries covering length (angstrom, bohr, nm,m, cm, mm, um), energy (eV, meV, hartree, ry, joule, kcal, kJ), time (fs, ps),
mass (u, kg, g, electronmass), charge (e, coulomb), dimensionless (mol), and
derived (hbar).
Notes
kelvinis NOT in the token table because temperature conversions betweenoffset-based scales (Celsius, Fahrenheit) are non-multiplicative.
DIM_TEMPERATUREexists as dimension [0,0,0,0,1] for potential future use butno tokens currently carry it. (maybe once we do an API break, can revisit during
mini-metatomic)Contributor (creator of pull-request) checklist
Reviewer checklist