feat(units): add an expression parser#173

Merged
Luthaf merged 44 commits into metatensor:main from HaoZeke:feat/unit-expression-parser
Mar 18, 2026

@HaoZeke (Member) commented Mar 4, 2026

Closes #154.

Replaced the per-quantity lookup tables with a Shunting-Yard expression parser
that works on arbitrary compound unit strings in the spirit of lumol.

"kJ/mol/A^2"  -->  tokenize  -->  shunting-yard  -->  AST  -->  eval
                   [kJ,/,mol,     [kJ,mol,/,         tree     {factor, dim}
                    /,A,^,2]       A,2,^,/]

Each token resolves to an SI conversion factor and a 5-element dimension vector
[L, T, M, Q, Theta]. The parser composes these through multiplication, division,
and exponentiation. The conversion factor between two expressions is the ratio
of their SI factors, computed after verifying that their dimensions are equal.
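
The factor/dimension bookkeeping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the C++ code from this PR: `UnitValue` and `conversion_factor` are hypothetical names, and the SI factors below are hard-coded assumptions (the eV value is the exact SI definition, the value for `u` is an approximate CODATA figure).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitValue:
    factor: float  # multiplicative factor converting this unit to SI
    dim: tuple     # exponents of (L, T, M, Q, Theta)

    def __mul__(self, other):
        # multiplying units multiplies factors and adds dimension exponents
        return UnitValue(self.factor * other.factor,
                         tuple(a + b for a, b in zip(self.dim, other.dim)))

    def __truediv__(self, other):
        # dividing units divides factors and subtracts dimension exponents
        return UnitValue(self.factor / other.factor,
                         tuple(a - b for a, b in zip(self.dim, other.dim)))

    def __pow__(self, exponent):
        # raising to a power scales every dimension exponent
        return UnitValue(self.factor ** exponent,
                         tuple(a * exponent for a in self.dim))


def conversion_factor(src, dst):
    """Ratio of SI factors, allowed only when the dimensions agree."""
    if src.dim != dst.dim:
        raise ValueError(f"dimension mismatch: {src.dim} vs {dst.dim}")
    return src.factor / dst.factor


# Hard-coded base units for this sketch (assumed values, see the text above)
EV = UnitValue(1.602176634e-19, (2, -2, 1, 0, 0))  # energy: L^2 T^-2 M
U = UnitValue(1.66053906660e-27, (0, 0, 1, 0, 0))  # atomic mass unit
ANGSTROM = UnitValue(1e-10, (1, 0, 0, 0, 0))
FS = UnitValue(1e-15, (0, 1, 0, 0, 0))

# The "(eV*u)^(1/2)" -> "u*A/fs" example from the API table, composed by hand
momentum_a = (EV * U) ** 0.5
momentum_b = U * ANGSTROM / FS
factor = conversion_factor(momentum_a, momentum_b)
```

Both expressions reduce to the dimension vector of a momentum, so the ratio is well defined. Asking for `conversion_factor(EV, ANGSTROM)` raises a `ValueError` because the dimension vectors differ.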

API changes

| Before (3-arg)                                  | After (2-arg)                                    |
|-------------------------------------------------+--------------------------------------------------|
| ~unit_conversion_factor("energy", "eV", "meV")~   | ~unit_conversion_factor("eV", "meV")~              |
| ~unit_conversion_factor("force", "eV/A", "eV/A")~ | ~unit_conversion_factor("eV/A", "eV/A")~           |
| Not possible                                    | ~unit_conversion_factor("(eV*u)^(1/2)", "u*A/fs")~ |

Expression syntax

Operators: * (multiply), / (divide), ^ (power), () (grouping).
Whitespace ignored. Case-insensitive. Numeric literals allowed in exponents.
Fractional exponents via parenthesized division: ^(1/2).
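
As a rough illustration of the syntax rules above, here is a minimal Python sketch of the tokenize and shunting-yard steps. It is not the C++ parser from this PR: `tokenize`, `to_rpn`, and the regular expression are assumptions made for the example, and it only handles ASCII unit names.

```python
import re

# Unit names, numeric literals, or a single operator/parenthesis
_TOKEN = re.compile(r"[a-z_]+|\d+(?:\.\d+)?|[*/^()]")
PRECEDENCE = {"*": 1, "/": 1, "^": 2}
RIGHT_ASSOCIATIVE = {"^"}


def tokenize(expression):
    # whitespace is ignored and matching is case-insensitive
    text = "".join(expression.split()).lower()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def to_rpn(tokens):
    """Shunting-yard: reorder infix tokens into reverse Polish notation."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # pop operators that bind tighter, or equally for left-associative ones
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok]
                    and tok not in RIGHT_ASSOCIATIVE)
            ):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # unit name or number
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return output


rpn = to_rpn(tokenize("kJ / mol / A^2"))
```

For `"kJ / mol / A^2"` this yields `kj mol / a 2 ^ /`, the same ordering as the diagram in the description (up to case), and `"(eV*u)^(1/2)"` becomes `ev u * 1 2 / ^`.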

Token table

Single flat unordered_map with 30+ entries covering length (angstrom, bohr, nm,
m, cm, mm, um), energy (eV, meV, hartree, ry, joule, kcal, kJ), time (fs, ps),
mass (u, kg, g, electronmass), charge (e, coulomb), dimensionless (mol), and
derived (hbar).

Notes

kelvin is NOT in the token table because temperature conversions between
offset-based scales (Celsius, Fahrenheit) are non-multiplicative.
DIM_TEMPERATURE exists as dimension [0,0,0,0,1] for potential future use, but
no tokens currently carry it. (This could be revisited at the next API break, perhaps during mini-metatomic.)
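
To make the "non-multiplicative" point concrete, a small Python sketch: an offset-based conversion has no single factor that works for every input, which is what a `{factor, dim}` pair would need. The helper name `celsius_to_kelvin` is only for illustration.

```python
def celsius_to_kelvin(t_celsius):
    # affine conversion: an offset, not a pure scale factor
    return t_celsius + 273.15


# If a single conversion factor existed, these two ratios would be equal
ratio_at_10 = celsius_to_kelvin(10.0) / 10.0
ratio_at_20 = celsius_to_kelvin(20.0) / 20.0
has_single_factor = ratio_at_10 == ratio_at_20
```

Here `has_single_factor` is `False`, so such a scale cannot be represented by the ratio-of-SI-factors scheme used for the other units.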

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

Reviewer checklist

  • CHANGELOG updated with public API or any other important changes?

@HaoZeke HaoZeke requested review from GardevoirX and Luthaf March 4, 2026 11:10
@HaoZeke HaoZeke force-pushed the feat/unit-expression-parser branch 3 times, most recently from 92bf24c to 0fe0ef7 Compare March 4, 2026 12:05
@GardevoirX (Contributor) commented:

Thanks a lot! I think it would be better if we could use the new functionality to check that the quantity and unit match when initializing ModelOutput here:
https://github.com/HaoZeke/metatomic/blob/0fe0ef73e82b089811278472acba56267578b80c/metatomic-torch/include/metatomic/torch/model.hpp#L48-L61

HaoZeke added a commit to HaoZeke/metatomic that referenced this pull request Mar 4, 2026
Address PR metatensor#173 review feedback from GardevoirX:
- Add s, second, ms, us, ns, ps with full-word aliases to time tokens
- Add tests verifying ModelOutput rejects mismatched quantity/unit dims
- Add tests for standalone micro sign (U+00B5) -> Dalton resolution
- Update docs token table and doxygen with new time unit coverage
- Fix stray dash in RST list-table Dimensionless row
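
The quantity/unit mismatch check mentioned in this commit can be pictured with the following Python sketch. It is hypothetical throughout: the two lookup tables stand in for the real parser output, and `check_quantity_unit` is not the name of anything in metatomic.

```python
# Expected dimension vectors (L, T, M, Q, Theta) per quantity; assumed table
QUANTITY_DIMS = {
    "length": (1, 0, 0, 0, 0),
    "energy": (2, -2, 1, 0, 0),
    "force": (1, -2, 1, 0, 0),
}

# Stand-in for the dimension the expression parser would compute for a unit
UNIT_DIMS = {
    "angstrom": (1, 0, 0, 0, 0),
    "ev": (2, -2, 1, 0, 0),
    "ev/angstrom": (1, -2, 1, 0, 0),
}


def check_quantity_unit(quantity, unit):
    """Raise if the unit's dimension does not match the declared quantity."""
    expected = QUANTITY_DIMS.get(quantity)
    if expected is None:
        return  # unknown quantity: nothing to compare against in this sketch
    actual = UNIT_DIMS[unit.lower()]
    if actual != expected:
        raise ValueError(
            f"unit {unit!r} has dimension {actual}, "
            f"but quantity {quantity!r} expects {expected}"
        )


check_quantity_unit("energy", "eV")  # dimensions agree, no error
```

Calling `check_quantity_unit("energy", "angstrom")` raises a `ValueError`, which is the kind of rejection the new tests are described as verifying.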
@HaoZeke HaoZeke requested a review from GardevoirX March 4, 2026 15:26
@GardevoirX (Contributor) left a comment:

LGTM, love it!

HaoZeke added 3 commits March 9, 2026 12:47
Replace the old string-matching unit conversion with a Shunting-Yard
expression parser that supports *, /, ^, and parentheses. Unit code
extracted to units.hpp/cpp per review feedback. Micro sign handling
uses explicit base_units entries instead of global normalization.

Remove standalone micro sign test (no longer normalizes globally),
add micro sign microsecond tests instead. Update error message
assertions ("unknown unit" not "unknown unit token"). Add comment
explaining why models.cpp test uses valid quantity/unit strings.

Move full unit expression documentation to the Doxygen comment on
unit_conversion_factor in units.hpp (renders via autofunction).
Remove redundant standalone sections from misc.rst, keeping only
the known-quantities table and deprecation note.
@HaoZeke HaoZeke force-pushed the feat/unit-expression-parser branch from 01198d7 to eb8e28a Compare March 9, 2026 11:48
HaoZeke added 4 commits March 9, 2026 12:50
Remove validate_unit from CHANGELOG (not public API). Move
unit_conversion_factor docstring from documentation.py to the
Python function in __init__.py (documentation.py is only for
C++ ops).

to_lower() now skips non-ASCII bytes, preventing macOS locale from
mangling UTF-8 micro sign (0xB5 -> 'u'). Restore unit_conversion_factor
in documentation.py since __init__.py imports it for Sphinx builds.
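
The idea behind the to_lower() fix can be sketched in Python as a byte-wise lowercase that only touches ASCII `A`-`Z`. This is an illustration of the approach, not the C++ function from the PR; `ascii_lower` is a name chosen for the example.

```python
def ascii_lower(data: bytes) -> bytes:
    """Lowercase ASCII letters only; every other byte passes through unchanged."""
    out = bytearray()
    for byte in data:
        if 0x41 <= byte <= 0x5A:   # 'A'..'Z'
            out.append(byte + 0x20)
        else:
            out.append(byte)       # includes the UTF-8 bytes of the micro sign
    return bytes(out)


# U+00B5 (micro sign) encodes to the two bytes 0xC2 0xB5 in UTF-8
micro_second = "\u00b5S".encode("utf-8")
lowered = ascii_lower(micro_second)
```

The multi-byte micro sign survives intact while the trailing `S` is lowercased, so the result decodes back to a valid UTF-8 string.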


-def unit_conversion_factor(quantity: str, from_unit: str, to_unit: str):
+def unit_conversion_factor(from_unit: str, to_unit: str) -> float:
Reviewer (Member) commented:

This does not need to be in this file if we re-define a Python function anyway

@HaoZeke (Member, Author) replied:

Still needed for Sphinx and type checking. This could probably be handled better by refactoring the Sphinx imports, but maybe not in this PR?

HaoZeke added 4 commits March 11, 2026 10:55
- Add cross-ref link in misc.rst docs to C++ API reference
- Add [[deprecated]] attribute to 3-arg unit_conversion_factor overload
- Remove units.hpp include from model.hpp, add direct includes in .cpp
- Clean up AST evaluator comment (remove lumol reference)
- Simplify test comment in models.cpp
- Suppress deprecated warnings in register.cpp and unit tests

Use C++ [[deprecated]] attribute as requested by review. Pragma guards
are isolated in wrapper functions (register.cpp, tests/units.cpp) to
keep call sites clean.

Expose the 2-arg C++ op directly as metatomic.torch.unit_conversion_factor
instead of a *args/**kwargs Python wrapper. This allows calling it from
TorchScript models (e.g. inside AtomisticModel).

The deprecated 3-arg form remains available via the C++ op
torch.ops.metatomic.unit_conversion_factor for backward compat.

Add TorchScript compatibility test. Update Python tests to call the
3-arg C++ op directly. Drop unnecessary comment in models.cpp.
…model.py

- Add units.hpp to torch.hpp so C++ tests find the symbol
- Replace torch.ops.metatomic.unit_conversion_factor_v2 calls in model.py
  with the module-level unit_conversion_factor (per review request)
@HaoZeke HaoZeke force-pushed the feat/unit-expression-parser branch from 5fd5187 to 6588fa1 Compare March 12, 2026 01:00
Wrap long line in test_conversion_length_3arg to satisfy ruff.

style: remove unused warnings import in units test

Fixes ruff F401 (unused import) and I001 (unsorted imports).

style: fix ruff import sorting in units test

Ruff expects a blank line after imports.

Update metatomic-torch/src/units.cpp

Co-authored-by: Guillaume Fraux <luthaf@luthaf.fr>

Update metatomic-torch/src/units.cpp

Co-authored-by: Guillaume Fraux <luthaf@luthaf.fr>
@HaoZeke HaoZeke force-pushed the feat/unit-expression-parser branch from 1ca61a6 to e9217ad Compare March 12, 2026 13:15
@HaoZeke HaoZeke requested a review from Luthaf March 12, 2026 13:16
HaoZeke added 2 commits March 12, 2026 14:27
Empty unit strings are now rejected with a clear error message instead of
silently returning 1.0, which could lead to incorrect scientific results
from undetected typos or misconfigurations.

Add checks for infinity and NaN results after multiplication, division,
and exponentiation operations in the unit expression evaluator. This
prevents silent propagation of invalid conversion factors from extreme
exponents like 'angstrom^-100'.
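
A minimal Python sketch of this kind of finiteness check follows, assuming positive conversion factors. `checked`, `checked_mul`, and `checked_pow` are hypothetical helpers, not the functions in units.cpp. Note that Python raises `OverflowError` on float power overflow where C++ would return infinity, so the sketch maps that case to infinity first.

```python
import math


def checked(value, operation):
    # reject inf and NaN instead of letting them propagate silently
    if not math.isfinite(value):
        raise ValueError(f"{operation} produced a non-finite conversion factor")
    return value


def checked_mul(lhs, rhs):
    return checked(lhs * rhs, f"{lhs} * {rhs}")


def checked_pow(base, exponent):
    # base is assumed positive, as SI conversion factors are
    try:
        result = base ** exponent
    except OverflowError:
        result = math.inf  # mirror the IEEE-754 result C++ would produce
    return checked(result, f"{base} ^ {exponent}")


area_factor = checked_pow(1e-10, 2)  # angstrom^2 in m^2, finite
```

With these helpers, `checked_pow(1e-10, -100)` (the 'angstrom^-100' case) raises a `ValueError` rather than returning infinity.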
Comment thread docs/src/torch/reference/misc.rst Outdated
Remove outdated 'Known quantities and units' table that implied
users must pick from predefined quantities. The new expression
parser accepts any valid unit expression without specifying a
quantity.

Added documentation covering:
- Supported base tokens (length, energy, time, mass, charge, etc.)
- Expression syntax (*, /, ^, parentheses, whitespace)
- Examples of compound expressions (kJ/mol, eV/A^3, (eV*u)^(1/2))

The parser automatically verifies dimensional compatibility between
source and target units.
@HaoZeke HaoZeke requested a review from GardevoirX March 13, 2026 02:48
- Add _known-quantities-units label for cross-references from documentation.py
- Fix 'Supported base tokens' title underline (was too short)
@GardevoirX (Contributor) left a comment:

Looks much better, thanks a lot!

HaoZeke added 2 commits March 15, 2026 01:40
Replace user-facing "token" terminology with "unit"/"base unit" across
docs, docstrings, C++ header comments, and error messages. Internal
parser names (TokenType, tokenize) kept as implementation details.

Fix kwargs backward compatibility in register.cpp by using the old
3-arg parameter names (quantity, from_unit, to_unit?) instead of
generic _0, _1, _2.

Rename RST label from _known-quantities-units to _known-base-units.
@HaoZeke HaoZeke force-pushed the feat/unit-expression-parser branch from 9e8c172 to 250b3cd Compare March 16, 2026 15:12
Add tests for version_compatible branches (major/minor mismatch),
lazy __getattr__ import paths, mass/charge/derived unit conversions,
and 3-arg kwargs backward compatibility. Brings project coverage
above 75% threshold.
@HaoZeke HaoZeke force-pushed the feat/unit-expression-parser branch from 250b3cd to ef3a1df Compare March 16, 2026 15:16
@HaoZeke HaoZeke requested a review from Luthaf March 16, 2026 15:18
@HaoZeke HaoZeke force-pushed the feat/unit-expression-parser branch from 4ff62f8 to c7a839b Compare March 16, 2026 16:14
@HaoZeke HaoZeke force-pushed the feat/unit-expression-parser branch 4 times, most recently from bc1091d to 738b6a4 Compare March 17, 2026 14:23
@HaoZeke HaoZeke force-pushed the feat/unit-expression-parser branch from 738b6a4 to 10a5e2c Compare March 17, 2026 14:37
@HaoZeke HaoZeke requested a review from Luthaf March 17, 2026 15:17
HaoZeke and others added 3 commits March 17, 2026 19:52
- check for full exception messages (some tests were passing
  by detecting a different error than the intended one)
- check for float equality with ULPs as much as possible
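
For readers unfamiliar with ULP-based comparison, here is one way it could look in Python (3.9 or later for `math.ulp`). This is a sketch of the general technique, not the helper used in the test suite; `within_ulps` and the default tolerance of 4 ULPs are assumptions.

```python
import math


def within_ulps(actual, expected, max_ulps=4):
    """True if actual is within max_ulps units-in-the-last-place of expected."""
    if actual == expected:
        return True
    if not (math.isfinite(actual) and math.isfinite(expected)):
        return False
    # math.ulp gives the spacing between adjacent doubles near `expected`
    return abs(actual - expected) <= max_ulps * math.ulp(expected)


rounding_noise_ok = within_ulps(0.1 + 0.2, 0.3)
real_error_ok = within_ulps(0.3 * (1 + 1e-12), 0.3)
```

The first comparison passes because `0.1 + 0.2` is one ULP away from `0.3`; the second fails because a relative error of 1e-12 is thousands of ULPs, which a loose absolute tolerance could easily have hidden.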
@Luthaf Luthaf merged commit a59f4cc into metatensor:main Mar 18, 2026
49 checks passed
@HaoZeke HaoZeke deleted the feat/unit-expression-parser branch March 18, 2026 15:28
Development

Successfully merging this pull request may close these issues.

Use a proper expression parser for unit conversions

3 participants