Skip to content

feat: unified NormalizeOptions API + fix #23 compound concat#24

Merged
Alex-Wengg merged 4 commits intomainfrom
feat/unified-options-api-and-issue-23-fix
Apr 27, 2026
Merged

feat: unified NormalizeOptions API + fix #23 compound concat#24
Alex-Wengg merged 4 commits intomainfrom
feat/unified-options-api-and-issue-23-fix

Conversation

@Alex-Wengg
Copy link
Copy Markdown
Member

@Alex-Wengg Alex-Wengg commented Apr 27, 2026

Summary

API changes (additive — no breaking changes)

  • New NormalizeOptions { concat_compound_numbers: bool, max_span_tokens: Option<usize> } with builder helpers (new(), with_concat_compound_numbers, with_max_span_tokens).
  • New unified entry points:
    • normalize_with_options(input, opts)
    • normalize_sentence_with_options(input, opts)
  • Existing normalize_aviation, normalize_sentence_aviation, normalize_sentence_aviation_with_max_span, normalize_sentence_with_max_span stay as thin wrappers around the unified API — no caller migration required.
  • FFI: nemo_normalize_with_options(input, concat_compound_numbers) and nemo_normalize_sentence_with_options(input, concat_compound_numbers, max_span_tokens) (uses u32 for the bool: 0 = false, non-zero = true; max_span_tokens == 0 means "use default").
  • WASM: normalizeWithOptions(input, concatCompoundNumbers) and normalizeSentenceWithOptions(input, concatCompoundNumbers, maxSpanTokens).

Issue #23 fix (src/itn/en/cardinal.rs)

words_to_number_aviation previously only handled digit-prefix + grammatical compound (\"seven eighty eight\"788). It still added consecutive grammatical compounds together.

Replaced the digit-prefix path with a general peel_compound_chunks helper that greedily splits a phrase into 0-99 chunks and concatenates them when there are 2+. Each chunk is one of:

  • A single ONES word (0-19), e.g. \"seven\" → 7, \"sixteen\" → 16
  • A single TENS word (20, 30, ... 90)
  • A TENS + ones-word (1-9), e.g. \"twenty one\" → 21
Input Before After
thirty five sixty two 97 3562
seven eighty eight 788 788 (unchanged)
twenty one 21 21 (unchanged — single chunk)
two thousand seventeen 2017 2017 (unchanged — scale word)
Alright thirty five sixty two appreciate your help United seven eighty eight Alright 97 ... United 788 Alright 3562 ... United 788

Single-chunk inputs still flow through grammatical; any phrase containing a scale word keeps its addition semantics.

One stale assertion was updated: normalize_aviation(\"twenty one forty two\") was locking in the buggy \"63\"; the new behavior is the correct \"2142\".

Test plan

  • cargo fmt
  • cargo build — clean (only pre-existing unrelated warnings)
  • cargo build --features ffi — clean
  • cargo test — 1000+ tests pass, 0 failures
  • cargo test --features ffi — 1000+ tests pass, 0 failures
  • New tests:
    • test_options_default_matches_normalize
    • test_options_concat_matches_aviation
    • test_sentence_options_default_matches_default
    • test_sentence_options_concat_compound
    • test_sentence_options_max_span
    • test_sentence_options_none_max_span_uses_default
    • test_sentence_options_builder_compose
    • test_issue_23_compound_concat (regression test for the original report)

Closes


Open in Devin Review

Address two pieces of feedback from @hongbo-miao:

- Issue #15 comment: instead of separate `normalize_sentence_aviation`
  variants, expose a unified entry point with an options struct.
- Issue #23 comment: prefer a generic flag (not a `Domain` label) since
  other code-style speech contexts want the same "stop adding two
  numbers" behavior.

API
---
- New `NormalizeOptions { concat_compound_numbers, max_span_tokens }`
  with builder helpers.
- New `normalize_with_options` and `normalize_sentence_with_options`
  unified entry points.
- Existing `normalize_aviation`, `normalize_sentence_aviation*`,
  `normalize_sentence_with_max_span` stay as thin wrappers — no
  breaking change for current callers.
- FFI: `nemo_normalize_with_options(input, concat)` and
  `nemo_normalize_sentence_with_options(input, concat, max_span)`.
- WASM: `normalizeWithOptions` / `normalizeSentenceWithOptions`.

Issue #23 fix
-------------
`words_to_number_aviation` previously only handled digit-prefix +
grammatical compound (`"seven eighty eight"` → `"788"`). It still added
consecutive grammatical compounds together, so
`"thirty five sixty two"` resolved to `"97"` (= 35 + 62).

Replaced the digit-prefix path with a general `peel_compound_chunks`
helper that greedily splits a phrase into 0-99 chunks and concatenates
them when there are 2+. Single-chunk inputs (`"twenty one"`) still go
through grammatical, and any phrase with a scale word
(`"two thousand seventeen"`) keeps its addition semantics.

Updated one stale test (`"twenty one forty two"` was locking in the
buggy `63`; it now correctly produces `2142`).
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

Per issue #15 and #23 follow-up: remove all backwards-compat wrappers
now that callers pass through the unified `NormalizeOptions` API.

Removed Rust functions:
- `normalize_aviation`, `normalize_sentence_aviation`
- `normalize_sentence_aviation_with_max_span`
- `normalize_sentence_with_max_span`

Removed FFI bindings:
- `nemo_normalize_aviation`, `nemo_normalize_sentence_aviation`
- `nemo_normalize_sentence_with_max_span`
- `nemo_normalize_sentence_aviation_with_max_span`

Removed WASM bindings:
- `normalizeAviation`, `normalizeSentenceAviation`
- `normalizeSentenceWithMaxSpan`, `normalizeSentenceAviationWithMaxSpan`

Callers should switch to:
- Rust: `normalize_with_options` / `normalize_sentence_with_options`
- FFI:  `nemo_normalize_with_options` / `nemo_normalize_sentence_with_options`
- WASM: `normalizeWithOptions` / `normalizeSentenceWithOptions`

Swift wrapper and headers updated accordingly. All Rust tests
(2050 across the workspace incl. doc tests) and FFI tests pass.
Moves `NormalizeOptions`, its builder methods, and `DEFAULT_MAX_SPAN_TOKENS`
out of `src/lib.rs` and into a new `src/options.rs`. The struct is the
extension point for caller-tunable normalization behavior, and giving it
its own module makes room for richer per-field documentation and future
flags without further bloating the crate root.

Each field now carries:
- a "Default" line stating the no-op behavior
- a bulleted list of concrete input → output examples
- the originating issue number for the behavior
- guidance on which use cases want it on/off
- explicit interaction notes (which other taggers still win)

Also documents the `with_*` builder convention as the preferred construction
path so new fields can land without breaking existing call sites that use
struct literals.

`pub use options::{NormalizeOptions, DEFAULT_MAX_SPAN_TOKENS};` keeps the
public path stable — no changes required in FFI, WASM, Swift, or test code.
All Rust + FFI tests pass; WASM check clean.
Compresses the per-field and module-level docs in `src/options.rs` down
to the essentials. The two flags are the user-facing priority — readers
need the one-line behavior, the issue ref, and the default; everything
else (use-case lists, interaction tables, exhaustive examples) belongs
in the README and the integration tests, not in rustdoc bullet walls.
@Alex-Wengg Alex-Wengg merged commit d3f314b into main Apr 27, 2026
5 checks passed
@Alex-Wengg Alex-Wengg mentioned this pull request Apr 27, 2026
4 tasks
Alex-Wengg added a commit that referenced this pull request Apr 27, 2026
- feat: unified NormalizeOptions API + fix #23 compound concat (#24)
- fix: split trailing punctuation in sentence mode (#21) (#25)
- fix: add disable_bare_second flag (#22) + restore TN abbreviation matching (#26)
- fix(ci): pass --features to cargo via -- separator in wasm-pack (#27)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

normalize_sentence_aviation failed for thirty five sixty two

1 participant