feat: unified NormalizeOptions API + fix #23 compound concat (#24)
Merged
Alex-Wengg merged 4 commits into main on Apr 27, 2026
Address two pieces of feedback from @hongbo-miao:

- Issue #15 comment: instead of separate `normalize_sentence_aviation` variants, expose a unified entry point with an options struct.
- Issue #23 comment: prefer a generic flag (not a `Domain` label) since other code-style speech contexts want the same "stop adding two numbers" behavior.

API
---
- New `NormalizeOptions { concat_compound_numbers, max_span_tokens }` with builder helpers.
- New `normalize_with_options` and `normalize_sentence_with_options` unified entry points.
- Existing `normalize_aviation`, `normalize_sentence_aviation*`, `normalize_sentence_with_max_span` stay as thin wrappers, so there is no breaking change for current callers.
- FFI: `nemo_normalize_with_options(input, concat)` and `nemo_normalize_sentence_with_options(input, concat, max_span)`.
- WASM: `normalizeWithOptions` / `normalizeSentenceWithOptions`.

Issue #23 fix
-------------
`words_to_number_aviation` previously only handled digit-prefix + grammatical compound (`"seven eighty eight"` → `"788"`). It still added consecutive grammatical compounds together, so `"thirty five sixty two"` resolved to `"97"` (= 35 + 62).

Replaced the digit-prefix path with a general `peel_compound_chunks` helper that greedily splits a phrase into 0-99 chunks and concatenates them when there are 2+. Single-chunk inputs (`"twenty one"`) still go through grammatical, and any phrase with a scale word (`"two thousand seventeen"`) keeps its addition semantics.

Updated one stale test (`"twenty one forty two"` was locking in the buggy `63`; it now correctly produces `2142`).
Per issue #15 and #23 follow-up: remove all backwards-compat wrappers now that callers pass through the unified `NormalizeOptions` API.

Removed Rust functions:
- `normalize_aviation`, `normalize_sentence_aviation`
- `normalize_sentence_aviation_with_max_span`
- `normalize_sentence_with_max_span`

Removed FFI bindings:
- `nemo_normalize_aviation`, `nemo_normalize_sentence_aviation`
- `nemo_normalize_sentence_with_max_span`
- `nemo_normalize_sentence_aviation_with_max_span`

Removed WASM bindings:
- `normalizeAviation`, `normalizeSentenceAviation`
- `normalizeSentenceWithMaxSpan`, `normalizeSentenceAviationWithMaxSpan`

Callers should switch to:
- Rust: `normalize_with_options` / `normalize_sentence_with_options`
- FFI: `nemo_normalize_with_options` / `nemo_normalize_sentence_with_options`
- WASM: `normalizeWithOptions` / `normalizeSentenceWithOptions`

Swift wrapper and headers updated accordingly. All Rust tests (2050 across the workspace incl. doc tests) and FFI tests pass.
Moves `NormalizeOptions`, its builder methods, and `DEFAULT_MAX_SPAN_TOKENS`
out of `src/lib.rs` and into a new `src/options.rs`. The struct is the
extension point for caller-tunable normalization behavior, and giving it
its own module makes room for richer per-field documentation and future
flags without further bloating the crate root.
Each field now carries:
- a "Default" line stating the no-op behavior
- a bulleted list of concrete input → output examples
- the originating issue number for the behavior
- guidance on which use cases want it on/off
- explicit interaction notes (which other taggers still win)
Also documents the `with_*` builder convention as the preferred construction
path so new fields can land without breaking existing call sites that use
struct literals.
`pub use options::{NormalizeOptions, DEFAULT_MAX_SPAN_TOKENS};` keeps the
public path stable — no changes required in FFI, WASM, Swift, or test code.
All Rust + FFI tests pass; WASM check clean.
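
To illustrate the `with_*` builder convention described in this commit, here is a minimal sketch of what such an options struct could look like. The field names and builder names come from the PR description; the parameter types of the builders, the placeholder value of `DEFAULT_MAX_SPAN_TOKENS`, and the `effective_max_span` helper are assumptions for illustration and may differ from the actual `src/options.rs`.

```rust
/// Placeholder value for illustration only; the crate's real default
/// is not stated in this PR.
pub const DEFAULT_MAX_SPAN_TOKENS: usize = 16;

/// Sketch of a caller-tunable options struct. `Default` yields the
/// no-op configuration: no compound concatenation, default span limit.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct NormalizeOptions {
    pub concat_compound_numbers: bool,
    pub max_span_tokens: Option<usize>,
}

impl NormalizeOptions {
    /// Start from the defaults.
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder: toggle concatenation of adjacent compound numbers.
    /// (Assumed signature.)
    pub fn with_concat_compound_numbers(mut self, on: bool) -> Self {
        self.concat_compound_numbers = on;
        self
    }

    /// Builder: override the maximum span length in tokens.
    /// (Assumed signature.)
    pub fn with_max_span_tokens(mut self, n: usize) -> Self {
        self.max_span_tokens = Some(n);
        self
    }

    /// Hypothetical helper: resolve `None` to the default span limit.
    pub fn effective_max_span(&self) -> usize {
        self.max_span_tokens.unwrap_or(DEFAULT_MAX_SPAN_TOKENS)
    }
}
```

A call site written as `NormalizeOptions::new().with_concat_compound_numbers(true)` keeps compiling when a new field is added later, whereas a struct literal that names every field would need to be updated.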
Compresses the per-field and module-level docs in `src/options.rs` down to the essentials. The two flags are the user-facing priority — readers need the one-line behavior, the issue ref, and the default; everything else (use-case lists, interaction tables, exhaustive examples) belongs in the README and the integration tests, not in rustdoc bullet walls.
Summary
-------
- Issue #15 comment: instead of a separate `normalize_sentence_aviation` family, expose a unified entry point that takes an options struct.
- Issue #23 comment: prefer a generic flag (not a `Domain` label) since other code-style speech contexts (military, dispatch IDs, callsigns) want the same "stop adding two numbers" behavior without being labeled "aviation".
- Issue #23 bug: `normalize_sentence_aviation("... thirty five sixty two ...")` resolved to `97` (= 35 + 62) instead of `3562`. Fixed.

API changes (additive, no breaking changes)
-------------------------------------------
- `NormalizeOptions { concat_compound_numbers: bool, max_span_tokens: Option<usize> }` with builder helpers (`new()`, `with_concat_compound_numbers`, `with_max_span_tokens`).
- `normalize_with_options(input, opts)`
- `normalize_sentence_with_options(input, opts)`
- `normalize_aviation`, `normalize_sentence_aviation`, `normalize_sentence_aviation_with_max_span`, `normalize_sentence_with_max_span` stay as thin wrappers around the unified API, so no caller migration is required.
- FFI: `nemo_normalize_with_options(input, concat_compound_numbers)` and `nemo_normalize_sentence_with_options(input, concat_compound_numbers, max_span_tokens)` (uses `u32` for the bool: 0 = false, non-zero = true; `max_span_tokens == 0` means "use default").
- WASM: `normalizeWithOptions(input, concatCompoundNumbers)` and `normalizeSentenceWithOptions(input, concatCompoundNumbers, maxSpanTokens)`.

Issue #23 fix (`src/itn/en/cardinal.rs`)
----------------------------------------
`words_to_number_aviation` previously only handled digit-prefix + grammatical compound (`"seven eighty eight"` → `788`). It still added consecutive grammatical compounds together.

Replaced the digit-prefix path with a general `peel_compound_chunks` helper that greedily splits a phrase into 0-99 chunks and concatenates them when there are 2+. Each chunk is one of:

- `"seven"` → 7, `"sixteen"` → 16
- `"twenty one"` → 21

| Input | Before | After |
| --- | --- | --- |
| `thirty five sixty two` | `97` | `3562` |
| `seven eighty eight` | `788` | `788` (unchanged) |
| `twenty one` | `21` | `21` (unchanged, single chunk) |
| `two thousand seventeen` | `2017` | `2017` (unchanged, scale word) |
| `Alright thirty five sixty two appreciate your help United seven eighty eight` | `Alright 97 ... United 788` | `Alright 3562 ... United 788` |

Single-chunk inputs still flow through grammatical; any phrase containing a scale word keeps its addition semantics.
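
The greedy chunking described above can be sketched roughly as follows. This is not the crate's `peel_compound_chunks`; it is a simplified, self-contained approximation, and the function names (`small_value`, `tens_value`, `peel_chunks_sketch`, `concat_compound_sketch`) are hypothetical. It assumes that any unrecognized word, including a scale word such as "thousand", makes the phrase ineligible for concatenation, and that a single chunk is left to the ordinary grammatical path.

```rust
/// Value of a word in 0..=19, if it is one.
fn small_value(word: &str) -> Option<u32> {
    const SMALL: [&str; 20] = [
        "zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
    ];
    SMALL.iter().position(|w| *w == word).map(|i| i as u32)
}

/// Value of a tens word (20, 30, ..., 90), if it is one.
fn tens_value(word: &str) -> Option<u32> {
    const TENS: [&str; 8] = [
        "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty",
        "ninety",
    ];
    TENS.iter().position(|w| *w == word).map(|i| (i as u32 + 2) * 10)
}

/// Greedily split words into 0-99 chunks. Returns `None` when a word is
/// not a plain number word (for example a scale word like "thousand").
fn peel_chunks_sketch(words: &[&str]) -> Option<Vec<u32>> {
    let mut chunks = Vec::new();
    let mut i = 0;
    while i < words.len() {
        if let Some(tens) = tens_value(words[i]) {
            // A tens word absorbs a following ones word (1..=9) if present.
            match words.get(i + 1).and_then(|w| small_value(w)) {
                Some(ones) if (1..=9).contains(&ones) => {
                    chunks.push(tens + ones);
                    i += 2;
                }
                _ => {
                    chunks.push(tens);
                    i += 1;
                }
            }
        } else if let Some(small) = small_value(words[i]) {
            chunks.push(small);
            i += 1;
        } else {
            return None;
        }
    }
    Some(chunks)
}

/// Concatenate chunks when there are two or more; otherwise return `None`
/// so the caller can fall back to ordinary grammatical parsing.
fn concat_compound_sketch(phrase: &str) -> Option<String> {
    let words: Vec<&str> = phrase.split_whitespace().collect();
    let chunks = peel_chunks_sketch(&words)?;
    if chunks.len() < 2 {
        return None;
    }
    Some(chunks.iter().map(|c| c.to_string()).collect())
}
```

Under these assumptions, `"thirty five sixty two"` splits into the chunks 35 and 62 and yields `"3562"`, while `"twenty one"` and `"two thousand seventeen"` both return `None` and would be handled by the existing addition-based path.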
One stale assertion was updated: `normalize_aviation("twenty one forty two")` was locking in the buggy `"63"`; the new behavior is the correct `"2142"`.

Test plan
---------
- `cargo fmt`
- `cargo build`: clean (only pre-existing unrelated warnings)
- `cargo build --features ffi`: clean
- `cargo test`: 1000+ tests pass, 0 failures
- `cargo test --features ffi`: 1000+ tests pass, 0 failures
- `test_options_default_matches_normalize`
- `test_options_concat_matches_aviation`
- `test_sentence_options_default_matches_default`
- `test_sentence_options_concat_compound`
- `test_sentence_options_max_span`
- `test_sentence_options_none_max_span_uses_default`
- `test_sentence_options_builder_compose`
- `test_issue_23_compound_concat` (regression test for the original report)

Closes #23 ("thirty five sixty two")
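
As a note on the FFI argument convention listed under the API changes (a `u32` flag where non-zero means true, and `max_span_tokens == 0` meaning "use default"), the decoding step might look like the sketch below. The function name `decode_ffi_options` is hypothetical, and the `u32` type for `max_span_tokens` is an assumption; the PR only states the integer type used for the boolean.

```rust
/// Hypothetical helper: translate raw FFI arguments into Rust-side values.
/// Returns `(concat_compound_numbers, max_span_tokens)`, where `None`
/// stands for "use the crate default".
fn decode_ffi_options(
    concat_compound_numbers: u32,
    max_span_tokens: u32, // assumed type; not stated in the PR
) -> (bool, Option<usize>) {
    // Any non-zero value counts as true, matching the C convention.
    let concat = concat_compound_numbers != 0;
    // Zero is the sentinel for "no override".
    let max_span = if max_span_tokens == 0 {
        None
    } else {
        Some(max_span_tokens as usize)
    };
    (concat, max_span)
}
```

Using zero as the sentinel keeps the C signature free of optional types, at the cost of not being able to request a span limit of zero.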