feat: unified NormalizeOptions API + fix #23 compound concat by Alex-Wengg · Pull Request #24 · FluidInference/text-processing-rs

Alex-Wengg · 2026-04-27T03:32:29Z

Summary

Issue Failed on case "frequency one three five point six two five" #15 comment: instead of a separate normalize_sentence_aviation family, expose a unified entry point that takes an options struct.
Issue normalize_sentence_aviation failed for thirty five sixty two #23 comment: prefer a generic flag (not a Domain label) since other code-style speech contexts (military, dispatch IDs, callsigns) want the same "stop adding two numbers" behavior without being labeled "aviation".
Issue normalize_sentence_aviation failed for thirty five sixty two #23 bug: normalize_sentence_aviation(\"... thirty five sixty two ...\") resolved to 97 (= 35 + 62) instead of 3562. Fixed.

API changes (additive — no breaking changes)

New NormalizeOptions { concat_compound_numbers: bool, max_span_tokens: Option<usize> } with builder helpers (new(), with_concat_compound_numbers, with_max_span_tokens).
New unified entry points:
- normalize_with_options(input, opts)
- normalize_sentence_with_options(input, opts)
Existing normalize_aviation, normalize_sentence_aviation, normalize_sentence_aviation_with_max_span, normalize_sentence_with_max_span stay as thin wrappers around the unified API — no caller migration required.
FFI: nemo_normalize_with_options(input, concat_compound_numbers) and nemo_normalize_sentence_with_options(input, concat_compound_numbers, max_span_tokens) (uses u32 for the bool: 0 = false, non-zero = true; max_span_tokens == 0 means "use default").
WASM: normalizeWithOptions(input, concatCompoundNumbers) and normalizeSentenceWithOptions(input, concatCompoundNumbers, maxSpanTokens).

Issue #23 fix (`src/itn/en/cardinal.rs`)

words_to_number_aviation previously only handled digit-prefix + grammatical compound (\"seven eighty eight\" → 788). It still added consecutive grammatical compounds together.

Replaced the digit-prefix path with a general peel_compound_chunks helper that greedily splits a phrase into 0-99 chunks and concatenates them when there are 2+. Each chunk is one of:

A single ONES word (0-19), e.g. \"seven\" → 7, \"sixteen\" → 16
A single TENS word (20, 30, ... 90)
A TENS + ones-word (1-9), e.g. \"twenty one\" → 21

Input	Before	After
`thirty five sixty two`	`97`	`3562`
`seven eighty eight`	`788`	`788` (unchanged)
`twenty one`	`21`	`21` (unchanged — single chunk)
`two thousand seventeen`	`2017`	`2017` (unchanged — scale word)
`Alright thirty five sixty two appreciate your help United seven eighty eight`	`Alright 97 ... United 788`	`Alright 3562 ... United 788`

Single-chunk inputs still flow through grammatical; any phrase containing a scale word keeps its addition semantics.

One stale assertion was updated: normalize_aviation(\"twenty one forty two\") was locking in the buggy \"63\"; the new behavior is the correct \"2142\".

Test plan

cargo fmt
cargo build — clean (only pre-existing unrelated warnings)
cargo build --features ffi — clean
cargo test — 1000+ tests pass, 0 failures
cargo test --features ffi — 1000+ tests pass, 0 failures
New tests:
- test_options_default_matches_normalize
- test_options_concat_matches_aviation
- test_sentence_options_default_matches_default
- test_sentence_options_concat_compound
- test_sentence_options_max_span
- test_sentence_options_none_max_span_uses_default
- test_sentence_options_builder_compose
- test_issue_23_compound_concat (regression test for the original report)

Closes

Closes normalize_sentence_aviation failed for thirty five sixty two #23
Follow-up to feedback on Failed on case "frequency one three five point six two five" #15

@hongbo-miao

Address two pieces of feedback from @hongbo-miao: - Issue #15 comment: instead of separate `normalize_sentence_aviation` variants, expose a unified entry point with an options struct. - Issue #23 comment: prefer a generic flag (not a `Domain` label) since other code-style speech contexts want the same "stop adding two numbers" behavior. API --- - New `NormalizeOptions { concat_compound_numbers, max_span_tokens }` with builder helpers. - New `normalize_with_options` and `normalize_sentence_with_options` unified entry points. - Existing `normalize_aviation`, `normalize_sentence_aviation*`, `normalize_sentence_with_max_span` stay as thin wrappers — no breaking change for current callers. - FFI: `nemo_normalize_with_options(input, concat)` and `nemo_normalize_sentence_with_options(input, concat, max_span)`. - WASM: `normalizeWithOptions` / `normalizeSentenceWithOptions`. Issue #23 fix ------------- `words_to_number_aviation` previously only handled digit-prefix + grammatical compound (`"seven eighty eight"` → `"788"`). It still added consecutive grammatical compounds together, so `"thirty five sixty two"` resolved to `"97"` (= 35 + 62). Replaced the digit-prefix path with a general `peel_compound_chunks` helper that greedily splits a phrase into 0-99 chunks and concatenates them when there are 2+. Single-chunk inputs (`"twenty one"`) still go through grammatical, and any phrase with a scale word (`"two thousand seventeen"`) keeps its addition semantics. Updated one stale test (`"twenty one forty two"` was locking in the buggy `63`; it now correctly produces `2142`).

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Per issue #15 and #23 follow-up: remove all backwards-compat wrappers now that callers pass through the unified `NormalizeOptions` API. Removed Rust functions: - `normalize_aviation`, `normalize_sentence_aviation` - `normalize_sentence_aviation_with_max_span` - `normalize_sentence_with_max_span` Removed FFI bindings: - `nemo_normalize_aviation`, `nemo_normalize_sentence_aviation` - `nemo_normalize_sentence_with_max_span` - `nemo_normalize_sentence_aviation_with_max_span` Removed WASM bindings: - `normalizeAviation`, `normalizeSentenceAviation` - `normalizeSentenceWithMaxSpan`, `normalizeSentenceAviationWithMaxSpan` Callers should switch to: - Rust: `normalize_with_options` / `normalize_sentence_with_options` - FFI: `nemo_normalize_with_options` / `nemo_normalize_sentence_with_options` - WASM: `normalizeWithOptions` / `normalizeSentenceWithOptions` Swift wrapper and headers updated accordingly. All Rust tests (2050 across the workspace incl. doc tests) and FFI tests pass.

Moves `NormalizeOptions`, its builder methods, and `DEFAULT_MAX_SPAN_TOKENS` out of `src/lib.rs` and into a new `src/options.rs`. The struct is the extension point for caller-tunable normalization behavior, and giving it its own module makes room for richer per-field documentation and future flags without further bloating the crate root. Each field now carries: - a "Default" line stating the no-op behavior - a bulleted list of concrete input → output examples - the originating issue number for the behavior - guidance on which use cases want it on/off - explicit interaction notes (which other taggers still win) Also documents the `with_*` builder convention as the preferred construction path so new fields can land without breaking existing call sites that use struct literals. `pub use options::{NormalizeOptions, DEFAULT_MAX_SPAN_TOKENS};` keeps the public path stable — no changes required in FFI, WASM, Swift, or test code. All Rust + FFI tests pass; WASM check clean.

Compresses the per-field and module-level docs in `src/options.rs` down to the essentials. The two flags are the user-facing priority — readers need the one-line behavior, the issue ref, and the default; everything else (use-case lists, interaction tables, exhaustive examples) belongs in the README and the integration tests, not in rustdoc bullet walls.

- feat: unified NormalizeOptions API + fix #23 compound concat (#24) - fix: split trailing punctuation in sentence mode (#21) (#25) - fix: add disable_bare_second flag (#22) + restore TN abbreviation matching (#26) - fix(ci): pass --features to cargo via -- separator in wasm-pack (#27)

devin-ai-integration Bot reviewed Apr 27, 2026

View reviewed changes

Alex-Wengg added 3 commits April 26, 2026 23:49

Alex-Wengg merged commit d3f314b into main Apr 27, 2026
5 checks passed

Alex-Wengg mentioned this pull request Apr 27, 2026

chore(release): v0.2.2 #28

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: unified NormalizeOptions API + fix #23 compound concat#24

feat: unified NormalizeOptions API + fix #23 compound concat#24
Alex-Wengg merged 4 commits intomainfrom
feat/unified-options-api-and-issue-23-fix

Alex-Wengg commented Apr 27, 2026 •

edited

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Alex-Wengg commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

API changes (additive — no breaking changes)

Issue #23 fix (src/itn/en/cardinal.rs)

Test plan

Closes

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Alex-Wengg commented Apr 27, 2026 •

edited

Loading

Issue #23 fix (`src/itn/en/cardinal.rs`)