Releases: FluidInference/text-processing-rs

v0.2.2

27 Apr 05:24
34a9b62

What's Changed

Features

  • Unified NormalizeOptions API + fix #23 compound concat (#24)

Fixes

  • Split trailing punctuation in sentence mode (#21) (#25)
  • Add disable_bare_second flag (#22) + restore TN abbreviation matching (#26)
  • CI: pass --features to cargo via -- separator in wasm-pack (#27)

Full Changelog: v0.2.1...v0.2.2

v0.2.1 — aviation pipeline + spelled-decimal fix

26 Apr 19:03

Highlights

#15 — Spelled decimals

one three five point six two five now correctly normalizes to 135.625 instead of being misread as a sum.
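The difference between a digit-by-digit reading and a summed reading can be sketched as follows. This is an illustration only, not the crate's implementation: the helper names `digit_word` and `spelled_decimal` are hypothetical, and treating "oh" / "o" as zero is an assumption of the sketch.

```rust
/// Map a single spoken digit word to its digit character.
/// Treating "oh" / "o" as zero is an assumption of this sketch.
fn digit_word(word: &str) -> Option<char> {
    match word {
        "zero" | "oh" | "o" => Some('0'),
        "one" => Some('1'),
        "two" => Some('2'),
        "three" => Some('3'),
        "four" => Some('4'),
        "five" => Some('5'),
        "six" => Some('6'),
        "seven" => Some('7'),
        "eight" => Some('8'),
        "nine" => Some('9'),
        _ => None,
    }
}

/// Read "<digit words> point <digit words>" one digit at a time
/// instead of summing the word values.
/// Returns None if either side is empty or contains a non-digit word.
fn spelled_decimal(input: &str) -> Option<String> {
    let lowered = input.to_lowercase();
    let words: Vec<&str> = lowered.split_whitespace().collect();

    // Locate the separator; bail out if there is none.
    let point = words.iter().position(|w| *w == "point")?;
    let (int_words, rest) = words.split_at(point);
    let frac_words = &rest[1..]; // skip the "point" token itself

    if int_words.is_empty() || frac_words.is_empty() {
        return None;
    }

    // Collecting into Option<String> fails as a whole if any word is unknown.
    let int_part: Option<String> = int_words.iter().map(|w| digit_word(w)).collect();
    let frac_part: Option<String> = frac_words.iter().map(|w| digit_word(w)).collect();

    Some(format!("{}.{}", int_part?, frac_part?))
}
```

With this sketch, `spelled_decimal("one three five point six two five")` yields `Some("135.625")`, while an input with no digits after "point" yields `None`.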

#14 — Opt-in aviation flight-number reading

New public API for ASR contexts where number phrases like "United seven eighty eight" should read as flight numbers (United 788) rather than as ordinary cardinals or times. Default normalize / normalize_sentence are unchanged — aviation is opt-in to preserve existing date / time semantics (e.g. "twenty one forty two" still reads as old-year 2042).

New public API

Rust

| Function | Behaviour |
| --- | --- |
| cardinal::words_to_number_aviation(&str) -> Option<i128> | Lowest-level reader. Recognises digit-prefix + grammatical-compound ("seven eighty eight" → 788). Falls back to grammatical when the pattern doesn't apply. |
| cardinal::parse_aviation(&str) -> Option<String> | String-typed wrapper around the above. |
| normalize_aviation(&str) -> String | Single-input dispatch with parse_aviation prioritized; falls through to standard normalize for non-aviation phrases. |
| normalize_sentence_aviation(&str) -> String | Sentence scanner. Aviation cardinal runs at priority 89: above date=88 / time=85, below money=95 / measure=90. Money / measure / decimal phrases keep their existing semantics. |
| normalize_sentence_aviation_with_max_span(&str, usize) -> String | Configurable max span (default 16 tokens). |
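
The priority ordering used by the sentence scanner can be pictured with a small sketch. Only the numeric priorities (money=95, measure=90, aviation cardinal=89, date=88, time=85) come from these notes. The `Candidate` type, the tagger name strings, and the "highest priority wins for a given span" rule are assumptions made for illustration, not the crate's internals.

```rust
/// One possible reading of a token span (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    tagger: &'static str,
    priority: u8,
    output: String,
}

/// Priorities as listed in the release notes; tagger names are illustrative.
fn priority_of(tagger: &str) -> Option<u8> {
    match tagger {
        "money" => Some(95),
        "measure" => Some(90),
        "aviation_cardinal" => Some(89),
        "date" => Some(88),
        "time" => Some(85),
        _ => None,
    }
}

/// Build a candidate, looking up its priority; None for unknown taggers.
fn candidate(tagger: &'static str, output: &str) -> Option<Candidate> {
    Some(Candidate {
        tagger,
        priority: priority_of(tagger)?,
        output: output.to_string(),
    })
}

/// Assumed resolution rule: among readings of the same span,
/// the one with the highest priority wins.
fn pick_winner(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by_key(|c| c.priority)
}
```

Under these assumptions, a span that could read as either a time ("2:35") or an aviation cardinal ("235") resolves to the aviation reading, while a money reading still beats an aviation cardinal.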

C FFI (--features ffi)

  • nemo_normalize_aviation
  • nemo_normalize_sentence_aviation
  • nemo_normalize_sentence_aviation_with_max_span

wasm (@fluidinference/text-processing-rs)

  • normalizeAviation
  • normalizeSentenceAviation
  • normalizeSentenceAviationWithMaxSpan

Bug fixes (Devin review)

  • Bare "oh" / "o" no longer normalize to 0. Single-token interjections / letters used to silently resolve to digit 0 because the digit-by-digit branch fired on length-1 input. Now requires words.len() >= 2. Bare "zero" still resolves via the grammatical fallback. Multi-token spelled forms ("oh oh seven" → 7, "five oh five" → 505) are unchanged.
  • AGENTS.md compliance. Aviation API is exposed across lib.rs, ffi.rs, and wasm.rs together.
  • Aviation cardinal token gate lifted. parse_span no longer requires token_count <= 4 for aviation cardinal — opt-in mode accepts aggressive matching across longer spans like "one thousand two hundred thirty four".
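
The length guard described in the first fix can be sketched as below. This is a simplified stand-in for the digit-by-digit branch, not the crate's code: `digit_value` and `digit_by_digit` are hypothetical names, and the grammatical fallback that still handles bare "zero" is not modelled here.

```rust
/// Value of a single spoken digit word; "oh" / "o" read as zero.
fn digit_value(word: &str) -> Option<u8> {
    match word {
        "zero" | "oh" | "o" => Some(0),
        "one" => Some(1),
        "two" => Some(2),
        "three" => Some(3),
        "four" => Some(4),
        "five" => Some(5),
        "six" => Some(6),
        "seven" => Some(7),
        "eight" => Some(8),
        "nine" => Some(9),
        _ => None,
    }
}

/// Digit-by-digit reader with the guard from the fix:
/// single-token input is rejected so a bare "oh" or "o" is left alone.
fn digit_by_digit(input: &str) -> Option<i128> {
    let lowered = input.to_lowercase();
    let words: Vec<&str> = lowered.split_whitespace().collect();

    // The guard: this branch only fires on two or more tokens.
    if words.len() < 2 {
        return None;
    }

    let mut value: i128 = 0;
    for w in &words {
        let d = digit_value(w)?;
        // checked arithmetic so very long digit runs return None instead of panicking
        value = value.checked_mul(10)?.checked_add(i128::from(d))?;
    }
    Some(value)
}
```

Here `digit_by_digit("oh")` returns `None`, while `digit_by_digit("five oh five")` returns `Some(505)`. Because the result is an integer, leading zeros are not preserved, which matches "oh oh seven" reading as 7.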

Behaviour matrix

| Input | normalize | normalize_aviation | normalize_sentence | normalize_sentence_aviation |
| --- | --- | --- | --- | --- |
| seven eighty eight | 95 | 788 | seven eighty eight | 788 |
| two thirty five | 2:35 (time) | 235 | 2:35 | 235 |
| United seven eighty eight | n/a | n/a | United seven eighty eight | United 788 |
| twenty one forty two | 63 | 63 | 2042 (old-year) | 2042 (preserved) |
| five dollars | $5 | $5 | $5 | $5 (money still wins) |
| oh | (was 0) → unchanged token | unchanged token | unchanged | unchanged |
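
The 95 versus 788 contrast in the first row comes down to summing word values versus reading a leading digit followed by a two-digit compound. The sketch below illustrates both readings under simplifying assumptions: it handles only lowercase words up to "ninety", and the function names are hypothetical rather than taken from the crate.

```rust
/// Value of a number word from "one" through "ninety" (no hundreds or thousands).
fn word_value(word: &str) -> Option<u32> {
    let v = match word {
        "one" => 1, "two" => 2, "three" => 3, "four" => 4, "five" => 5,
        "six" => 6, "seven" => 7, "eight" => 8, "nine" => 9,
        "ten" => 10, "eleven" => 11, "twelve" => 12, "thirteen" => 13,
        "fourteen" => 14, "fifteen" => 15, "sixteen" => 16,
        "seventeen" => 17, "eighteen" => 18, "nineteen" => 19,
        "twenty" => 20, "thirty" => 30, "forty" => 40, "fifty" => 50,
        "sixty" => 60, "seventy" => 70, "eighty" => 80, "ninety" => 90,
        _ => return None,
    };
    Some(v)
}

/// Summed reading: add up every word value.
fn naive_sum(input: &str) -> Option<u32> {
    input.split_whitespace().map(word_value).sum()
}

/// A two-digit compound: a teen, a bare tens word, or tens + unit.
fn two_digit_compound(words: &[&str]) -> Option<u32> {
    match words {
        [single] => {
            let v = word_value(single)?;
            if v >= 10 { Some(v) } else { None }
        }
        [tens, unit] => {
            let t = word_value(tens)?;
            let u = word_value(unit)?;
            if t >= 20 && t % 10 == 0 && (1..=9).contains(&u) {
                Some(t + u)
            } else {
                None
            }
        }
        _ => None,
    }
}

/// Digit-prefix + compound reading: "seven" then "eighty eight".
/// Returns None when the phrase does not fit that shape.
fn digit_prefix_compound(input: &str) -> Option<u32> {
    let words: Vec<&str> = input.split_whitespace().collect();
    let (first, rest) = words.split_first()?;
    let prefix = word_value(first)?;
    if !(1..=9).contains(&prefix) {
        return None;
    }
    let compound = two_digit_compound(rest)?;
    Some(prefix * 100 + compound)
}
```

In this sketch "seven eighty eight" sums to 95 but reads as 788 with the digit-prefix rule, and "twenty one forty two" sums to 63 and does not fit the digit-prefix shape at all.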

Distribution

  • crates.io: text-processing-rs = "0.2.1"
  • npm: @fluidinference/text-processing-rs@0.2.1

Issues / PRs

  • Closes #15
  • Addresses #14 via opt-in pipeline
  • PR #20