Releases: FluidInference/text-processing-rs
v0.2.2
v0.2.1 — aviation pipeline + spelled-decimal fix
Highlights
#15 — Spelled decimals
"one three five point six two five" now correctly normalizes to 135.625 instead of being misread as a sum.
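As a rough illustration of the digit-by-digit reading described above, the following standalone sketch converts a spelled decimal into its numeric string. The function names `digit_word` and `spelled_decimal` are hypothetical and are not part of the crate; the crate's actual implementation may differ.

```rust
/// Map one spoken digit word to its digit character.
/// (Hypothetical helper for illustration only.)
fn digit_word(word: &str) -> Option<char> {
    match word {
        "zero" => Some('0'),
        "one" => Some('1'),
        "two" => Some('2'),
        "three" => Some('3'),
        "four" => Some('4'),
        "five" => Some('5'),
        "six" => Some('6'),
        "seven" => Some('7'),
        "eight" => Some('8'),
        "nine" => Some('9'),
        _ => None,
    }
}

/// Read "<digit words> point <digit words>" one digit at a time,
/// concatenating digits instead of adding their values.
/// Returns None if either side is empty or contains a non-digit word.
fn spelled_decimal(input: &str) -> Option<String> {
    let lowered = input.to_lowercase();
    let (int_part, frac_part) = lowered.split_once(" point ")?;

    let read = |part: &str| -> Option<String> {
        let digits: Option<String> = part.split_whitespace().map(digit_word).collect();
        digits.filter(|d| !d.is_empty())
    };

    Some(format!("{}.{}", read(int_part)?, read(frac_part)?))
}
```

Under these assumptions, `spelled_decimal("one three five point six two five")` yields `Some("135.625")`, while an input containing a non-digit word such as "hundred" yields `None`.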
#14 — Opt-in aviation flight-number reading
New public API for ASR contexts where number phrases like "United seven eighty eight" should read as flight numbers (United 788) rather than as ordinary cardinals or times. Default normalize / normalize_sentence are unchanged — aviation is opt-in to preserve existing date / time semantics (e.g. "twenty one forty two" still reads as old-year 2042).
New public API
Rust
| Function | Behaviour |
|---|---|
| `cardinal::words_to_number_aviation(&str) -> Option<i128>` | Lowest-level reader. Recognises digit-prefix + grammatical-compound ("seven eighty eight" → 788). Falls back to grammatical when the pattern doesn't apply. |
| `cardinal::parse_aviation(&str) -> Option<String>` | String-typed wrapper around the above. |
| `normalize_aviation(&str) -> String` | Single-input dispatch with `parse_aviation` prioritized; falls through to standard `normalize` for non-aviation phrases. |
| `normalize_sentence_aviation(&str) -> String` | Sentence scanner. Aviation cardinal runs at priority 89: above date=88 / time=85, below money=95 / measure=90. Money / measure / decimal phrases keep their existing semantics. |
| `normalize_sentence_aviation_with_max_span(&str, usize) -> String` | Configurable max span (default 16 tokens). |
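The "digit-prefix + grammatical-compound" pattern from the table can be sketched in isolation as follows. This is a minimal standalone illustration, not the crate's code: the names `unit`, `teen`, `tens`, `two_digit`, and `digit_prefix_compound` are hypothetical, and unlike `words_to_number_aviation` the sketch returns `None` rather than falling back to a grammatical reading.

```rust
/// "one".."nine" -> 1..9
fn unit(word: &str) -> Option<i128> {
    const UNITS: [&str; 9] = [
        "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    ];
    UNITS.iter().position(|u| *u == word).map(|i| i as i128 + 1)
}

/// "ten".."nineteen" -> 10..19
fn teen(word: &str) -> Option<i128> {
    const TEENS: [&str; 10] = [
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen",
    ];
    TEENS.iter().position(|t| *t == word).map(|i| i as i128 + 10)
}

/// "twenty".."ninety" -> 20..90
fn tens(word: &str) -> Option<i128> {
    const TENS: [&str; 8] = [
        "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
    ];
    TENS.iter().position(|t| *t == word).map(|i| (i as i128 + 2) * 10)
}

/// Grammatical two-digit compound: a teen, a bare tens word, or tens + unit.
fn two_digit(words: &[&str]) -> Option<i128> {
    match words {
        [w] => teen(w).or_else(|| tens(w)),
        [t, u] => Some(tens(t)? + unit(u)?),
        _ => None,
    }
}

/// Single digit word followed by a two-digit compound,
/// e.g. "seven eighty eight" -> 7 * 100 + 88.
fn digit_prefix_compound(input: &str) -> Option<i128> {
    let words: Vec<&str> = input.split_whitespace().collect();
    let (first, rest) = words.split_first()?;
    Some(unit(first)? * 100 + two_digit(rest)?)
}
```

With this sketch, "seven eighty eight" reads as 788 and "two thirty five" as 235. A phrase that does not start with a single digit word, such as "twenty one forty two", does not match the pattern, which is the point at which the real function is documented to fall back to the grammatical reading.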
C FFI (--features ffi)
- nemo_normalize_aviation
- nemo_normalize_sentence_aviation
- nemo_normalize_sentence_aviation_with_max_span
wasm (@fluidinference/text-processing-rs)
- normalizeAviation
- normalizeSentenceAviation
- normalizeSentenceAviationWithMaxSpan
Bug fixes (Devin review)
- Bare "oh" / "o" no longer normalize to 0. Single-token interjections / letters used to silently resolve to digit 0 because the digit-by-digit branch fired on length-1 input. Now requires `words.len() >= 2`. Bare "zero" still resolves via the grammatical fallback. Multi-token spelled forms ("oh oh seven" → 7, "five oh five" → 505) are unchanged.
- AGENTS.md compliance. Aviation API is exposed across lib.rs, ffi.rs, and wasm.rs together.
- Aviation cardinal token gate lifted. `parse_span` no longer requires `token_count <= 4` for aviation cardinal; opt-in mode accepts aggressive matching across longer spans like "one thousand two hundred thirty four".
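The length guard on the digit-by-digit branch can be illustrated with a small standalone sketch. The names `spoken_digit` and `digit_by_digit` are hypothetical, and the crate's internals may be structured differently; only the `words.len() >= 2` requirement and the example values come from the notes above.

```rust
/// Map a spoken digit (including "oh" / "o" for zero) to its value.
/// (Hypothetical helper for illustration only.)
fn spoken_digit(word: &str) -> Option<u8> {
    match word {
        "zero" | "oh" | "o" => Some(0),
        "one" => Some(1),
        "two" => Some(2),
        "three" => Some(3),
        "four" => Some(4),
        "five" => Some(5),
        "six" => Some(6),
        "seven" => Some(7),
        "eight" => Some(8),
        "nine" => Some(9),
        _ => None,
    }
}

/// Digit-by-digit reading that only fires on two or more tokens,
/// so a bare "oh" or "o" is left for other branches to handle.
fn digit_by_digit(words: &[&str]) -> Option<i128> {
    if words.len() < 2 {
        return None;
    }
    let mut value: i128 = 0;
    for w in words {
        let d = i128::from(spoken_digit(w)?);
        // Checked arithmetic: very long digit runs yield None instead of overflowing.
        value = value.checked_mul(10)?.checked_add(d)?;
    }
    Some(value)
}
```

In this sketch `digit_by_digit(&["oh"])` returns `None`, while `["oh", "oh", "seven"]` gives 7 and `["five", "oh", "five"]` gives 505, matching the examples in the first bullet.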
Behaviour matrix
| Input | normalize | normalize_aviation | normalize_sentence | normalize_sentence_aviation |
|---|---|---|---|---|
| seven eighty eight | 95 | 788 | seven eighty eight | 788 |
| two thirty five | 2:35 (time) | 235 | 2:35 | 235 |
| United seven eighty eight | n/a | n/a | United seven eighty eight | United 788 |
| twenty one forty two | 63 | 63 | 2042 (old-year) | 2042 (preserved) |
| five dollars | $5 | $5 | $5 | $5 (money still wins) |
| oh | (was 0) → unchanged token | unchanged token | unchanged | unchanged |
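The "money still wins" and "2:35 versus 235" rows follow from the tagger priorities quoted for `normalize_sentence_aviation` (money=95, measure=90, aviation cardinal=89, date=88, time=85). The sketch below shows one way such a priority pick could work when several taggers match the same span. The `Tagger` enum and `pick_winner` function are hypothetical; only the priority numbers come from these notes, and the crate's scanner may resolve matches differently.

```rust
/// Taggers that may match the same span (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tagger {
    Money,
    Measure,
    AviationCardinal,
    Date,
    Time,
}

impl Tagger {
    /// Priorities as stated in the release notes.
    fn priority(self) -> u8 {
        match self {
            Tagger::Money => 95,
            Tagger::Measure => 90,
            Tagger::AviationCardinal => 89,
            Tagger::Date => 88,
            Tagger::Time => 85,
        }
    }
}

/// Among the taggers that matched a span, keep the highest-priority one.
fn pick_winner(candidates: &[Tagger]) -> Option<Tagger> {
    candidates.iter().copied().max_by_key(|t| t.priority())
}
```

For a span where both the time and aviation cardinal taggers match, `pick_winner` selects `AviationCardinal`; where money also matches, `Money` is selected.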
Distribution
- crates.io: `text-processing-rs = "0.2.1"`
- npm: `@fluidinference/text-processing-rs@0.2.1`