Diakrytynizator

A highly optimized x86_64 assembly program that transforms UTF-8 encoded text by applying polynomial transformations to Unicode characters.

Overview

Diakrytynizator reads UTF-8 text from standard input, applies a polynomial transformation to characters with Unicode values greater than 0x7F, and outputs the transformed text to standard output. ASCII characters (0x00-0x7F) remain unchanged.

How It Works

Polynomial Transformation

The program accepts command-line arguments that define a polynomial:

./diakrytynizator a_0 a_1 a_2 ... a_n

This defines the polynomial:

w(x) = a_n * x^n + ... + a_2 * x^2 + a_1 * x + a_0

Transformation rule: For each Unicode character with value x > 0x7F, the program:

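Computes w(x - 0x80) mod 0x10FF80
Outputs the character with Unicode value (w(x - 0x80) mod 0x10FF80) + 0x80

Because the reduced value lies between 0 and 0x10FF7F, the output always falls between 0x80 and 0x10FFFF. It is never ASCII and never beyond the Unicode range.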
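The rule above can be checked by hand with a minimal C sketch. It is an illustration, not a transcription of the assembly's calc_poly. The names poly_mod and transform are hypothetical. The sketch assumes coefficients are stored lowest degree first (a_0 first, matching the command-line order) and are already reduced modulo 0x10FF80. It evaluates the polynomial with Horner's method, reducing after every step.

```c
#include <stddef.h>
#include <stdint.h>

#define MOD 0x10FF80u /* modulus from the transformation rule */

/* Hypothetical helper, not the repo's calc_poly.
 * Assumption: coeffs[i] holds a_i (a_0 first) and is already < MOD.
 * Evaluates w(y) mod MOD with Horner's method. */
static uint32_t poly_mod(const uint32_t *coeffs, size_t count, uint32_t y)
{
    uint64_t acc = 0;
    for (size_t i = count; i-- > 0; ) {
        /* acc < MOD and y < MOD, so acc * y stays far below 2^64 */
        acc = (acc * y + coeffs[i]) % MOD;
    }
    return (uint32_t)acc;
}

/* Map one code point by the rule above; ASCII passes through unchanged. */
static uint32_t transform(const uint32_t *coeffs, size_t count, uint32_t x)
{
    if (x <= 0x7F)
        return x;
    return poly_mod(coeffs, count, x - 0x80) + 0x80;
}
```

With the single coefficient {133}, transform maps U+017C (ż) to U+0105 (ą), which matches the constant-transformation example below. Reducing after every Horner step is valid because reduction modulo 0x10FF80 commutes with addition and multiplication.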

UTF-8 Validation

The program strictly validates UTF-8 encoding:

Accepts Unicode values from 0x00 to 0x10FFFF
Supports 1-4 byte UTF-8 sequences
Rejects overlong encodings (only shortest form accepted)
Returns exit code 1 on invalid input
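The validation rules above can be expressed as a short C sketch of a strict decoder. The name utf8_decode is hypothetical, and the sketch is not the repo's load_head_byte/load_tail_byte code. It reads the sequence length from the lead byte and checks that every tail byte has the form 10xxxxxx. It then rejects truncated sequences, overlong forms, and values above 0x10FFFF. Surrogates (0xD800 to 0xDFFF) are deliberately not rejected, because the README does not say the program rejects them. That choice is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strict decoder (sketch). Decodes one sequence at s, with
 * len bytes available. Returns its length (1-4) and stores the value in
 * *out, or returns 0 for malformed, truncated or overlong input.
 * Assumption: surrogates are accepted, as the README does not exclude them. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    /* smallest value that legitimately needs n bytes (shortest form) */
    static const uint32_t min_for_len[5] = {0, 0x00, 0x80, 0x800, 0x10000};
    size_t n = 0;
    uint32_t cp = 0;

    if (len == 0)
        return 0;
    if (s[0] < 0x80)                { n = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return 0; /* stray continuation byte or lead byte 0xF8-0xFF */

    if (len < n)
        return 0; /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0; /* every tail byte must be 10xxxxxx */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_for_len[n] || cp > 0x10FFFF)
        return 0; /* overlong form or beyond the Unicode range */
    *out = cp;
    return n;
}
```

For example, the two-byte overlong form C0 AF of '/' is rejected, because it decodes to 0x2F, which is below the two-byte minimum of 0x80.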

Building

Compile with NASM and link:

nasm -f elf64 -w+all -w+error -o diakrytynizator.o diakrytynizator.asm
ld --fatal-warnings -o diakrytynizator diakrytynizator.o

Usage Examples

Identity transformation (w(x) = x):

echo "Zażółć gęślą jaźń…" | ./diakrytynizator 0 1
# Output: Zażółć gęślą jaźń…
# Exit code: 0

Constant transformation (w(x) = 133):

echo "Zażółć gęślą jaźń…" | ./diakrytynizator 133
# Output: Zaąąąą gąąlą jaąąą
# Exit code: 0

Complex polynomial (w(x) = x² + 623420x + 1075041):

echo "ŁOŚ" | ./diakrytynizator 1075041 623420 1
# Output: „O”
# Exit code: 0
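The third example can be verified by hand. Ł is U+0141 and Ś is U+015A. O is ASCII and passes through unchanged. Here 0x10FF80 = 1113984.

```latex
\begin{aligned}
\text{Ł}: \quad x - \mathtt{0x80} &= \mathtt{0xC1} = 193 \\
w(193) &= 193^2 + 623420 \cdot 193 + 1075041 = 121432350 \\
121432350 \bmod 1113984 &= 8094, \qquad 8094 + \mathtt{0x80} = 8222 = \mathtt{0x201E}\ (\text{„}) \\[4pt]
\text{Ś}: \quad x - \mathtt{0x80} &= \mathtt{0xDA} = 218 \\
w(218) &= 218^2 + 623420 \cdot 218 + 1075041 = 137028125 \\
137028125 \bmod 1113984 &= 8093, \qquad 8093 + \mathtt{0x80} = 8221 = \mathtt{0x201D}\ (\text{”})
\end{aligned}
```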

Error handling (invalid UTF-8):

echo -e "abc\n\x80" | ./diakrytynizator 7
# Output: abc
# Exit code: 1

Technical Details

Implementation Features

Buffered I/O: Uses 1KB buffers for efficient reading and writing
Modular arithmetic: All polynomial computations performed modulo 0x10FF80
UTF-8 parsing: Hand-optimized byte-level UTF-8 decoding and encoding
Parameter validation: Validates polynomial coefficients (non-negative integers, no leading zeros)
Error handling: Malformed UTF-8 input or an invalid parameter ends the program with exit code 1
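The parameter-validation rule (non-negative integers, no leading zeros) can be sketched in C as follows. The name parse_coeff is hypothetical, and the sketch is not the repo's parse routine. One further assumption is that digits are reduced modulo 0x10FF80 as they are read. That lets arbitrarily long coefficients be accepted without overflow. It is also sound, because only w(x) mod 0x10FF80 is ever used.

```c
#include <stdint.h>

#define MOD 0x10FF80u

/* Hypothetical coefficient parser (sketch, not the repo's parse).
 * Returns the value reduced modulo MOD, or -1 if the string is empty,
 * contains a non-digit (including a sign), or has a leading zero. */
static int64_t parse_coeff(const char *s)
{
    if (s[0] == '\0' || (s[0] == '0' && s[1] != '\0'))
        return -1; /* empty, or leading zero such as "007" */
    uint64_t acc = 0;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;
        /* assumption: reduce as we go so long inputs cannot overflow */
        acc = (acc * 10 + (uint64_t)(*s - '0')) % MOD;
    }
    return (int64_t)acc;
}
```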

Architecture

The program consists of several key components:

parse: Converts decimal string arguments to integers with validation
convert_args: Processes command-line arguments into polynomial coefficients
calc_poly: Evaluates the polynomial using Horner's method
utf_count_bytes: Determines UTF-8 character byte length from first byte
utf_bytes_for_code: Calculates required bytes for a Unicode value
load_head_byte/load_tail_byte: UTF-8 decoder
add_char: UTF-8 encoder that writes transformed characters to output buffer
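The encoding side (the roles described for utf_bytes_for_code and add_char) can be sketched in C. The names utf8_len and utf8_encode are hypothetical. The prefix table illustrates the kind of UTF-8 prefix constants the README places in .data; it is not copied from the repo. The encoder fills continuation bytes from the end, six payload bits at a time. It then ORs the remaining high bits into the lead-byte prefix.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counterpart of utf_bytes_for_code.
 * Bytes needed for cp (assumed <= 0x10FFFF). */
static size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Hypothetical counterpart of add_char: writes cp as UTF-8 into out
 * (room for 4 bytes) and returns the number of bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    /* lead-byte prefixes indexed by sequence length (illustrative) */
    static const unsigned char prefix[5] = {0, 0x00, 0xC0, 0xE0, 0xF0};
    size_t n = utf8_len(cp);
    for (size_t i = n - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F)); /* 10xxxxxx */
        cp >>= 6;
    }
    out[0] = (unsigned char)(prefix[n] | cp); /* remaining high bits */
    return n;
}
```

For instance, U+201E („) encodes to the three bytes E2 80 9E.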

Memory Layout

.data: Constants for UTF-8 masks, prefixes, and modulo value
.bss: Input/output buffers (1KB each)
Stack-based polynomial coefficient storage

Exit Codes

0: Successful execution
1: Error (invalid parameters, malformed UTF-8, or encoding violation)

Acknowledgments

This program was developed as an assignment for the Operating Systems course at the University of Warsaw in 2021.
