bartekryba/SO_1

Diakrytynizator

An x86_64 assembly program (NASM) that transforms UTF-8 encoded text by applying a polynomial transformation to non-ASCII Unicode characters.

Overview

Diakrytynizator reads UTF-8 text from standard input, applies a polynomial transformation to characters with Unicode values greater than 0x7F, and outputs the transformed text to standard output. ASCII characters (0x00-0x7F) remain unchanged.

How It Works

Polynomial Transformation

The program accepts command-line arguments that define a polynomial:

./diakrytynizator a0 a1 a2 ... an

This defines the polynomial:

w(x) = an * x^n + ... + a2 * x^2 + a1 * x + a0

Transformation rule: For each Unicode character with value x > 0x7F, the program:

  1. Computes y = w(x - 0x80) mod 0x10FF80
  2. Outputs the character with Unicode value y + 0x80, which always lies in the range 0x80 to 0x10FFFF
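
The rule above can be modelled in a few lines of Python. This is only a reference sketch for checking expected outputs, not the assembly implementation; the names `transform`, `MODULUS`, and `OFFSET` are made up for the illustration.

```python
MODULUS = 0x10FF80   # number of code points in 0x80..0x10FFFF
OFFSET = 0x80        # first non-ASCII code point

def transform(code_point, coeffs):
    """Map one code point; coeffs is [a0, a1, ..., an] (hypothetical helper)."""
    if code_point < OFFSET:
        return code_point              # ASCII passes through unchanged
    x = code_point - OFFSET
    # Direct evaluation of w(x); Python integers do not overflow.
    w = sum(a * x**i for i, a in enumerate(coeffs)) % MODULUS
    return w + OFFSET

# 'Ł' is U+0141; with w(x) = x^2 + 623420x + 1075041 it maps to U+201E
print(hex(transform(0x141, [1075041, 623420, 1])))  # prints 0x201e
```

Here x = 0x141 - 0x80 = 193, w(193) = 121432350, and 121432350 mod 0x10FF80 = 8094, so the output code point is 8094 + 0x80 = 0x201E.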

UTF-8 Validation

The program strictly validates UTF-8 encoding:

  • Accepts Unicode values from 0x00 to 0x10FFFF
  • Supports 1-4 byte UTF-8 sequences
  • Rejects overlong encodings (only shortest form accepted)
  • Returns exit code 1 on invalid input
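
As an illustration of these checks, the following Python sketch decodes a byte string and rejects the same classes of input. It is an assumption about how the rules combine, not a transcription of the assembly; `decode_strict` and the `forms` table are hypothetical names. Surrogate code points are not mentioned in the list above, so the sketch does not treat them specially.

```python
def decode_strict(data):
    """Return the code points encoded in `data` (bytes) or raise ValueError."""
    # (leading-byte mask, leading-byte pattern, continuation bytes, smallest value)
    forms = [
        (0x80, 0x00, 0, 0x00),
        (0xE0, 0xC0, 1, 0x80),
        (0xF0, 0xE0, 2, 0x800),
        (0xF8, 0xF0, 3, 0x10000),
    ]
    out, i = [], 0
    while i < len(data):
        head = data[i]
        for mask, pattern, tail_len, minimum in forms:
            if head & mask == pattern:
                break
        else:
            raise ValueError("invalid leading byte")
        tail = data[i + 1 : i + 1 + tail_len]
        if len(tail) != tail_len or any(b & 0xC0 != 0x80 for b in tail):
            raise ValueError("missing or malformed continuation byte")
        value = head & ~mask & 0xFF            # payload bits of the leading byte
        for b in tail:
            value = (value << 6) | (b & 0x3F)  # six payload bits per continuation byte
        if value < minimum or value > 0x10FFFF:
            raise ValueError("overlong encoding or value out of range")
        out.append(value)
        i += 1 + tail_len
    return out
```

For example, `decode_strict(b"\xc0\x80")` raises because the two-byte form encodes 0, which fits in one byte, and `decode_strict(b"\x80")` raises because a continuation byte cannot start a sequence.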

Building

Compile with NASM and link:

nasm -f elf64 -w+all -w+error -o diakrytynizator.o diakrytynizator.asm
ld --fatal-warnings -o diakrytynizator diakrytynizator.o

Usage Examples

Identity transformation (w(x) = x):

echo "Zażółć gęślą jaźń…" | ./diakrytynizator 0 1
# Output: Zażółć gęślą jaźń…
# Exit code: 0

Constant transformation (w(x) = 133):

echo "Zażółć gęślą jaźń…" | ./diakrytynizator 133
# Output: Zaąąąą gąąlą jaąąą
# Exit code: 0

Complex polynomial (w(x) = x² + 623420x + 1075041):

echo "ŁOŚ" | ./diakrytynizator 1075041 623420 1
# Output: „O”
# Exit code: 0

Error handling (invalid UTF-8):

echo -e "abc\n\x80" | ./diakrytynizator 7
# Output: abc
# Exit code: 1

Technical Details

Implementation Features

  • Buffered I/O: Uses 1 KB buffers for efficient reading and writing
  • Modular arithmetic: All polynomial computations performed modulo 0x10FF80
  • UTF-8 parsing: Hand-optimized byte-level UTF-8 decoding and encoding
  • Parameter validation: Validates polynomial coefficients (non-negative integers, no leading zeros)
  • Error handling: Exits with code 1 on invalid input encoding or invalid parameters
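
The coefficient rules (non-negative decimal integers, no leading zeros) can be sketched as follows. `parse_coefficient` is a hypothetical Python helper, and reducing the value modulo 0x10FF80 at parse time is an assumption of this sketch; the list above only states that computations are done modulo that value.

```python
MODULUS = 0x10FF80

def parse_coefficient(text):
    """Parse one decimal argument; raise ValueError if it is not acceptable."""
    if text == "" or any(ch not in "0123456789" for ch in text):
        raise ValueError("coefficient must be a non-negative decimal integer")
    if len(text) > 1 and text[0] == "0":
        raise ValueError("leading zeros are not allowed")
    # Assumption: reduce immediately so later arithmetic works on small values.
    return int(text) % MODULUS

coeffs = [parse_coefficient(arg) for arg in ["1075041", "623420", "1"]]
```

A lone `0` is accepted (it appears as a coefficient in the identity example), while `007`, `-1`, and `1a` are rejected.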

Architecture

The program consists of several key components:

  • parse: Converts decimal string arguments to integers with validation
  • convert_args: Processes command-line arguments into polynomial coefficients
  • calc_poly: Evaluates the polynomial using Horner's method
  • utf_count_bytes: Determines UTF-8 character byte length from first byte
  • utf_bytes_for_code: Calculates required bytes for a Unicode value
  • load_head_byte/load_tail_byte: UTF-8 decoder (reads the leading byte and the continuation bytes of each sequence)
  • add_char: UTF-8 encoder that writes transformed characters to output buffer
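
For readers unfamiliar with Horner's method, here is a minimal Python sketch of the evaluation that calc_poly is described as performing. The function body is an illustration written for this README, not a translation of the assembly, and it assumes the coefficients are stored as [a0, a1, ..., an].

```python
MODULUS = 0x10FF80

def calc_poly(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n modulo MODULUS."""
    acc = 0
    for a in reversed(coeffs):          # start from the highest-degree coefficient
        acc = (acc * x + a) % MODULUS   # reduce each step so acc stays below MODULUS
    return acc

print(calc_poly([1075041, 623420, 1], 193))  # prints 8094
```

Reducing after every step is what makes the method practical in assembly: MODULUS is below 2^21, so if the coefficients are reduced as well, each intermediate product stays below 2^42 and fits comfortably in a 64-bit register.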

Memory Layout

  • .data: Constants for UTF-8 masks, prefixes, and modulo value
  • .bss: Input/output buffers (1 KB each)
  • Stack-based polynomial coefficient storage
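
To show how a table of prefixes and masks drives UTF-8 encoding, here is a small Python sketch. The table layout and the names `FORMS`, `TAIL_PREFIX`, `TAIL_MASK`, and `encode` are assumptions made for the illustration; the actual constants in .data may be organised differently.

```python
# (largest code point, leading-byte prefix, number of continuation bytes)
FORMS = [
    (0x7F, 0x00, 0),
    (0x7FF, 0xC0, 1),
    (0xFFFF, 0xE0, 2),
    (0x10FFFF, 0xF0, 3),
]
TAIL_PREFIX = 0x80   # continuation bytes look like 10xxxxxx
TAIL_MASK = 0x3F     # six payload bits per continuation byte

def encode(code_point):
    """Return the shortest UTF-8 encoding of code_point as bytes."""
    for limit, head_prefix, tail_len in FORMS:
        if code_point <= limit:
            break
    else:
        raise ValueError("code point out of range")
    head = head_prefix | (code_point >> (6 * tail_len))
    tail = [
        TAIL_PREFIX | ((code_point >> (6 * k)) & TAIL_MASK)
        for k in range(tail_len - 1, -1, -1)
    ]
    return bytes([head, *tail])

print(encode(0x105))  # prints b'\xc4\x85', the encoding of 'ą'
```

Picking the first row whose limit is large enough always yields the shortest form, which matches the requirement that only shortest-form encodings are valid.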

Exit Codes

  • 0: Successful execution
  • 1: Error (invalid parameters or invalid UTF-8 input, including overlong encodings)

Acknowledgments

This program was developed as an assignment for the Operating Systems course at the University of Warsaw in 2021.
