attali-david/text_sequence_script
## DESCRIPTION
The program reads text files or standard input and lists the most frequent three-word sequences, with support for Unicode text. By default it produces a single combined list for all inputs. It can also analyze multiple files in parallel, outputting either a separate list for each file or one master list.
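As a rough illustration of the idea, the sketch below counts three-word sequences in a string. It is not the code from `index.js`: the function name `topTrigrams`, the `limit` parameter, and the tokenization rule (lowercased runs of Unicode letters and digits, with inner apostrophes kept) are assumptions made for this example.

```javascript
// Minimal sketch, assuming words are runs of Unicode letters/digits.
// `topTrigrams` and `limit` are hypothetical names, not the script's API.
function topTrigrams(text, limit = 100) {
  // \p{L} matches any Unicode letter and \p{N} any Unicode number;
  // the `u` flag is required for these property escapes.
  const wordPattern = /[\p{L}\p{N}]+(?:['’][\p{L}\p{N}]+)*/gu;
  const words = text.toLowerCase().match(wordPattern) ?? [];

  // Slide a window of three words across the text and count each sequence.
  const counts = new Map();
  for (let i = 0; i + 2 < words.length; i++) {
    const key = `${words[i]} ${words[i + 1]} ${words[i + 2]}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  // Sort by descending count and keep the first `limit` entries.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

// Usage: print the most frequent sequence as [sequence, count].
console.log(topTrigrams('The quick brown fox. The quick brown dog!', 1));
```

A `Map` keeps the counting pass linear in the number of words; for very large inputs the full sort could be replaced with a bounded heap.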
## Requirements
- Node.js (version 18 or later)
- npm (Node package manager)
## INSTRUCTIONS
Install packages
- npm i
Command line manual with examples:
- ./index.js -h
Process text from stdin:
- cat file1.txt | node index.js
Process text from two files, outputting one combined list:
- node index.js -f file1.txt file2.txt
Process text from two files using two threads, outputting a separate list for each file:
- node index.js -f file1.txt file2.txt -t 2 -m
## DOCKER
- docker-compose up -d --build
- docker attach relic_interview
## TEST
- npm test
## FURTHER WORK
- Improve error handling for a better user experience
- Explore performance optimizations for larger files
- Add more tests
- Support more file formats
## KNOWN ISSUES
- Specifying a thread count higher than the number of CPU cores leads to warnings but should not affect the program's functionality.
- The program currently does not validate input file paths rigorously.
- The project directory will need better organization if this script is extended.
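The first two issues could be mitigated along the following lines. This is only a sketch under stated assumptions: `clampThreads` and `checkInputFiles` are hypothetical helpers that do not exist in the repository, and the script's actual option handling may differ.

```javascript
const fs = require('node:fs');
const os = require('node:os');

// Hypothetical helper: cap the requested thread count at the number of
// CPU cores and at the number of files, and never go below 1.
function clampThreads(requested, fileCount) {
  const cores = os.cpus().length || 1; // cpus() can be empty on some platforms
  const wanted = Number.isInteger(requested) && requested > 0 ? requested : 1;
  return Math.max(1, Math.min(wanted, cores, fileCount));
}

// Hypothetical helper: split paths into readable regular files and
// human-readable error messages, instead of failing later during reads.
function checkInputFiles(paths) {
  const valid = [];
  const errors = [];
  for (const p of paths) {
    try {
      if (fs.statSync(p).isFile()) {
        valid.push(p);
      } else {
        errors.push(`${p}: not a regular file`);
      }
    } catch (err) {
      // Typical codes are ENOENT (missing) and EACCES (no permission).
      errors.push(`${p}: ${err.code ?? err.message}`);
    }
  }
  return { valid, errors };
}
```

Checking the paths up front lets the program report every bad path at once before any worker starts.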