GitHub - attali-david/text_sequence_script

## DESCRIPTION
The program processes text files or standard input to identify and list the most frequent three-word sequences, with support for Unicode. It can output a single list covering all inputs. It can also analyze multiple files in parallel, outputting either a separate list for each file or one master list.
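
As an illustration of the idea described above, here is a minimal sketch of counting three-word sequences in a Unicode-aware way. It is not taken from the repository's source: the function name `topThreeWordSequences`, the `limit` parameter, and the word-matching rule (lowercased runs of letters and digits, with inner apostrophes kept) are all assumptions made for this example.

```javascript
// Hypothetical helper for illustration only; the actual script may tokenize
// and rank sequences differently.
function topThreeWordSequences(text, limit = 100) {
  // \p{L} and \p{N} match letters and digits in any script (needs the `u` flag).
  // Inner apostrophes are kept so that "don't" stays one word (an assumption).
  const wordPattern = /[\p{L}\p{N}]+(?:['’][\p{L}\p{N}]+)*/gu;
  const words = text.toLowerCase().match(wordPattern) ?? [];

  // Slide a window of three words over the text and count each sequence.
  const counts = new Map();
  for (let i = 0; i + 2 < words.length; i++) {
    const key = `${words[i]} ${words[i + 1]} ${words[i + 2]}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  // Sort by descending count and keep the first `limit` entries.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

const sample = 'The cat sat. The cat sat! Über den Wolken, über den Wolken.';
console.log(topThreeWordSequences(sample, 2));
```

Using a `Map` keyed by the joined sequence keeps the counting pass linear in the number of words; only the final ranking needs a sort.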


## REQUIREMENTS
- Node.js (version 18 or later)
- npm (Node package manager)

## INSTRUCTIONS
Install packages
    - npm i
Command line manual with examples:
    - ./index.js -h
Process text from stdin:
    - cat file1.txt | node index.js
Process text from two files outputting one list:
    - node index.js -f file1.txt file2.txt
Process text from two files using two threads outputting multiple lists:
    - node index.js -f file1.txt file2.txt -t 2 -m
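
To make the difference between one combined list and per-file lists concrete, the sketch below merges per-file counts into a single master count. It assumes each file's result is held as a `Map` from sequence to count; the name `mergeCounts` and that data shape are hypothetical and may not match how the script stores its results.

```javascript
// Hypothetical helper for illustration only: combine per-file counts
// (each a Map of "three word sequence" -> count) into one master Map.
function mergeCounts(perFileCounts) {
  const master = new Map();
  for (const counts of perFileCounts) {
    for (const [sequence, n] of counts) {
      master.set(sequence, (master.get(sequence) ?? 0) + n);
    }
  }
  return master;
}

// Example: two files that share one sequence.
const file1 = new Map([['the cat sat', 2], ['cat sat on', 1]]);
const file2 = new Map([['the cat sat', 3]]);
const merged = mergeCounts([file1, file2]);
console.log([...merged.entries()]);
```

Counting each file separately and merging afterwards does not count sequences that would span the boundary between two files.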

## DOCKER
docker-compose up -d --build
docker attach relic_interview

## TEST
- npm test

## FURTHER WORK
- Improve error handling for a better user experience
- Explore performance optimizations for larger files
- Add more tests
- Support more file formats

## KNOWN ISSUES
- Specifying a thread count higher than the number of CPU cores leads to warnings but should not affect the program's functionality.
- The program currently does not validate input file paths rigorously.
- The directory layout should be reorganized if this script is extended.
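
The first two issues above could be approached along the following lines. This is only a sketch under stated assumptions, not the repository's code: `clampThreadCount` and `findUnreadableFiles` are hypothetical names, and capping the thread count at the number of available cores is one possible policy among several.

```javascript
const os = require('node:os');
const fs = require('node:fs');

// Hypothetical helper: cap a requested thread count at the number of cores.
// os.availableParallelism() only exists in newer Node.js releases, so fall
// back to os.cpus().length when it is missing.
function clampThreadCount(requested) {
  const cores = typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
  const wanted = Number.isInteger(requested) && requested > 0 ? requested : 1;
  return Math.min(wanted, Math.max(cores, 1));
}

// Hypothetical helper: return the paths that are missing, unreadable,
// or not regular files, so they can be reported before processing starts.
function findUnreadableFiles(paths) {
  return paths.filter((p) => {
    try {
      fs.accessSync(p, fs.constants.R_OK);
      return !fs.statSync(p).isFile();
    } catch {
      return true;
    }
  });
}

console.log(clampThreadCount(64));
console.log(findUnreadableFiles(['/definitely/not/here.txt']));
```

Checking paths up front lets the program report every bad input in one message instead of failing partway through a run.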