attali-david/text_sequence_script

## DESCRIPTION
The program processes text files or standard input and lists the most frequent three-word sequences, with support for Unicode text. By default it outputs a single list for all inputs. It can also analyze multiple files in parallel and output either a separate list for each file or one master list.
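The core counting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: it treats a word as a run of Unicode letters, combining marks, digits, and ASCII apostrophes, lowercases everything, and applies NFC normalization so that precomposed and decomposed accented characters count as the same word. The function name `topTrigrams` and the `topN` cutoff are hypothetical.

```javascript
// Hypothetical sketch, not the repository's code: count three-word sequences.
// Assumptions: a "word" is a run of Unicode letters, combining marks, digits,
// or ASCII apostrophes; comparison is case-insensitive; topN is illustrative.
function topTrigrams(text, topN = 100) {
  // NFC makes "é" (U+00E9) and "e" + U+0301 compare equal.
  const words =
    text.normalize("NFC").toLowerCase().match(/[\p{L}\p{M}\p{N}']+/gu) ?? [];

  // Slide a three-word window over the token list and count each sequence.
  const counts = new Map();
  for (let i = 0; i + 2 < words.length; i++) {
    const key = `${words[i]} ${words[i + 1]} ${words[i + 2]}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  // Highest counts first; ties keep first-seen order because sort is stable.
  return [...counts].sort((a, b) => b[1] - a[1]).slice(0, topN);
}

const sample = "I love sandwiches. I LOVE sandwiches! Über café über café";
console.log(topTrigrams(sample, 3));
// returns [["i love sandwiches", 2], ["love sandwiches i", 1], ["sandwiches i love", 1]]
```

In this sketch a sequence may cross sentence punctuation ("love sandwiches i"); whether the actual program does the same is not stated in this README.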


## Requirements
- Node.js (version 18 or later)
- npm (Node package manager)

## INSTRUCTIONS
Install dependencies:
    - npm i
Show the command-line manual with examples:
    - ./index.js -h
Process text from stdin:
    - cat file1.txt | node index.js
Process text from two files, outputting one list:
    - node index.js -f file1.txt file2.txt
Process text from two files using two threads, outputting one list per file:
    - node index.js -f file1.txt file2.txt -t 2 -m
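The two output modes in the examples above (one list for all files, or separate lists with `-m`) differ only in whether per-file counts are combined before ranking. A minimal sketch, assuming each file or worker produces a `Map` from a three-word sequence to its count; `mergeCounts` and `toRankedList` are hypothetical names, not the repository's API.

```javascript
// Hypothetical sketch: combine per-file trigram counts.
// Assumption: each file/worker yields a Map of "w1 w2 w3" -> count.

// Sum several count maps into one master map.
function mergeCounts(maps) {
  const master = new Map();
  for (const counts of maps) {
    for (const [key, n] of counts) {
      master.set(key, (master.get(key) ?? 0) + n);
    }
  }
  return master;
}

// Turn a count map into a list sorted by descending frequency.
function toRankedList(counts, topN = 100) {
  return [...counts].sort((a, b) => b[1] - a[1]).slice(0, topN);
}

const file1 = new Map([["the quick brown", 2], ["quick brown fox", 1]]);
const file2 = new Map([["quick brown fox", 3]]);

// Separate lists, one per file.
const perFile = [file1, file2].map((m) => toRankedList(m));
// One master list across all files.
const combined = toRankedList(mergeCounts([file1, file2]));
console.log(combined); // [["quick brown fox", 4], ["the quick brown", 2]]
```

Summing the full per-file maps before truncating matters: merging only each file's top-N lists could drop a sequence that is moderately frequent in every file but never makes any single file's top N.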

## DOCKER
Build and start the container in the background, then attach to it:
- docker-compose up -d --build
- docker attach relic_interview

## TEST
- npm test

## FURTHER WORK
- Improve error handling for a better user experience
- Explore performance optimizations for larger files
- Add more tests
- Support more file formats

## KNOWN ISSUES
- Specifying a thread count higher than the number of CPU cores produces warnings but should not affect the program's output.
- The program currently does not validate input file paths rigorously.
- The directory structure should be reorganized if the script is extended.
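The first two known issues could both be handled by checking arguments up front. The following is a hedged sketch, not the repository's code: it assumes the file list comes from `-f` and the thread count from `-t`, and `validateInputs` with its return shape is a hypothetical helper. It reports missing or non-regular paths and caps the thread count at the number of logical CPUs reported by `os.cpus()`.

```javascript
// Hypothetical sketch: validate CLI inputs before spawning workers.
// Assumptions: `files` comes from -f and `threads` from -t; the function
// name and return shape are illustrative, not the repository's API.
const fs = require("node:fs");
const os = require("node:os");

function validateInputs(files, threads) {
  const errors = [];
  for (const file of files) {
    try {
      // statSync throws (e.g. ENOENT) if the path is missing or inaccessible.
      if (!fs.statSync(file).isFile()) {
        errors.push(`${file}: not a regular file`);
      }
    } catch (err) {
      errors.push(`${file}: ${err.code ?? err.message}`);
    }
  }

  // Clamp the thread count to [1, logical CPU count]; os.cpus() can be
  // empty in some environments, so never go below 1.
  const maxThreads = Math.max(1, os.cpus().length);
  const requested = Number.isInteger(threads) ? threads : 1;
  const threadCount = Math.min(Math.max(requested, 1), maxThreads);

  return { errors, threadCount };
}
```

A caller could print `errors` and exit with a non-zero status before reading any file, then start `threadCount` workers.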
