Several C programs that process text documents. Each program counts the frequency of every distinct word in a text document and prints the top 3 most frequent words for each document provided. More than one file can be passed at a time.
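The counting step described above can be sketched as follows. This is a minimal illustration, not the code in this repository: the names `struct table`, `table_add`, `table_load`, `table_sort_desc`, and `table_print_top` are hypothetical. It assumes that a word is any whitespace-separated token, that matching is case-sensitive, and that a linear search is fast enough for small inputs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 255 /* assumed limit; longer tokens are split into pieces */

struct entry { char *word; long count; };
struct table { struct entry *items; size_t len, cap; };

/* Record one occurrence of word. Returns 0 on success, -1 if out of memory. */
int table_add(struct table *t, const char *word)
{
    assert(t != NULL && word != NULL);
    for (size_t i = 0; i < t->len; i++) {
        if (strcmp(t->items[i].word, word) == 0) {
            t->items[i].count++;
            return 0;
        }
    }
    if (t->len == t->cap) { /* grow the array geometrically */
        size_t cap = t->cap ? t->cap * 2 : 64;
        struct entry *p = realloc(t->items, cap * sizeof *p);
        if (p == NULL)
            return -1;
        t->items = p;
        t->cap = cap;
    }
    char *copy = malloc(strlen(word) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, word);
    t->items[t->len].word = copy;
    t->items[t->len].count = 1;
    t->len++;
    return 0;
}

/* Read whitespace-separated words from fp and count them in t. */
int table_load(struct table *t, FILE *fp)
{
    char word[MAX_WORD + 1];
    while (fscanf(fp, "%255s", word) == 1)
        if (table_add(t, word) != 0)
            return -1;
    return 0;
}

/* Order by count, highest first; ties are broken alphabetically. */
static int by_count_desc(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    if (x->count != y->count)
        return (x->count < y->count) ? 1 : -1;
    return strcmp(x->word, y->word);
}

void table_sort_desc(struct table *t)
{
    if (t->len > 0)
        qsort(t->items, t->len, sizeof *t->items, by_count_desc);
}

/* Print up to n entries from the front of an already sorted table. */
void table_print_top(const struct table *t, size_t n, FILE *out)
{
    for (size_t i = 0; i < n && i < t->len; i++)
        fprintf(out, "%ld %s\n", t->items[i].count, t->items[i].word);
}

void table_free(struct table *t)
{
    for (size_t i = 0; i < t->len; i++)
        free(t->items[i].word);
    free(t->items);
    t->items = NULL;
    t->len = t->cap = 0;
}
```

A caller would open each file, call `table_load`, then `table_sort_desc` and `table_print_top(&t, 3, stdout)`. A hash table would scale better than the linear search used here; the array keeps the sketch short.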
rvw.c does word counting sequentially in the main process, printing results one after the other for each data set.

process.c does word counting concurrently using multiple processes, creating a new child process per data set.

thread.c does word counting concurrently using multiple threads, creating one new pthread per data set.
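The two concurrent strategies above can be sketched side by side. This is an illustration under stated assumptions, not the contents of process.c or thread.c: `job_fn`, `print_word_total`, `run_in_processes`, `run_in_threads`, and `struct task` are hypothetical names, and the per-file job only prints a total word count to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*job_fn)(const char *path);

/* Hypothetical per-file job: print the number of whitespace-separated
 * words. Returns 0 on success, 1 if the file cannot be opened. */
int print_word_total(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 1;
    long words = 0;
    int c, in_word = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    fclose(fp);
    printf("%s: %ld words\n", path, words);
    return 0;
}

/* One child process per file. Returns how many jobs exited with status 0. */
int run_in_processes(const char *const paths[], int n, job_fn job)
{
    assert(paths != NULL && job != NULL);
    int started = 0, ok = 0;
    fflush(NULL); /* avoid duplicating buffered output in the children */
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) { /* child: run one job, then leave */
            int rc = job(paths[i]);
            fflush(stdout);
            _exit(rc);
        }
        if (pid > 0)
            started++;
    }
    for (int i = 0; i < started; i++) { /* parent: reap every child */
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}

struct task { const char *path; job_fn job; int rc; int started; pthread_t tid; };

static void *task_main(void *arg)
{
    struct task *t = arg;
    t->rc = t->job(t->path);
    return NULL;
}

/* One pthread per file. Returns how many jobs returned 0. */
int run_in_threads(const char *const paths[], int n, job_fn job)
{
    assert(paths != NULL && job != NULL);
    if (n <= 0)
        return 0;
    struct task *tasks = malloc((size_t)n * sizeof *tasks);
    if (tasks == NULL)
        return 0;
    int ok = 0;
    for (int i = 0; i < n; i++) {
        tasks[i].path = paths[i];
        tasks[i].job = job;
        tasks[i].rc = 1;
        tasks[i].started =
            pthread_create(&tasks[i].tid, NULL, task_main, &tasks[i]) == 0;
    }
    for (int i = 0; i < n; i++) {
        if (tasks[i].started) {
            pthread_join(tasks[i].tid, NULL);
            if (tasks[i].rc == 0)
                ok++;
        }
    }
    free(tasks);
    return ok;
}
```

With processes, each child has its own copy of memory, so results come back only through the exit status or printed output. With threads, each job writes its result into its own `struct task`, which the main thread reads after `pthread_join`. In both variants the order of the printed lines is not guaranteed.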
Compile with:
$ gcc rvw.c
$ gcc process.c
$ gcc thread.c -lpthread

Run with:
$ ./a.out 'filename1' 'filename2' ...

dataset1.txt, dataset2.txt, and dataset3.txt are provided for reference. To check accuracy, you may use the following Unix commands:
To show how many distinct words are in the file ‘dataset1.txt’:
$ cat dataset1.txt | tr -s ' \t' '\n\n' | tr -d '\t' | grep -v '^$'| sort | uniq -c | wc -l

To show the top 3 most frequent words in ‘dataset1.txt’ and their frequencies:
$ cat dataset1.txt | tr -s ' \t' '\n\n' | sort | uniq -c | sort -r -n -k1 | head -n 3

MIT © Ursula Sarracini