📄 Doc_Analyzer (PDF Text & NLP Analyzer in Python)

DocAnalyzer is a command-line tool built with Python that:

Extracts full text from PDF files
Identifies and highlights named entities (such as names, dates, and places)
Analyzes and visualizes word frequency
Generates a word cloud
Creates a summary of the most important sentences
Saves output in organized files and images
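To make the word-frequency step more concrete, here is a minimal, dependency-free sketch of how frequencies can be counted. It is an illustration, not the code in pdf_analyzer.py. The `top_words` function and the small `STOPWORDS` set are hypothetical. The tool itself is listed as using NLTK, whose stopwords corpus is downloaded during setup, rather than a hand-written list.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list (the tool presumably uses NLTK's stopwords corpus).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for",
             "on", "with", "that", "this", "it", "as", "by", "be"}


def top_words(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword words in text."""
    # Lowercase and keep alphabetic tokens only (drops numbers and punctuation).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stopwords and tokens of two letters or fewer.
    words = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "Machine learning models learn patterns. Learning requires data, and data quality matters."
    print(top_words(sample, 3))
```

The resulting (word, count) pairs are the kind of data a bar chart or word cloud is drawn from.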

This is a self-directed NLP and automation project designed to run offline. It is aimed at researchers and students.

Built With

pdfplumber

nltk

spaCy

matplotlib

wordcloud
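The sketch below shows how two of these libraries are commonly combined for text extraction and named-entity recognition. The function names `extract_text` and `named_entities` are hypothetical and may not match pdf_analyzer.py. The pdfplumber and spaCy calls it uses (`pdfplumber.open`, `page.extract_text()`, `doc.ents`, `ent.label_`) are part of those libraries' public APIs.

```python
from typing import Any, Callable


def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page of a PDF using pdfplumber."""
    import pdfplumber  # imported lazily so this module loads even without pdfplumber

    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() can return None or "" for image-only (scanned) pages.
            pages.append(page.extract_text() or "")
    return "\n".join(pages)


def named_entities(text: str, nlp: Callable[[str], Any]) -> list[tuple[str, str]]:
    """Return (entity text, label) pairs such as ("Paris", "GPE") from a spaCy pipeline."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

Usage, assuming pdfplumber, spaCy, and the `en_core_web_sm` model are installed: create `nlp = spacy.load("en_core_web_sm")` once, then call `named_entities(extract_text("sample.pdf"), nlp)`. Passing `nlp` in as an argument keeps the relatively slow model load out of the function, so it happens only once.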

Demo

[Images: Word Cloud Example; Word Frequency Plot]

Sample Input

The project uses a publicly available sample PDF on an AI-related topic (sample.pdf), which contains both natural-language and technical content. You can replace it with any other PDF.

Output Files

All results are saved in the output/ folder:

File	Description
`full_text.txt`	Complete text extracted from the PDF
`summary.txt`	The most relevant sentences from the document
`wordcloud.png`	Word cloud of the most frequent words
`frequency_plot.png`	Bar chart of the top word frequencies
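summary.txt holds the most relevant sentences, but this README does not say how relevance is scored. A common approach is frequency-based extractive summarization, sketched below with only the standard library. Treat it as an assumption about the technique, not a description of pdf_analyzer.py. `summarize`, `content_words`, `save_outputs`, and the stopword set are hypothetical names, while the output file names match the table above.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical minimal stopword list so that common words do not dominate the scores.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "was",
             "were", "for", "on", "with", "from", "that", "this", "it", "as", "by", "be"}


def content_words(text: str) -> list[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, n_sentences: int = 3) -> list[str]:
    """Frequency-based extractive summary: keep the n highest-scoring sentences."""
    # Naive split after ., ! or ? followed by whitespace; NLTK's punkt tokenizer
    # handles abbreviations such as "e.g." far better.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freqs = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average frequency, so long sentences are not favored just for their length.
        return sum(freqs[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Re-sort the chosen indices so the summary keeps the document's original order.
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


def save_outputs(full_text: str, summary: list[str], out_dir: str = "output") -> Path:
    """Write full_text.txt and summary.txt into out_dir, creating it if needed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "full_text.txt").write_text(full_text, encoding="utf-8")
    (out / "summary.txt").write_text("\n".join(summary), encoding="utf-8")
    return out
```

Averaging rather than summing word scores is a design choice: a plain sum tends to pick the longest sentences regardless of how relevant they are.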

File Arrangement

Make sure sample.pdf is placed in the project root directory, like this:

DocAnalyzer/
├── pdf_analyzer.py
├── sample.pdf 
├── requirements.txt
├── README.md
├── output/
│ ├── full_text.txt
│ ├── summary.txt
│ ├── frequency_plot.png
│ └── wordcloud.png
└── .gitignore

Setup (First Time)

Clone the repository

git clone https://github.com/Waleed99i/DocAnalyzer.git
cd DocAnalyzer

Create virtual environment

python3 -m venv venv
source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Download NLTK + spaCy models

python -m nltk.downloader punkt stopwords
python -m spacy download en_core_web_sm
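To confirm the downloads succeeded before running the analyzer, a small check like the following may help. It is a hypothetical helper, not part of the repository. It asks NLTK's `nltk.data.find` for the `punkt` and `stopwords` resources (this raises `LookupError` when a resource is missing) and uses `importlib.util.find_spec` to look for the `en_core_web_sm` package, since spaCy models install as ordinary Python packages.

```python
import importlib.util


def check_nlp_resources(spacy_model: str = "en_core_web_sm") -> dict[str, bool]:
    """Report whether the NLTK data and spaCy model from the setup step are installed."""
    status: dict[str, bool] = {}
    try:
        import nltk

        for resource in ("tokenizers/punkt", "corpora/stopwords"):
            try:
                nltk.data.find(resource)  # raises LookupError if the resource is absent
                status[resource] = True
            except LookupError:
                status[resource] = False
    except ImportError:
        status["nltk"] = False  # NLTK itself is not installed in this environment
    # find_spec returns None when no importable package with this name exists.
    status[spacy_model] = importlib.util.find_spec(spacy_model) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_nlp_resources().items():
        print(("OK      " if ok else "MISSING ") + name)
```

Run it inside the activated virtual environment; any MISSING line points back to the corresponding install or download command.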

How to Run

Step 1: Activate virtual environment

source venv/bin/activate

Step 2: Run the analyzer

python pdf_analyzer.py

Author

Muhammad Waleed Akram

Electrical Engineering Student | AI + Systems Enthusiast

License

See the LICENSE file in the repository.
