This project applies Natural Language Processing (NLP) techniques to sustainability reports published by the Ministry of Environment, Forest and Climate Change (MOEFCC), India.
The objective is to extract key environmental themes from the reports, analyze how those themes evolve over time, and present the results as charts and an interactive dashboard.
- Extract meaningful themes from annual sustainability reports
- Analyze how environmental focus has evolved over the years
- Identify key policy trends and topic distributions
- Generate visual insights using charts and dashboards
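As a rough illustration of the "evolved over the years" objective, the sketch below turns per-year theme counts into shares and measures how one theme's share changed between the earliest and latest year. The function names (`theme_shares`, `share_change`) and the numbers are hypothetical and made up for illustration; they are not taken from the project's scripts or dataset.

```python
def theme_shares(counts_by_year):
    """Convert raw per-year theme counts into per-year shares (0.0 to 1.0)."""
    shares = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        # Guard against a year with no theme hits at all.
        shares[year] = {
            theme: (count / total if total else 0.0)
            for theme, count in counts.items()
        }
    return shares


def share_change(shares, theme):
    """Difference in a theme's share between the latest and earliest year."""
    years = sorted(shares)
    first, last = years[0], years[-1]
    return shares[last].get(theme, 0.0) - shares[first].get(theme, 0.0)


# Illustrative numbers only, not real results from the reports.
counts_by_year = {
    2019: {"climate": 10, "forest": 30},
    2021: {"climate": 30, "forest": 30},
}
shares = theme_shares(counts_by_year)
climate_change_in_share = share_change(shares, "climate")
```

Working with shares rather than raw counts keeps longer reports from dominating the trend simply because they contain more text.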
charts/ # Generated visualizations
reports_pdf/ # Source PDFs (excluded from GitHub)
moefcc_nlp_pipeline.py # Main NLP pipeline
themes_over_years.py # Trend analysis script
moefcc_demo.html # Interactive dashboard
moefcc_sustainability_dataset.csv # Processed dataset
requirements.txt # Dependencies
README.md
- Python
- Natural Language Processing (NLP)
- Pandas
- Matplotlib / Seaborn
- PDF Text Extraction
- 📄 Extracts text from MOEFCC sustainability reports
- 🧠 Identifies key environmental themes using NLP
- 📈 Analyzes trends across multiple years
- 📊 Generates visualizations:
  - Line charts
  - Stacked charts
  - Heatmaps
  - Dashboard view
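One simple way to identify themes in extracted report text is keyword matching against a small lexicon. The sketch below is a minimal example of that idea, assuming a hand-written `THEME_KEYWORDS` dictionary and a `count_themes` helper, both hypothetical. The actual pipeline in moefcc_nlp_pipeline.py may use a different technique.

```python
import re
from collections import Counter

# Hypothetical theme lexicon (an assumption for illustration only).
THEME_KEYWORDS = {
    "climate": ["climate", "emission", "carbon"],
    "biodiversity": ["biodiversity", "wildlife", "species"],
    "forest": ["forest", "afforestation"],
}


def count_themes(text):
    """Count exact keyword hits per theme in one report's text."""
    # Lowercase and split into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    token_counts = Counter(tokens)
    theme_counts = Counter()
    for theme, keywords in THEME_KEYWORDS.items():
        # Exact token match: "emissions" would NOT match "emission".
        theme_counts[theme] = sum(token_counts[k] for k in keywords)
    return theme_counts


sample = "Forest cover increased. Carbon emission targets and climate action were revised."
counts = count_themes(sample)
```

Exact matching is deliberately simple. Stemming or lemmatization would be needed to catch plural and inflected forms.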
pip install -r requirements.txt
python moefcc_nlp_pipeline.py
python themes_over_years.py
To view the interactive dashboard, open moefcc_demo.html in your browser.
Due to GitHub file size limits, the large source PDFs in reports_pdf/ are excluded from the repository via .gitignore.
The processed dataset (moefcc_sustainability_dataset.csv) and the generated charts are included for reproducibility.
- Environmental policy analysis
- Academic NLP research
- Sustainability trend tracking
- Real-world NLP project demonstration


