Text-analysis-pipeline

This repository contains a Python text-analysis pipeline that extracts the text of web pages from a list of URLs and then computes sentiment and readability metrics for that text.

extraction.py

The extraction.py script downloads each URL in a list and extracts its text content. URLs that return an HTTP 404 (Not Found) error are recorded in an error log rather than stopping the run.
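A minimal sketch of how an extraction step like this could work, using only the Python standard library (`urllib.request` and `html.parser`). It is not the code in extraction.py: the function names, the `error_log.txt` file name, and the choice to re-raise non-404 HTTP errors are assumptions for illustration. The `fetch` parameter lets you swap in a different downloader (or a fake one for testing).

```python
"""Sketch: extract page text from URLs and log 404s (hypothetical, stdlib only)."""
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.request import urlopen


class _TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_html(url, timeout=10):
    """Download a page; urlopen raises HTTPError for 4xx/5xx responses."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_all(urls, fetch=fetch_html, log_path="error_log.txt"):
    """Return {url: text}; URLs answering 404 go to log_path (hypothetical name)."""
    texts, not_found = {}, []
    for url in urls:
        try:
            texts[url] = html_to_text(fetch(url))
        except HTTPError as err:
            if err.code == 404:
                not_found.append(url)
            else:
                raise  # assumption: other HTTP errors should stop the run
    with open(log_path, "w", encoding="utf-8") as log:
        log.writelines(url + "\n" for url in not_found)
    return texts
```

Passing a fake `fetch` function makes the 404 handling easy to exercise without network access.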

Running extraction.py:

Install Python: Make sure Python is installed on your system. If it is not, download and install the latest version from python.org.
Install Dependencies: Install all required packages before running the script. The repository includes a requirement.txt file for this purpose (for example, `pip install -r requirement.txt`).
Open Terminal/Command Prompt: Launch your terminal or command prompt.
Navigate to File Directory: Use the cd command to change to the directory that contains extraction.py.
Execute Script: Run the script with the following command:
```
python extraction.py
```
Wait for Completion: Let the script finish extracting text from all URLs.

analysis.py

The analysis.py script takes the extracted text, performs sentiment analysis, computes readability metrics, and writes the resulting metrics to an Excel output file.
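One common way to compute metrics of this kind is sketched below. It is an illustration, not the author's implementation: the tiny `POSITIVE`/`NEGATIVE` word sets are hypothetical placeholders for full sentiment lexicons, syllables are counted with a rough vowel-group heuristic, and readability is measured with the Gunning Fog index, 0.4 × (average sentence length + percentage of words with three or more syllables). Because textblob is listed among the dependencies, analysis.py may compute sentiment differently.

```python
"""Sketch: lexicon-based sentiment scores and Gunning Fog readability."""
import re

# Hypothetical placeholder lexicons; a real run would load full word lists.
POSITIVE = {"good", "great", "gain", "success"}
NEGATIVE = {"bad", "loss", "fail", "risk"}
EPS = 1e-6  # keeps the ratios defined when a text has no scored words


def tokenize(text):
    """Lower-case alphabetic tokens (apostrophes kept, digits dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive split on ., ! and ?; abbreviations like 'e.g.' are not handled."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def count_syllables(word):
    """Rough heuristic: count vowel groups, ignoring a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def sentiment_scores(text):
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {
        "positive_score": pos,
        "negative_score": neg,
        # Polarity in [-1, 1]: +1 all positive, -1 all negative.
        "polarity": (pos - neg) / (pos + neg + EPS),
        # Subjectivity in [0, 1]: share of words that carry sentiment.
        "subjectivity": (pos + neg) / (len(words) + EPS),
    }


def readability(text):
    words = tokenize(text)
    n_words = max(len(words), 1)
    n_sentences = max(len(split_sentences(text)), 1)
    # "Complex" words in the Gunning Fog sense have three or more syllables.
    n_complex = sum(count_syllables(w) >= 3 for w in words)
    avg_sentence_length = len(words) / n_sentences
    percent_complex = 100 * n_complex / n_words
    return {
        "avg_sentence_length": avg_sentence_length,
        "percent_complex_words": percent_complex,
        "fog_index": 0.4 * (avg_sentence_length + percent_complex),
    }
```

For the text "Good results. The company reported a beautiful quarterly gain.", this counts 9 words in 2 sentences, 4 of them complex, so the Fog index is 0.4 × (4.5 + 44.4) ≈ 19.6.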

Running analysis.py:

Install Required Libraries: Make sure the required Python libraries (such as pandas, textblob, and nltk) are installed.
Prepare Input Files: Make sure the required data files, such as the input data and the extracted text files, are in the directories specified inside analysis.py.
Open Terminal/Command Prompt: Launch your terminal or command prompt.
Navigate to File Directory: Use the cd command to change to the directory that contains analysis.py.
Execute Script: Run the script with the following command:
```
python analysis.py
```
Wait for Completion: Let the script finish the analysis and write the Excel output file.
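Before running analysis.py, a quick check like the one below can confirm that the libraries named above are importable. The `missing_modules` helper and the `REQUIRED` list are hypothetical; the authoritative dependency list is presumably requirement.txt. It checks import names, which for these three libraries match the package names.

```python
"""Sketch: verify that the analysis dependencies are importable."""
import importlib.util
import sys

# Names taken from this README; requirement.txt presumably has the full list.
REQUIRED = ["pandas", "textblob", "nltk"]


def missing_modules(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        sys.exit("Missing packages: " + ", ".join(missing))
    print("All required packages are installed.")
```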

Run extraction.py first and analysis.py second, because the analysis works on the text that the extraction step produces.
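Because analysis.py consumes the output of extraction.py, the two stages can be chained with a small driver script. This is a hypothetical helper, not part of the repository; it assumes both scripts sit in the working directory and take no command-line arguments.

```python
"""Sketch: run the pipeline stages in order, stopping at the first failure."""
import subprocess
import sys

STAGES = ["extraction.py", "analysis.py"]  # analysis reads what extraction writes


def run_pipeline(scripts, cwd=None):
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later stages never run on incomplete output.
        subprocess.run([sys.executable, script], check=True, cwd=cwd)


if __name__ == "__main__":
    run_pipeline(STAGES)
```

Saved next to the two scripts (for example under the hypothetical name run_pipeline.py), it would be started with `python run_pipeline.py`.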

Repository contents:

- Input.xlsx
- Output.xlsx
- README.md
- analysis.py
- extraction.py
- requirement.txt