Predicting material type used for publishing

This project predicts the material type of a to-be-published work from its document data. The material types to predict are:

Book
Sound disk
Video Cassette
Sound Cassette
Music
Mixed
CR

The dataset has already been divided into train and test sets.

Train Data has 31,653 rows and 12 columns.
Test Data has XXX rows and XXX columns.

Details about the dataset's columns are given below.

Exploratory Data Analysis (EDA)

Some of the findings from the data:

All of the data has UsageClass=physical, checkouttype=Horizon, checkoutyear=2005, and checkoutmonth=4.
The data contains no duplicates.
The combination of Subject and title should be used to identify the materialtype.
About 69% of the data has materialtype BOOK, followed by SOUNDDISC (~13%) and VIDEOCASS (~9%).
Since the majority of the data has materialtype BOOK, a model built on this data as-is will be trained on imbalanced classes. For more details, see Exploratory Data Analysis.
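The imbalance noted above can be checked with a short calculation. The following is a minimal sketch, not the code from the notebook: the toy label list only mirrors the approximate shares quoted above, the `OTHER` label is a hypothetical stand-in for the remaining classes, and the weighting formula `n_samples / (n_classes * class_count)` is one common "balanced" heuristic that this project may or may not have used.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the data as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and frequent classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical toy labels that roughly follow the shares reported in the EDA.
# "OTHER" is an assumption that groups all remaining material types.
labels = ["BOOK"] * 69 + ["SOUNDDISC"] * 13 + ["VIDEOCASS"] * 9 + ["OTHER"] * 9

shares = class_distribution(labels)
weights = balanced_class_weights(labels)

for cls in shares:
    print(f"{cls}: share={shares[cls]:.2f}, weight={weights[cls]:.2f}")
```

Weights of this kind can be passed to classifiers that accept per-class weights, so that mistakes on the rarer material types count for more during training.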

ML Model Creation

First, the data was split into training and testing sets and then preprocessed/cleaned. It was then converted to a TF-IDF matrix, which was fed into different machine learning algorithms. SVM and Random Forest classifiers appeared to have the best accuracy, recall, and precision scores. The overall accuracy achieved is ~88%. For more details, see Model Creation.
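To illustrate the TF-IDF step, here is a minimal pure-Python sketch. It is an illustration under stated assumptions rather than the notebook's implementation: the records, the `title` and `subject` keys, the tokenizer, and the smoothed IDF variant `ln((1 + n) / (1 + df)) + 1` are all choices made for this example. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would typically do this work.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_matrix(documents):
    """Return (vocabulary, rows), where each row is an L2-normalised TF-IDF vector."""
    tokenized = [tokenize(doc) for doc in documents]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed IDF (an assumption for this sketch).
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}

    rows = []
    for doc in tokenized:
        term_freq = Counter(doc)
        row = [term_freq[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


# Hypothetical records; the keys and values are invented for illustration.
records = [
    {"title": "Introduction to gardening", "subject": "Gardening Handbooks"},
    {"title": "Greatest jazz hits", "subject": "Jazz Music"},
    {"title": "Learning jazz piano", "subject": "Piano Instruction"},
]

# Combine title and subject into one text field per record, as the EDA suggests.
texts = [f"{r['title']} {r['subject']}" for r in records]
vocab, rows = tfidf_matrix(texts)

print(vocab)
print([round(v, 3) for v in rows[1]])
```

Each row of the resulting matrix is the feature vector for one record, and rows of this kind are what a classifier such as an SVM or a random forest would be trained on.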

End Notes

I am always open to suggestions, so please share any you have. If you have benefited from this content, show some love by starring the repository or by following me.

