
Gather more data for the chatbot's database via publicly available datasets #52

@janakrajchadha


Requirement
The sentences.csv file currently holds very little data for the initial training. The aim is to gather more data from publicly available datasets and sources so that the ML models can produce better bot responses.

Prerequisites

  • Elementary knowledge of Python
  • Elementary understanding of the available data

Dependencies
None

Description
This is an open-ended issue: participants can explore various sources to gather the data needed to improve the bot's NLP capabilities. Depending on the source, the data may need some elementary pre-processing before it is added to the existing data. A separate issue may be created later for the pre-processing if needed.
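
As a rough illustration of what "elementary pre-processing" could mean here, the sketch below normalizes Unicode, collapses whitespace, and drops empty or duplicate sentences. The function names `clean_sentence` and `preprocess` are hypothetical and not part of the project; the steps the bot actually needs may differ.

```python
import re
import unicodedata


def clean_sentence(text):
    """Normalize Unicode (NFKC), collapse runs of whitespace, trim the ends."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(sentences):
    """Clean each sentence; drop empties and case-insensitive duplicates.

    Hypothetical helper: the first spelling of a sentence is the one kept.
    """
    seen = set()
    cleaned = []
    for raw in sentences:
        sentence = clean_sentence(raw)
        key = sentence.casefold()
        if sentence and key not in seen:
            seen.add(key)
            cleaned.append(sentence)
    return cleaned
```

Keeping the cleaning step separate from the collection step means raw downloads can be re-processed later if the rules change.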

A good starting point is to look for common conversational examples such as 'Hello', 'How're you', and 'That's good to hear', which are labelled 'C' in the sentences.csv file. Searching for data one label at a time might be easier.
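
A minimal sketch of adding newly gathered sentences under a single label follows. It assumes, without having checked the file, that each row of sentences.csv is a sentence followed by its label with no header row; the helpers `read_rows`, `merge_rows`, and `write_rows` are hypothetical names used only for illustration.

```python
import csv
import io


def read_rows(handle):
    """Read (sentence, label) pairs from an open CSV file.

    Assumption: column 0 is the sentence and column 1 is the label.
    Rows with fewer than two columns are skipped.
    """
    return [(row[0], row[1]) for row in csv.reader(handle) if len(row) >= 2]


def merge_rows(existing, candidates, label):
    """Return candidates not already present, each paired with the label.

    The comparison is case-insensitive, so 'hello' does not duplicate 'Hello'.
    """
    known = {sentence.casefold() for sentence, _ in existing}
    added = []
    for sentence in candidates:
        key = sentence.casefold()
        if key not in known:
            known.add(key)
            added.append((sentence, label))
    return added


def write_rows(handle, rows):
    """Write (sentence, label) pairs to an open CSV file."""
    csv.writer(handle).writerows(rows)


if __name__ == "__main__":
    # In-memory stand-in for sentences.csv, to keep the sketch self-contained.
    existing = read_rows(io.StringIO("Hello,C\n"))
    new = merge_rows(existing, ["hello", "How're you", "That's good to hear"], "C")
    out = io.StringIO()
    write_rows(out, existing + new)
    print(out.getvalue())
```

With a real file, the same functions can be called on handles opened with `open(path, newline="", encoding="utf-8")`, which is how the `csv` module expects files to be opened.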

Labels

  • easy: Easy level issue GSSoC 2020
  • enhancement
  • good first issue
  • gssoc20: Issues to be picked up by participants during GSSoC 2020
