Skip to content

daniel-lima-lopez/Information-Retrieval-Example

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Information-Retrieval-Example

This repository presents an example of an Information Retrieval application, which consists of a command-based joke browser implementation. The dataset used, Short Jokes, contains over 200,000 short jokes on different topics. The authors indicate that, since data is collected by web scraping, inappropriate or offensive jokes are likely to be found.

Installation

Clone this repository:

git clone git@github.com:daniel-lima-lopez/Information-Retrieval-Example.git

move to installation directory:

cd Information-Retrieval-Example

install Natural Language Toolkit:

pip install nltk

Method description

Information retrieval is finding material that satisfies an information need. In this case, given a corpus of jokes, the information need is expressed by tokens or query words and logical operators (AND, OR, NOT).

To perform this task, it is constructed a dictionary of terms, which contains every word in the corpus of jokes. Then, for each term, we have a set, denominated posting, that contains the indexes of the jokes where the term appears, this technique is called inverted index.

Once all postings are constructed, we can perform queries with terms of interest and logical operators.

Examples of use

The following examples can be executed in the notebook examples.ipynb.

We can perform simple commands executing IR.py:

python IR.py

and introduce a valid command, considering common terms and logical operations in lower case (and, or, not):

Enter a valid command: dog or cat and funny and not smoke

Results:
 - You know what cats don't like? Blow dryers. You know what's funny? Pointing your blow dryer at your cat. Anyway, I lost an eye today.
 - I have a dog named Hot-Dog. Isn't funny? hahahaha....
 - I use my neighbor's outdoor jacuzzi for bubble bath time with my cat. I'd invite him, but my cat's funny about bathing with strangers.
 - What diagnosis did the veterinarian give to the dog with the funny walk? The dog has cerebral pawlsy.
 - No wonder my cigar tastes funny. It's just a really old hot dog.
 - A funny thing to do when someone's dog barks at you is say, "I don't speak dog," and then when they leave the room, speak dog fluently.
 - Bad news, guys. Throwing a cat through a wall doesn't make a funny, cat-shaped hole. Not even close.
 - My 1-year-old thought it was funny to put food in my mouth. It was cute with Skittles. Then she switched to dog food.
 - My Dad's cat had a hernia operation The cat was laying there next to next to me and I asked " What did they sew you up with?" My Dad laughing so hard - as he said "That's not funny!" [Cat Gut]
 - Funny how we say "I drank a *pot* of coffee" instead of "I drank fourteen cups of coffee and chased the cat around the hot tub with a sword"
 - Only funny if you own a dog: I think my dog must have a very cold nose. Every time it walks into a room, all the other dogs sit down.

It is possible to perform more complex commands with the Searcher class. For example, the command (computer or laptop) and (sad or amazing) can be executed by parts with set operations:

from IR import Searcher

test = Searcher() # instantiate the Searcher class
ind1 = test.filter('computer or laptop') # first command
ind2 = test.filter('sad or amazing') # second command
jokes = test.get_jokes(ind1 & ind2) # union of commands

print('Results:')
for ji in jokes:
    print(f' - {ji}')

which results in:

Results:
 - Why is the Computer D Drive always sad? D:
 - Today was a sad day - we had to pull the plug on my granpa cause I needed the outlet for my laptop
 - Why was the middle aged computer sad? He had a floppy disc.
 - What do you call a computer with an amazing singing voice? A Dell.
 - I should go outside and enjoy the amazing weather but my computer cord isn't long enough.
 - It's amazing how quickly reheated food in the microwave goes cold again when you think you're only going to be on the computer for a moment.
 - What kind of computer is optimized for sad songs? A Dell.
 - What do you call a computer that only plays sad songs? Adele

About

A practical example of information retrieval is presented, which searches for jokes from keyword (tokens) commands.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors