This framework recommends items to users by matching item descriptions against user requests. An API is provided so the framework can be embedded in a back end.
Remember to adjust the parameters and constants if you are using your own source data files. A source file usually has the following format:
| id | name | description |
|---|---|---|
| 01 | Wine | Made in China |
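As a sketch of what loading such a file might look like (the CSV layout and file name here are assumptions, not the project's actual loader), reading the `id`/`name`/`description` columns with pandas and keeping `id` as a string preserves leading zeros:

```python
import io
import pandas as pd

# Hypothetical sample matching the id | name | description layout above.
raw = "id,name,description\n01,Wine,Made in China\n02,Tea,Grown in Fujian\n"

# dtype={"id": str} keeps ids like "01" from being parsed as integers.
df = pd.read_csv(io.StringIO(raw), dtype={"id": str})
print(df.loc[0, "name"])  # Wine
```

In a real setup, replace the in-memory string with a path to your data file in the `data` folder.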
[toc]
- Compatible with Python 2.7.x and Python 3.5.x (Python 3 is preferred)
- Optimized for quick response (usually within 2 seconds)
- Parallel model training
- File parser for external files in .txt/.pdf/.pptx/.doc/.docx format
- Implementation of various sentence/paragraph encoding models
  - Unsupervised models
    - average of word2vec
    - max of word2vec
    - LDA topics
    - word mover's distance (paper: "From Word Embeddings To Document Distances")
    - topical word embedding (paper: "Topical Word Embeddings")
  - Supervised models
    - seq2seq model (paper: "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems")
- An API for deploying on a server
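To illustrate the simplest of the encoding models listed above, "average of word2vec" encodes a sentence as the mean of its words' vectors. The toy lookup table and function below are illustrative only, not the project's implementation:

```python
import numpy as np

# Stand-in for a trained word2vec model's word -> vector lookup.
word_vecs = {
    "detect": np.array([0.1, 0.3]),
    "fraud":  np.array([0.5, 0.1]),
}

def encode(sentence, vecs, dim=2):
    """Average the vectors of known words; zero vector if none match."""
    hits = [vecs[w] for w in sentence.lower().split() if w in vecs]
    return np.mean(hits, axis=0) if hits else np.zeros(dim)

v = encode("Detect fraud", word_vecs)  # -> array([0.3, 0.2])
```

Recommendation then reduces to ranking items by the similarity (e.g. cosine) between the encoded request and each encoded item description.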
```
pip install -r requirements.txt
```

Download the NLTK stopwords to the local disk by entering `python` in a terminal window and running `nltk.download()`.
If needed, change the constants (e.g. CPU count, server port, word2vec/LDA model parameters) in `constants.py`. The model parameters have already been tuned.

In my experience, these parameters matter most to the models: `TrainFiles`, `VEC_SIZE`, `W2V_TYPE`, `TOPICS_NUM`, `PASSES`, `WORD_DOC_FREQ`.
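A `constants.py` along these lines is one way to hold those settings. The values (and the names not listed above, such as the CPU and port constants) are illustrative assumptions, not the tuned defaults shipped with the project:

```python
# constants.py -- illustrative values only; tune for your own data.
TrainFiles = ["data/items.csv"]  # source files used for training (hypothetical path)
VEC_SIZE = 100                   # word2vec embedding dimension
W2V_TYPE = "skip-gram"           # assumed: word2vec architecture switch
TOPICS_NUM = 20                  # number of LDA topics
PASSES = 10                      # LDA training passes over the corpus
WORD_DOC_FREQ = 5                # minimum document frequency for a word
CPU_NUM = 4                      # parallel workers for model training
PORT = 5000                      # server port
```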
Run `server.py` on the server. It will create a website at http://[your ip address]:5000/ and an API at http://[your ip address]:5000/problem?text=
Input a text string through the HTML text input box, or through the API (e.g. http://[your ip address]:5000/problem?text=Detect fraud and cheat in ledger); the API returns recommendation results in JSON format.
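When calling the API programmatically, the query text should be URL-encoded. A minimal client sketch (the host, port, and response shape are assumptions):

```python
import urllib.parse

def build_url(host, text, port=5000):
    """Build the /problem query URL with the text properly escaped."""
    query = urllib.parse.urlencode({"text": text})
    return "http://{}:{}/problem?{}".format(host, port, query)

url = build_url("127.0.0.1", "Detect fraud and cheat in ledger")
# import urllib.request, json
# results = json.load(urllib.request.urlopen(url))  # run against a live server
```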
- code (see comments in each code file for details)
  - server.py: Build an API and start an HTTP server
  - templates: HTML and JS files
  - rec_engine.py: Do recommendation
  - train.py: Train word2vec/LDA models; save models in the model folder
  - my_utils.py: General functions, including text processing, corpus generation, etc.
  - constants.py: Set global parameters
  - file_parser.py: Extract text from documents
- references
  - upper.py: Use Flask to build a server API (Jason sent me)
  - bluemix_nlc_utils.py: Configure the Bluemix NLC API in Python (Jason sent me)
- data: Empty; for storing data files
- model: Empty; for storing models
- docs
  - *.html: Auto-generated docs for the code modules
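Since `upper.py` uses Flask to build the server API, the shape of the `/problem` endpoint can be sketched as follows. The route and parameter name come from the API described above; the handler body and `recommendations` field are placeholders, not the real `rec_engine` logic:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/problem")
def problem():
    # Placeholder: the real server would call the recommendation engine here.
    text = request.args.get("text", "")
    return jsonify({"query": text, "recommendations": []})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```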