Temporary DeepCS model for constructing a code search plugin
PyTorch implementation of Deep Code Search.
Tested on macOS 10.12 and Ubuntu 16.04.
- Python 3.6
- PyTorch
- tqdm
pip install -r requirements.txt
- models: neural network models for code/description representation and similarity measure.
- modules.py: basic modules for model construction.
- train.py: train and validate code/description representation models.
- repr_code.py: encode code into vectors and store them in a file.
- search.py: perform code search.
- configs.py: configurations for the models defined in the models folder. Each function defines the hyper-parameters for the corresponding model.
- data_loader.py: a PyTorch dataset loader.
- utils.py: utilities for models and training.
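To make the "similarity measure" idea concrete, here is a minimal, dependency-free sketch of ranking code vectors against a query vector. It assumes cosine similarity as the measure; the actual measure is whatever the code in models implements. The function names `cosine_similarity` and `top_k` and the toy vectors are hypothetical and do not come from this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, code_vecs, k=2):
    """Return (index, score) pairs for the k code vectors closest to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(code_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for encoder outputs (assumption: real vectors
# come from the trained code and description encoders).
code_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]
results = top_k(query_vec, code_vecs, k=2)
print(results)
```

In the real pipeline the code vectors would be the ones written by repr_code.py, and the query vector would come from encoding the natural-language description.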
If you want a quick test, here is a pretrained model. Put it in ./output/JointEmbeder/github/202106140524/models/ and run:
python repr_code.py -t 202106140524 --reload_from 4000000
python search.py -t 202106140524 --reload_from 4000000
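Before running the two commands, it can help to confirm that the pretrained model sits where the scripts expect it. The sketch below rebuilds the directory from the path given above; the helper name `models_dir` and its parameters are hypothetical, and the layout (output/model/dataset/timestamp/models) is inferred from that single example path rather than from the code.

```python
from pathlib import Path


def models_dir(timestamp, model="JointEmbeder", dataset="github", root="./output"):
    """Build the checkpoint directory for a run (layout assumed from the README path)."""
    return Path(root) / model / dataset / timestamp / "models"


d = models_dir("202106140524")
print(d)
if d.is_dir():
    print("found checkpoint directory")
else:
    print("missing: create this directory and copy the pretrained model into it")
```

The checkpoint file name itself is not stated here, so the sketch only checks the directory.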
The /data folder provides a small dummy dataset for quick deployment.
To train and test our model:
- Download and unzip the real dataset from Google Drive, or from Baidu Pan for users in China.
- Replace each file in the /data folder with the corresponding real file.
Edit hyper-parameters and settings in configs.py
python train.py --model JointEmbeder -v
python repr_code.py --model JointEmbeder -t XXX --reload_from YYY
python search.py --model JointEmbeder -t XXX --reload_from YYY
where XXX stands for the timestamp and YYY for the iteration with the best model.
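The three steps can be scripted so that the timestamp and iteration are filled in once. This is a minimal sketch: `build_commands` is a hypothetical helper, and it only uses the script names and flags shown above. It prints the commands instead of running them.

```python
def build_commands(timestamp, iteration, model="JointEmbeder"):
    """Return the train, embed, and search commands as argument lists."""
    reload_args = ["--model", model, "-t", str(timestamp), "--reload_from", str(iteration)]
    train = ["python", "train.py", "--model", model, "-v"]
    embed = ["python", "repr_code.py"] + reload_args
    search = ["python", "search.py"] + reload_args
    return [train, embed, search]


# XXX and YYY are placeholders: the timestamp only exists once training
# has started, and the best iteration is chosen from validation results.
for cmd in build_commands("XXX", "YYY"):
    print(" ".join(cmd))
```

To execute a step rather than print it, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.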
Here is a screenshot of code search:
If you find it useful and would like to cite it, the following would be appropriate:
@inproceedings{gu2018deepcs,
title={Deep Code Search},
author={Gu, Xiaodong and Zhang, Hongyu and Kim, Sunghun},
booktitle={Proceedings of the 2018 40th International Conference on Software Engineering (ICSE 2018)},
year={2018},
organization={ACM}
}
