Welcome to our project! We are Neil de la Fuente, Nil Biescas, Xavier Soto, Jordi Longaron, and Daniel Vidal, and we have joined forces to revolutionize the way iDisc, a translation company, assigns tasks to its translators.
- Project Overview
- Repository Structure
- Data
- Installation and Usage
- Performance
- How to Contribute
- Want to know more?
- Contact
Our mission is to assist project managers at iDisc in making task assignments more efficient and effective. To achieve this, we have developed several machine learning models, including a Random Forest with Decision Trees and a Multilayer Perceptron (MLP). These models take into account various factors such as previous tasks completed by translators, client preferences, and features of the task at hand. The output is a list of top-k candidates best suited for a given task, making the assignment process streamlined and informed.
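To illustrate the idea of returning the top-k candidates, here is a minimal sketch (not the project's actual code): given per-translator suitability scores produced by a trained model, keep the k best. The names `scores` and `top_k_candidates` are hypothetical.

```python
def top_k_candidates(scores: dict, k: int = 3) -> list:
    """Rank translators by predicted suitability and keep the k best."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Example scores a model might assign to each translator for one task.
scores = {"Ana": 0.91, "Marc": 0.78, "Laia": 0.85, "Pau": 0.60}
print(top_k_candidates(scores, k=2))  # ['Ana', 'Laia']
```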
- Decision_Trees: This directory contains Jupyter notebooks for the models built with decision trees: "DecisionTrees_synthesis.ipynb" and "randomforest_synthesis.ipynb".
- Models: This directory contains the model used to train the MLP.
- CheckPoints: This directory contains checkpoint files of the different models we experimented with, each with its own configuration (e.g., batch size and use of dropout).
- Utils: Inside this directory you will find three files:
  1. Utils.py — used to obtain the dataloader
  2. organaizer.py — used to organize the training and validation of the model
  3. utils_Dataset.py — used to preprocess all the data from the .pkl file
- TKinter: This directory contains a Python file that uses tkinter to create the project's interface. See the folder for an in-depth explanation.
Here is a link to the data needed for each of the models (the data may differ between the decision trees and the neural network):
Before being fed into our models, the data undergoes thorough preprocessing, including cleaning, normalization, and feature extraction, ensuring that the models receive quality data that helps them make the best predictions.
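As a rough sketch of the kind of preprocessing described above (the field names and steps are illustrative, not the project's exact pipeline):

```python
def preprocess(records):
    """Clean, normalize, and extract features from raw task records."""
    # Cleaning: drop records missing the field we need.
    clean = [r for r in records if r.get("hours") is not None]

    # Normalization: min-max scale the 'hours' field into [0, 1].
    hours = [r["hours"] for r in clean]
    lo, hi = min(hours), max(hours)
    span = (hi - lo) or 1.0

    # Feature extraction: build a numeric feature dict per record.
    return [
        {"hours_norm": (r["hours"] - lo) / span,
         "is_urgent": 1 if r.get("urgent") else 0}
        for r in clean
    ]

raw = [{"hours": 2, "urgent": True}, {"hours": 10}, {"hours": None}]
print(preprocess(raw))
```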
Before getting started, ensure Python 3.x is installed on your system. If it is not, you can download it here. Next, clone the project from GitHub to your local machine:
```shell
git clone https://github.com/NilBiescas/Synthesis_Project.git
```
To run the program, you will need to update the path to the data downloaded for the MLP. The variable to change is found in the main.py file and is named pkl_file.
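For reference, the assignment in main.py would look something like this (the path below is purely illustrative; use wherever you saved the file):

```python
# In main.py: point pkl_file at your local copy of the downloaded data.
pkl_file = "/path/to/your/downloaded_data.pkl"
```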
```shell
python main.py
```
Just download the notebooks, upload the data, and run all the cells. Yes, it's that easy!
Our models have shown promising results in optimizing the task assignment process. The Decision Trees, Random Forest, and MLP models achieved the following performance:
| Model | Accuracy | Recall | F1-Score |
|---|---|---|---|
| Decision Trees | 71% | 68% | 69% |
| Random Forest | 82% | 79% | 80% |
| MLP | 84% | 81% | 81% |
The Multilayer Perceptron (MLP) achieves the best performance, primarily due to its higher complexity and greater capacity to model intricate non-linear relationships, which gives it an edge on complex task assignment data. Its learning method, backpropagation, allows it to learn from its errors, incrementally improving its performance as it processes more data. Additionally, MLPs tend to perform better on high-dimensional data, particularly when there are sophisticated interactions between features. These qualities make it well suited to the complexity of our dataset and explain its superior performance compared to the Decision Trees.

The Decision Trees, on the other hand, are proof that a simple model can also work quite well, in this case thanks to their unbiased approach. Finally, with results close to the MLP's, we have the Random Forest, an ensemble learning method based on voting: it combines the results of several trees to provide a more consistent and confident response.

The performance measures are based on the accuracy of the task assignment. We continue to improve and optimize these models; a deeper analysis will be provided in the report.
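The voting idea behind the Random Forest can be sketched as follows; the tree predictions here are hard-coded stand-ins for the outputs of real trained trees, and the translator labels are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine individual tree predictions by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Three trees each vote for the translator they think fits the task best.
tree_predictions = ["translator_A", "translator_B", "translator_A"]
print(majority_vote(tree_predictions))  # translator_A
```

Because each tree sees a slightly different view of the data, aggregating their votes smooths out individual errors, which is why the Random Forest outperforms a single Decision Tree in the table above.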
We welcome contributions! If you're interested in improving our models, fixing bugs, or adding new features, please feel free to make a pull request.
Soon the report on the project will be available for you to have a deeper understanding of our work. Stay tuned for updates!
For any inquiries or issues, feel free to reach out to us:
