This repository contains a machine learning pipeline for predicting interest levels in rental listings as part of the Two Sigma Connect: Rental Listing Inquiries challenge.
Install the required dependencies:
```bash
pip install pandas numpy scikit-learn xgboost matplotlib tqdm
```

For deep learning models, additional dependencies are required:
```bash
pip install torch tensorflow transformers
```

Generate the feature dataset (this is a critical step and must be run first):

```bash
python run_features.py
```
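The feature logic itself lives in `run_features.py` and isn't reproduced in this README. As a rough illustration of the kind of engineering involved (column names follow the Kaggle listing JSON; the output path is a placeholder):

```python
import pandas as pd

# Illustrative sketch only -- see run_features.py for the actual pipeline.
df = pd.read_json("data/train.json")

features = pd.DataFrame({
    "price": df["price"],
    "bedrooms": df["bedrooms"],
    "bathrooms": df["bathrooms"],
    "num_photos": df["photos"].apply(len),           # photos is a list of URLs
    "num_features": df["features"].apply(len),       # features is a list of tags
    "description_len": df["description"].str.len(),
})
features.to_csv("data/features.csv", index=False)    # placeholder output path
```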
Train and evaluate traditional machine learning models:

```bash
python run.py --model xgb         # Train XGBoost model
python run.py --model rf          # Train Random Forest model
python run.py --model dt          # Train Decision Tree model
python run.py --model ridge       # Train Ridge Regression
python run.py --model lasso       # Train Lasso Regression
python run.py --model elasticnet  # Train ElasticNet model
python run.py --model svm         # Train SVM model
python run.py --model knn         # Train KNN model
python run.py --model logistic    # Train Logistic Regression model
```
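`run.py` is the single entry point for these models. Its internals aren't shown here, but a minimal sketch of how a `--model` flag can dispatch to an estimator registry looks like this (the `MODELS` mapping and its contents are assumptions, not the repository's actual code):

```python
import argparse

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical registry: CLI abbreviation -> estimator factory.
MODELS = {
    "rf": RandomForestClassifier,
    "logistic": LogisticRegression,
}

def main() -> None:
    parser = argparse.ArgumentParser(description="Train a model on the feature set.")
    parser.add_argument("--model", choices=sorted(MODELS), required=True)
    parser.add_argument("--predict", action="store_true",
                        help="also generate predictions after training")
    args = parser.parse_args()

    model = MODELS[args.model]()  # instantiate with default hyperparameters
    print(f"Training {args.model} (predict={args.predict})")
    # ... fit on the features produced by run_features.py ...

if __name__ == "__main__":
    main()
```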
For deep learning models, run the specific Python files directly:

```bash
python src/models/neural_network.py      # Basic Neural Network
python src/models/advanced_nn.py         # Advanced Neural Network
python src/models/transformer_model.py   # Transformer Model
python src/models/cnn_model.py           # Convolutional Neural Network
python src/models/rnn_model.py           # Recurrent Neural Network
```
Train a model and generate prediction results:

```bash
python run.py --model xgb --predict  # Train XGBoost and generate predictions
```
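The exact layout of the prediction file is determined by the pipeline, but for orientation: the competition expects one row per listing with a probability for each interest level. A sketch of writing such a file (column names follow the Kaggle sample submission; the values and path are dummies):

```python
import pandas as pd

# Dummy rows -- real predictions come from the trained model's predict_proba.
submission = pd.DataFrame({
    "listing_id": [7142618, 7210040],
    "high":   [0.10, 0.60],
    "medium": [0.30, 0.30],
    "low":    [0.60, 0.10],
})
submission.to_csv("outputs/submission.csv", index=False)  # placeholder path
```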
Compare the performance of multiple models:

```bash
python run.py --compare --models xgb rf logistic knn
```

Train an ensemble model:
```bash
# Average ensemble method
python run.py --ensemble --models xgb rf logistic

# Weighted ensemble method
python run.py --ensemble --ensemble-method weighted --models xgb rf

# Train and predict with ensemble model
python run.py --ensemble --predict
```

The supported models are:

| Abbreviation | Model Name | Characteristics |
|---|---|---|
| xgb | XGBoost | Gradient boosting trees, handles complex relationships |
| rf | Random Forest | Ensemble of decision trees, good stability |
| dt | Decision Tree | Simple, transparent, easy to interpret |
| logistic | Logistic Regression | Linear model with probability output |
| ridge | Ridge Regression | L2 regularized linear model |
| lasso | Lasso Regression | L1 regularized, feature selection |
| elasticnet | ElasticNet | L1+L2 regularized regression |
| svm | Support Vector Machine | Powerful non-linear classification |
| knn | K-Nearest Neighbors | Similarity-based simple classification |
| nn | Neural Network | Basic feedforward neural network |
| adv_nn | Advanced Neural Network | Deeper architecture with advanced training |
| transformer | Transformer Model | Attention-based architecture |
| cnn | Convolutional Neural Network | Image-inspired architecture |
| rnn | Recurrent Neural Network | Sequence modeling architecture |
Two ensemble learning methods are supported:
- Average: Equal weight averaging of predictions from multiple models
- Weighted: Dynamic weight allocation based on cross-validation performance (see the sketch after this list)
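A minimal sketch of how weighted averaging can work, assuming weights are taken as the inverse of each model's cross-validated log loss (the pipeline's actual weighting rule may differ):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=0)
models = [RandomForestClassifier(random_state=0),
          LogisticRegression(max_iter=1000)]

# Score each model by CV log loss, then weight by inverse loss.
losses = [-cross_val_score(m, X, y, cv=3, scoring="neg_log_loss").mean()
          for m in models]
weights = np.array([1.0 / loss for loss in losses])
weights /= weights.sum()

# Weighted average of the predicted class probabilities.
probas = [m.fit(X, y).predict_proba(X) for m in models]
ensemble = sum(w * p for w, p in zip(weights, probas))
print(ensemble[:3])
```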
Each experiment creates a timestamped folder in the outputs/ directory containing:
- Trained model
- Prediction results
- Detailed evaluation metrics (loss, F1 scores, etc.; see the sketch after this list)
- Cross-validation results
- Training metadata (time, duration, etc.)
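The metric computation itself is part of the pipeline; for reference, the competition's multi-class log loss and a macro F1 can be computed with scikit-learn as follows (the arrays hold dummy values, with 0 = low, 1 = medium, 2 = high interest):

```python
import numpy as np
from sklearn.metrics import f1_score, log_loss

y_true = np.array([0, 2, 1, 0])          # dummy ground-truth labels
y_proba = np.array([[0.7, 0.2, 0.1],     # dummy per-class probabilities
                    [0.1, 0.3, 0.6],
                    [0.2, 0.5, 0.3],
                    [0.6, 0.3, 0.1]])

print("log loss:", log_loss(y_true, y_proba))
print("macro F1:", f1_score(y_true, y_proba.argmax(axis=1), average="macro"))
```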
Edit the config.py file to modify the following (a hypothetical sketch follows the list):
- Data paths
- Random seed
- Cross-validation folds
- Test set proportion
- Model hyperparameters
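The actual contents of config.py aren't reproduced here; a sketch of what such a file might hold, with illustrative names and values:

```python
# Hypothetical config.py -- field names and values are illustrative only.
DATA_DIR = "data"
OUTPUT_DIR = "outputs"

RANDOM_SEED = 42        # seed for all stochastic components
CV_FOLDS = 5            # number of cross-validation folds
TEST_SIZE = 0.2         # held-out test set proportion

MODEL_PARAMS = {
    "xgb": {"n_estimators": 500, "max_depth": 6, "learning_rate": 0.1},
    "rf":  {"n_estimators": 300},
}
```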
Project structure:

```
├── run.py              # Main execution script
├── run_features.py     # Feature engineering script
├── config.py           # Configuration settings
├── src/
│   ├── features/       # Feature generation modules
│   ├── models/         # Model implementations
│   │   ├── xgb_model.py        # XGBoost implementation
│   │   ├── rf_model.py         # Random Forest implementation
│   │   ├── neural_network.py   # Neural Network implementation
│   │   └── ...
│   ├── utils/          # Utility functions
│   └── evaluation/     # Evaluation metrics
├── data/               # Data directory containing input files
└── outputs/            # Model outputs and results
```