Car-pe/ML_project

Rental Listing Interest Prediction

This repository contains a machine learning pipeline for predicting interest levels in rental listings as part of the Two Sigma Connect: Rental Listing Inquiries challenge.

Installation

Install the required dependencies:

pip install pandas numpy scikit-learn xgboost matplotlib tqdm

For deep learning models, additional dependencies are required:

pip install torch tensorflow transformers

Usage

Feature Engineering

Generate the feature dataset. Run this step before any training command:

python run_features.py

Model Training and Evaluation

Train and evaluate traditional machine learning models:

python run.py --model xgb             # Train XGBoost model
python run.py --model rf              # Train Random Forest model
python run.py --model dt              # Train Decision Tree model
python run.py --model ridge           # Train Ridge Regression
python run.py --model lasso           # Train Lasso Regression
python run.py --model elasticnet      # Train ElasticNet model
python run.py --model svm             # Train SVM model
python run.py --model knn             # Train KNN model
python run.py --model logistic        # Train Logistic Regression model
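
To train several of these models in one go, the commands above can be driven from a short Python script. This is a minimal sketch that assumes only the `run.py --model <name>` interface shown above; the helper names `build_commands` and `run_all` are illustrative and not part of the repository.

```python
import subprocess
import sys

# Model abbreviations taken from the commands listed above.
CLASSIC_MODELS = [
    "xgb", "rf", "dt", "ridge", "lasso",
    "elasticnet", "svm", "knn", "logistic",
]


def build_commands(models):
    """Return one argv list per model, mirroring `python run.py --model <name>`."""
    return [[sys.executable, "run.py", "--model", name] for name in models]


def run_all(models):
    """Run each training command in sequence and stop at the first failure."""
    for argv in build_commands(models):
        # check=True raises CalledProcessError if run.py exits with a non-zero status.
        subprocess.run(argv, check=True)
```

Calling `run_all(CLASSIC_MODELS)` from the repository root would then train each model in turn, provided `run_features.py` has already been run.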

For deep learning models, run the specific Python files directly:

python src/models/neural_network.py       # Basic Neural Network
python src/models/advanced_nn.py          # Advanced Neural Network
python src/models/transformer_model.py    # Transformer Model
python src/models/cnn_model.py            # Convolutional Neural Network
python src/models/rnn_model.py            # Recurrent Neural Network

Model Training and Prediction

Train a model and generate prediction results:

python run.py --model xgb --predict   # Train XGBoost and generate predictions

Model Comparison

Compare the performance of multiple models:

python run.py --compare --models xgb rf logistic knn
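
The Kaggle challenge scores submissions by multi-class logarithmic loss over the three interest levels (high, medium, low), so that is a natural metric for a comparison like the one above. The following is a minimal pure-Python sketch of that metric. It is not taken from this repository, and the function name and input layout are illustrative assumptions.

```python
import math

CLASSES = ["high", "medium", "low"]


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of class labels.
    y_prob: list of dicts mapping each class label to a predicted probability.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        norm = sum(probs[c] for c in CLASSES)   # renormalise the row to sum to 1
        p = probs[label] / norm
        p = min(max(p, eps), 1.0 - eps)         # clip to avoid log(0)
        total += -math.log(p)
    return total / len(y_true)
```

Lower values are better. A model that always predicts one third for every class scores ln 3, which is a useful baseline when reading comparison results.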

Ensemble Models

Train an ensemble model:

# Average ensemble method
python run.py --ensemble --models xgb rf logistic

# Weighted ensemble method
python run.py --ensemble --ensemble-method weighted --models xgb rf

# Train and predict with ensemble model
python run.py --ensemble --predict

Supported Models

Abbreviation   Model Name                     Characteristics
xgb            XGBoost                        Gradient boosting trees, handles complex relationships
rf             Random Forest                  Ensemble of decision trees, good stability
dt             Decision Tree                  Simple, transparent, easy to interpret
logistic       Logistic Regression            Linear model with probability output
ridge          Ridge Regression               L2 regularized linear model
lasso          Lasso Regression               L1 regularized, feature selection
elasticnet     ElasticNet                     L1+L2 regularized regression
svm            Support Vector Machine         Powerful non-linear classification
knn            K-Nearest Neighbors            Similarity-based simple classification
nn             Neural Network                 Basic feedforward neural network
adv_nn         Advanced Neural Network        Deeper architecture with advanced training
transformer    Transformer Model              Attention-based architecture
cnn            Convolutional Neural Network   Image-inspired architecture
rnn            Recurrent Neural Network       Sequence modeling architecture

Ensemble Methods

Two ensemble learning methods are supported:

  1. Average: Equal weight averaging of predictions from multiple models
  2. Weighted: Dynamic weight allocation based on cross-validation performance
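
The two methods can be sketched as follows. This is an illustration, not the repository's implementation. In particular, the README does not say how cross-validation performance is turned into weights, so the sketch assumes weights proportional to the inverse of each model's cross-validation log loss. All function names are hypothetical.

```python
def average_ensemble(predictions):
    """Equal-weight average of per-model class probabilities.

    predictions: one entry per model; each entry is a list of rows,
    and each row is a list of class probabilities.
    """
    n_models = len(predictions)
    return [
        [sum(col) / n_models for col in zip(*rows)]
        for rows in zip(*predictions)
    ]


def inverse_loss_weights(cv_losses):
    """Assumed weighting scheme: lower CV loss gives a larger weight; weights sum to 1."""
    inverse = [1.0 / loss for loss in cv_losses]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_ensemble(predictions, weights):
    """Weighted average of per-model class probabilities (weights should sum to 1)."""
    return [
        [sum(w * p for w, p in zip(weights, col)) for col in zip(*rows)]
        for rows in zip(*predictions)
    ]
```

With equal weights the weighted version reduces to the plain average, so the averaging method can be seen as the special case in which all models performed equally well in cross-validation.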

Experiment Results

Each experiment creates a timestamped folder in the outputs/ directory containing:

  • Trained model
  • Prediction results
  • Detailed evaluation metrics (loss, F1 scores, etc.)
  • Cross-validation results
  • Training metadata (time, duration, etc.)
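
A timestamped experiment folder of this kind can be created with the standard library alone. The sketch below is an assumption about how such a layout might be produced. The folder naming pattern, the `metadata.json` file name, and the helpers `create_experiment_dir` and `write_metadata` are hypothetical and may differ from what `run.py` actually writes.

```python
import json
from datetime import datetime
from pathlib import Path


def create_experiment_dir(model_name, base_dir="outputs", now=None):
    """Create and return a folder such as outputs/xgb_20240102_030405 (assumed pattern)."""
    now = now or datetime.now()
    exp_dir = Path(base_dir) / f"{model_name}_{now.strftime('%Y%m%d_%H%M%S')}"
    # exist_ok=False: refuse to overwrite an earlier experiment with the same name.
    exp_dir.mkdir(parents=True, exist_ok=False)
    return exp_dir


def write_metadata(exp_dir, model_name, started, finished, metrics):
    """Write training metadata (start time, duration, metrics) as JSON."""
    metadata = {
        "model": model_name,
        "started": started.isoformat(timespec="seconds"),
        "duration_seconds": (finished - started).total_seconds(),
        "metrics": metrics,
    }
    path = Path(exp_dir) / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2))
    return path
```

Keeping one folder per run means earlier results are never overwritten, which makes it easy to compare experiments after the fact.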

Custom Configuration

Edit the config.py file to modify:

  • Data paths
  • Random seed
  • Cross-validation folds
  • Test set proportion
  • Model hyperparameters
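
For orientation, a `config.py` covering the settings listed above might look like the sketch below. Every variable name and value here is a placeholder chosen for illustration; check the actual file for the real names and defaults.

```python
from pathlib import Path

# All names and values below are hypothetical placeholders.
DATA_DIR = Path("data")        # input files
OUTPUT_DIR = Path("outputs")   # experiment results
RANDOM_SEED = 42               # fixed seed for reproducible splits and models
CV_FOLDS = 5                   # number of cross-validation folds
TEST_SIZE = 0.2                # fraction of the data held out for testing

# Per-model hyperparameters, keyed by the model abbreviation.
MODEL_PARAMS = {
    "xgb": {"n_estimators": 500, "max_depth": 6, "learning_rate": 0.05},
    "rf": {"n_estimators": 300, "max_depth": None},
}
```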

Project Structure

├── run.py                  # Main execution script
├── run_features.py         # Feature engineering script
├── config.py               # Configuration settings
├── src/
│   ├── features/           # Feature generation modules
│   ├── models/             # Model implementations
│   │   ├── xgb_model.py    # XGBoost implementation
│   │   ├── rf_model.py     # Random Forest implementation
│   │   ├── neural_network.py  # Neural Network implementation
│   │   └── ...
│   ├── utils/              # Utility functions
│   └── evaluation/         # Evaluation metrics
├── data/                   # Data directory containing input files
└── outputs/                # Model outputs and results
