Python toolkit for Financial Machine Learning, streamlining data handling, feature engineering, model development, and performance analysis in financial time series.

a-dorgham/FinML-Toolkit

FinML-Toolkit


FinML-Toolkit is a comprehensive Python library designed to streamline the full workflow of applying Machine Learning (ML) to financial time series data. It provides a modular and extensible framework for:

  • Data handling
  • Feature engineering
  • Model training & evaluation
  • Signal generation
  • Interactive visualization

Features

  • Robust Data Handling: Load and filter financial time series from .pkl files using date ranges.
  • Advanced Feature Engineering: Add technical indicators (e.g., RSI, MACD), volume analysis, and more.
  • Flexible ML Architectures:
    • LSTM, GRU, Conv1D, Attention, Transformer (via TensorFlow/Keras)
    • RandomForest, XGBoost, CatBoost, LightGBM (via scikit-learn & boosting libraries)
  • Imbalanced Data Handling: SMOTE, ADASYN, NearMiss, TomekLinks, SMOTETomek, SMOTEENN, etc.
  • Class Weight Optimization: Optimize class weights with scipy.optimize.minimize (Nelder-Mead, SLSQP, etc.)
  • Model Evaluation: Accuracy, precision, recall, F1-score, MSE, RMSE, R², confusion matrix.
  • Interactive Plotting: Plotly-based heatmaps, forecasting charts, actual vs. predicted lines/candlesticks, and trading signal visualizations.
  • Modular Design: Clean structure with dedicated modules for each pipeline stage.

Project Structure

FinML-Toolkit/
├── setup.py
├── examples/
│   ├── Basic_ML_Model_Training.ipynb
│   └── ...
├── fin_data/
│   ├── EUR_USD_H4.pkl
│   └── ...
└── ml_toolkit/
    ├── __init__.py
    ├── data_handler.py
    ├── error_handler.py
    ├── feature_engineer.py
    ├── model_builder.py
    ├── model_evaluator.py
    ├── model_executor.py
    ├── model_forecaster.py
    └── visualizer.py

Getting Started

Prerequisites

  • Python 3.8+
  • Recommended: Use a virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate  # Windows

Installation

From the project root (where setup.py is):

pip install -e .

Usage Overview

Load & Prepare Data

from ml_toolkit.data_handler import DataHandler
df = DataHandler.load_data('GBP_USD_H4.pkl', start_date="2020-01-01", end_date="2024-07-29 21:00")
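The date-range filter can be pictured with plain pandas. The sketch below is not the toolkit's DataHandler implementation; it assumes the pickle holds a DataFrame with a DatetimeIndex, and `load_pickle_range` and the `close` column are hypothetical names used only for illustration.

```python
import os
import tempfile

import pandas as pd


def load_pickle_range(path, start_date=None, end_date=None):
    """Hypothetical helper: read a pickled DataFrame and keep rows in a date range."""
    df = pd.read_pickle(path).sort_index()
    # Convert the bounds to timestamps; None leaves that side of the slice open.
    start = pd.Timestamp(start_date) if start_date is not None else None
    end = pd.Timestamp(end_date) if end_date is not None else None
    # Label-based slicing on a sorted DatetimeIndex includes both endpoints.
    return df.loc[start:end]


# Build a tiny 4-hour series so the sketch runs without real market data.
index = pd.date_range("2020-01-01", periods=12, freq=pd.Timedelta(hours=4))
demo = pd.DataFrame({"close": range(12)}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_H4.pkl")
    demo.to_pickle(demo_path)
    subset = load_pickle_range(demo_path, "2020-01-01 08:00", "2020-01-02 04:00")

print(subset.index.min(), subset.index.max(), len(subset))
```

Pickle files can execute arbitrary code when loaded, so only read .pkl files from sources you trust.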

Feature Engineering

from ml_toolkit.feature_engineer import FeatureEngineer
df, time, X, y = FeatureEngineer.add_features(df=df, features=['RSI', 'MACD', 'Volume'])
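For readers who want to see what the RSI and MACD features represent, here is a minimal pandas sketch of the standard formulas. It is not taken from feature_engineer.py: the function names, the output column names, and the default periods (14 for RSI, 12/26/9 for MACD) are conventional choices assumed for illustration, and the toolkit may compute or scale these indicators differently.

```python
import numpy as np
import pandas as pd


def rsi(close, period=14):
    """Relative Strength Index using Wilder's smoothing; values fall in [0, 100]."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)
    loss = (-delta).clip(lower=0.0)
    # Wilder's smoothing is an exponential average with alpha = 1 / period.
    avg_gain = gain.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def macd(close, fast=12, slow=26, signal=9):
    """MACD line, signal line, and histogram from exponential moving averages."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    line = ema_fast - ema_slow
    signal_line = line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame(
        {"macd": line, "macd_signal": signal_line, "macd_hist": line - signal_line}
    )


# Synthetic random-walk prices stand in for real closing prices.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 200).cumsum())
features = pd.concat([rsi(close).rename("RSI"), macd(close)], axis=1).dropna()
print(features.tail())
```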

Build & Train Model

from ml_toolkit.model_builder import ModelBuilder
input_shape = (X.shape[1], X.shape[2]) if X.ndim == 3 else (X.shape[1],)
model = ModelBuilder.create_lstm_model(input_shape=input_shape, output_units=3, output_activation='softmax')
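The `X.ndim == 3` branch applies when the features are arranged as (samples, timesteps, features), which is the layout recurrent Keras layers expect. The sketch below shows one way such an array could be built from a 2D feature matrix with NumPy (1.20 or later). `make_windows` is a hypothetical helper, and pairing each window with the label of its last row is an assumption; the toolkit's own windowing may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(features, labels, timesteps):
    """Hypothetical helper: stack `timesteps` consecutive rows into each sample."""
    # The view has shape (n_windows, n_features, timesteps); swap the last two
    # axes to get (n_windows, timesteps, n_features).
    windows = sliding_window_view(features, window_shape=timesteps, axis=0)
    X = np.swapaxes(windows, 1, 2)
    # Assumption: each window is labelled with the target of its last row.
    y = labels[timesteps - 1:]
    return X, y


features = np.arange(20, dtype=float).reshape(10, 2)  # 10 rows, 2 features
labels = np.arange(10)

X, y = make_windows(features, labels, timesteps=4)
input_shape = (X.shape[1], X.shape[2]) if X.ndim == 3 else (X.shape[1],)
print(X.shape, y.shape, input_shape)
```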

Execute & Evaluate

from ml_toolkit.model_executor import ModelExecutor
model, acc, prec, rec, f1, conf, mse, rmse = ModelExecutor.execute_model(
    file_path='GBP_USD_H4.pkl',
    start_date="2020-01-01",
    end_date="2024-07-29 21:00",
    features=['RSI', 'MACD', 'Volume'],
    model_type='LSTM',
    epochs=10
)
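To make the returned metrics concrete, the following NumPy sketch computes the same kinds of quantities from label arrays. It is an illustration rather than the toolkit's evaluator: the function names are hypothetical, and macro-averaging of precision, recall, and F1 is an assumption, since the README does not state which averaging model_evaluator.py uses.

```python
import numpy as np


def classification_metrics(y_true, y_pred, n_classes):
    """Hypothetical helper: accuracy, macro precision/recall/F1, confusion matrix."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Confusion matrix: rows are true classes, columns are predicted classes.
    conf = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(conf, (y_true, y_pred), 1)
    tp = np.diag(conf).astype(float)
    predicted = conf.sum(axis=0)
    actual = conf.sum(axis=1)
    # Classes that are never predicted or never present score 0 instead of NaN.
    precision = np.divide(tp, predicted, out=np.zeros(n_classes), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros(n_classes), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros(n_classes), where=denom > 0)
    return {
        "accuracy": tp.sum() / conf.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
        "confusion_matrix": conf,
    }


def regression_metrics(y_true, y_pred):
    """Mean squared error and its square root."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mse = float(np.mean(err ** 2))
    return mse, float(np.sqrt(mse))


report = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
mse, rmse = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(report["confusion_matrix"])
print(report["accuracy"], report["f1"], mse, rmse)
```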

Visualization

from ml_toolkit.visualizer import Visualizer
# y_test and y_pred are placeholders for your held-out targets and the model's predictions.
# visualizer = Visualizer()
# visualizer.plot_actual_vs_predicted_lines(y_test, y_pred)

Example Notebooks

  • Basic_ML_Model_Training.ipynb
  • Basic_Regression_Model_Training.ipynb
  • Class_Weight_Optimization.ipynb
  • Feature_Selection_Optimization.ipynb
  • Imbalanced_Dataset_Evaluation.ipynb
  • Correlation_Heatmap_Visualization.ipynb
  • Peak_Detection_Visualization.ipynb
  • Trades_Evaluation.ipynb

Model Output

Sample output from the example notebooks (predictions on the M15 timeframe are considerably more accurate than on H4):

  • Correlation maps visualization
  • Peak detection
  • Actual vs. predicted signals
  • Class classification
  • Confusion matrix
  • Debugging report
  • Trades log
  • Trades metrics

Developer Notes

  • Modular, extensible codebase.
  • Add your own model logic in model_builder.py.
  • Financial data expected in .pkl format in fin_data/.

Known Limitations

  • Custom indicators must be added manually.
  • Optimization methods may need tuning.
  • Predictions on shorter timeframes (e.g., M15) are considerably more accurate than on longer timeframes (e.g., H4).

Roadmap

  • Live data feed integration (e.g., Oanda, Binance)
  • Backtesting module
  • SHAP/LIME for explainability
  • More advanced forecasting models
  • Model deployment utilities

Contributing

git clone https://github.com/a-dorgham/FinML-Toolkit.git
cd FinML-Toolkit
# Create a branch and submit a PR

License

MIT License


Contact