arnabpal2022/UniEmbed-SQLi-Detection

UniEmbed: SQL Injection Attack Detection Using Multiple Feature Fusion

This project implements UniEmbed, an approach to detecting SQL injection attacks that fuses several Natural Language Processing embedding techniques and feeds the combined features to Machine Learning classifiers. It is based on the paper UniEmbed: A Novel Approach to Detect XSS and SQL Injection Attacks Leveraging Multiple Feature Fusion with Machine Learning Techniques (Bakır, 2025). This repository provides a reproducible implementation and analysis as a Jupyter notebook.

Project Highlights

  • Multi-Feature Embedding: Simultaneously leverages Word2Vec, FastText, and the Universal Sentence Encoder (USE) to extract, enrich, and combine semantic representations of SQL queries.
  • State-of-the-Art Modeling: Trains a suite of ML classifiers including MLP, Random Forest, SVM, Logistic Regression, KNN, Naive Bayes, Decision Tree, and voting ensembles.
  • Rigorous Evaluation: Assesses detection performance using standard metrics (accuracy, F1, AUC, etc.) and visualization (ROC, confusion matrices).
  • Extensible Framework: Clean, modular Python code ready for expansion to XSS or other text-based attacks.
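The voting ensembles mentioned above combine the predictions of several classifiers into one label per query. The README does not say whether the notebook uses hard or soft voting, so the following is only a dependency-free sketch of hard (majority) voting. The function name `hard_vote`, the sample predictions, and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def hard_vote(predictions: List[List[int]]) -> List[int]:
    """Majority vote over per-classifier label lists of equal length.

    Ties are resolved in favour of the larger label (1 = malicious).
    That tie rule is an assumption of this sketch, not the notebook's.
    """
    if not predictions:
        raise ValueError("need at least one classifier's predictions")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all prediction lists must have the same length")

    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        winners = [label for label, c in counts.items() if c == top]
        voted.append(max(winners))
    return voted


# Hypothetical predictions from three classifiers on four queries.
mlp_pred = [0, 1, 1, 0]
rf_pred = [0, 1, 0, 0]
svm_pred = [1, 1, 0, 0]
ensemble_pred = hard_vote([mlp_pred, rf_pred, svm_pred])
```

With scikit-learn, `sklearn.ensemble.VotingClassifier` offers the same idea through its `voting="hard"` or `voting="soft"` option.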

Overview

Traditional web application security approaches often fail to detect sophisticated, obfuscated, or novel forms of SQL injection attacks. The UniEmbed method fuses three NLP embedding strategies: Word2Vec (word-level), FastText (subword-level, built from character n-grams), and USE (sentence-level). The combined representation captures both surface and semantic patterns of a query, giving the classifiers more signal than any single embedding provides.
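One common way to fuse several embeddings is to concatenate them into a single feature vector per query. The sketch below assumes that form of fusion and uses tiny hand-written lookup tables in place of trained Word2Vec, FastText, and USE models. The names `mean_pool` and `fuse`, the vector sizes, and all numbers are hypothetical. A real FastText model also builds vectors for unseen tokens from character n-grams, which this toy lookup does not do.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(tokens: List[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of the tokens found in `table`.

    Returns a zero vector of length `dim` when no token is known.
    """
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def fuse(query: str, w2v: Dict[str, Vector], w2v_dim: int,
         ft: Dict[str, Vector], ft_dim: int, sentence_vec: Vector) -> Vector:
    """Concatenate pooled word-level, pooled subword-level, and sentence vectors."""
    tokens = query.lower().split()
    return (mean_pool(tokens, w2v, w2v_dim)
            + mean_pool(tokens, ft, ft_dim)
            + list(sentence_vec))


# Toy stand-ins for trained models (assumed values, for illustration only).
toy_w2v = {"select": [1.0, 0.0], "or": [0.0, 1.0]}
toy_ft = {"select": [1.0, 1.0, 1.0], "or": [3.0, 3.0, 3.0]}
toy_sentence_vec = [0.5, 0.5, 0.5, 0.5]

fused = fuse("SELECT x OR y", toy_w2v, 2, toy_ft, 3, toy_sentence_vec)
```

The fused vector has length 2 + 3 + 4 = 9 here. With real models the same pattern yields one long vector per query that can be passed to any classifier.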

Dataset

You will need the SQL Injection dataset by SAJID576 from Kaggle. Download the CSV file and place it in the root directory of the project or specify the correct path in the notebook.

Sample Format:

Sentence                                     | Label
SELECT * FROM users WHERE id = 1             | 0
SELECT * FROM users WHERE id = 1 OR 1=1--    | 1
...                                          | ...
  • 0 = benign query
  • 1 = malicious (SQL injection) query
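As a rough illustration of the format above, the following sketch reads a CSV with `Sentence` and `Label` columns and keeps only rows with a non-empty query and a label of 0 or 1. It parses an in-memory sample so that it runs without the Kaggle file. The function name `load_queries` and the filtering rules are assumptions, not taken from the notebook.

```python
import csv
import io
from collections import Counter

# In-memory sample that mirrors the two example rows shown above.
SAMPLE_CSV = """Sentence,Label
SELECT * FROM users WHERE id = 1,0
SELECT * FROM users WHERE id = 1 OR 1=1--,1
"""


def load_queries(handle):
    """Return (queries, labels) from a CSV handle with Sentence/Label columns.

    Rows with an empty query or a label other than 0/1 are skipped.
    """
    queries, labels = [], []
    for row in csv.DictReader(handle):
        text = (row.get("Sentence") or "").strip()
        label = (row.get("Label") or "").strip()
        if not text or label not in {"0", "1"}:
            continue
        queries.append(text)
        labels.append(int(label))
    return queries, labels


queries, labels = load_queries(io.StringIO(SAMPLE_CSV))
class_counts = Counter(labels)  # quick check of class balance
```

For the real dataset, pass a handle from `open(path, newline="", encoding="utf-8")`. The encoding is an assumption, so adjust it if the file fails to decode. `pandas.read_csv` is an equivalent starting point.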

Project Structure

.
├── UniEmbed_SQLi.ipynb       # Main Jupyter notebook implementation
├── sqli_dataset.csv          # Place your Kaggle dataset here
├── models/                   # Trained embedding models (saved after run)
├── results/                  # Evaluation results and artifacts
└── README.md                 # You're reading this!

Requirements

  • Python 3.8 or newer
  • pandas, numpy, scikit-learn, gensim, matplotlib, seaborn
  • tensorflow, tensorflow-hub (for Universal Sentence Encoder)

Install them using:

pip install pandas numpy scikit-learn gensim matplotlib seaborn tensorflow tensorflow-hub

How To Use

  1. Download the dataset:
    Download the "SQL Injection" dataset from Kaggle and place it in your project directory.

  2. Open and run the notebook:
    Open the main notebook (listed as UniEmbed_SQLi.ipynb under Project Structure) in Jupyter Notebook or JupyterLab, and run each cell in order.

  3. Configure paths:
    If your dataset is not named sqli_dataset.csv (the name listed under Project Structure) or is located elsewhere, change the path at the data loading step.

  4. Explore the results:
    The notebook will output accuracy, F1, confusion matrices, ROC curves, and compare all feature extraction methods and classifiers.
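The reported metrics all derive from the confusion matrix. The sketch below computes accuracy, precision, recall, and F1 from true and predicted labels, treating 1 (malicious) as the positive class. The function names and sample labels are hypothetical. The notebook presumably relies on scikit-learn's metric functions instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1, guarding against zero division."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for eight queries (illustration only, not project results).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
report = summarize(y_true, y_pred)
```

In this toy example there is one false positive and one false negative among eight queries, so all four metrics come out to 0.75.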

Key Results

  • UniEmbed Fusion: Outperforms individual embedding techniques (Word2Vec, FastText, USE) on SQL injection detection metrics.
  • MLP & Voting Classifiers: Achieve high accuracy and F1 scores, with very few false positives and false negatives, in experiments that follow the setup of the published paper.
  • Visualization: Provides immediate understanding of classifier performance with clear, publication-ready plots.

Reference

Bakır, R. (2025). UniEmbed: A Novel Approach to Detect XSS and SQL Injection Attacks Leveraging Multiple Feature Fusion with Machine Learning Techniques. Arabian Journal for Science and Engineering.

Acknowledgments

  • SAJID576 for the open SQL injection dataset.
  • TensorFlow and Gensim teams for state-of-the-art NLP tools.
  • The original author for describing the hybrid feature fusion approach in detail.

Enjoy exploring the frontier of secure web application research with UniEmbed!

