UniEmbed: SQL Injection Attack Detection Using Multiple Feature Fusion

This project implements UniEmbed, a unified approach for detecting SQL injection attacks by fusing several Natural Language Processing (NLP) embedding techniques and training Machine Learning classifiers on the fused features. Based on the paper UniEmbed: A Novel Approach to Detect XSS and SQL Injection Attacks Leveraging Multiple Feature Fusion with Machine Learning Techniques (Bakır, 2025), this repository provides a reproducible, step-by-step analysis of the SQL injection part of that work in a Jupyter notebook.

Project Highlights

Multi-Feature Embedding: Simultaneously leverages Word2Vec, FastText, and the Universal Sentence Encoder (USE) to extract, enrich, and combine semantic representations of SQL queries.
Classifier Suite: Trains a range of ML classifiers, including MLP, Random Forest, SVM, Logistic Regression, KNN, Naive Bayes, and Decision Tree, as well as voting ensembles.
Rigorous Evaluation: Assesses detection performance using standard metrics (accuracy, F1, AUC, etc.) and visualizations (ROC curves, confusion matrices).
Extensible Framework: Clean, modular Python code ready for expansion to XSS or other text-based attacks.
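The voting ensembles listed above combine the outputs of the individual classifiers. scikit-learn provides `sklearn.ensemble.VotingClassifier` for this; as a dependency-free illustration of hard (majority) voting, the hypothetical helper `hard_vote` below takes each classifier's 0/1 predictions and returns the majority label per query. It is a sketch, not the notebook's code, and its tie-breaking rule (prefer the malicious label) is an assumption.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote across classifiers; predictions[i][j] is classifier i's label for sample j."""
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label the same samples")
    voted = []
    for j in range(n_samples):
        counts = Counter(p[j] for p in predictions)
        # Most votes wins; ties go to the larger label, i.e. 1 (malicious)
        # for 0/1 labels. This tie rule is an assumption of the sketch.
        voted.append(max(counts, key=lambda label: (counts[label], label)))
    return voted


# Three classifiers (e.g. MLP, Random Forest, SVM) labelling three queries.
print(hard_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 1]
```

Soft voting, which averages predicted probabilities instead of counting labels, is the other common variant; it needs classifiers that expose probability estimates.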

Overview

Traditional web application security approaches often fail to detect sophisticated, obfuscated, or novel forms of SQL injection attacks. The UniEmbed method fuses three NLP embedding strategies, Word2Vec (word-level), FastText (subword-level, built from character n-grams), and USE (sentence-level), to capture both surface-level and semantic patterns in queries, giving the classifiers a richer representation than any single embedding provides.
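The fusion idea can be sketched in a few lines. This is a minimal, hypothetical illustration that assumes per-token vectors are mean-pooled into one vector per query and that the three representations are fused by concatenation; the notebook's exact pooling and fusion steps may differ. Toy dictionaries stand in for trained Word2Vec and FastText models (a real FastText model can also embed unseen tokens from their character n-grams, which a plain lookup cannot), and a fixed vector stands in for the USE output. `mean_pool` and `fuse` are illustrative names, not part of any library.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of tokens found in `table`; zeros if none are found."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors dimension by dimension.
    return [sum(column) / len(known) for column in zip(*known)]


def fuse(tokens: Sequence[str],
         w2v: Dict[str, Vector], w2v_dim: int,
         fasttext: Dict[str, Vector], ft_dim: int,
         sentence_vec: Sequence[float]) -> Vector:
    """Concatenate word-level, subword-level and sentence-level features (assumed fusion rule)."""
    return (mean_pool(tokens, w2v, w2v_dim)
            + mean_pool(tokens, fasttext, ft_dim)
            + list(sentence_vec))


# Toy example: 2-d "Word2Vec", 1-d "FastText", 3-d "USE" vector.
tokens = ["select", "1", "or", "1", "=", "1"]
w2v = {"select": [1.0, 0.0], "or": [0.0, 1.0]}
fasttext = {"or": [0.5]}
features = fuse(tokens, w2v, 2, fasttext, 1, [0.1, 0.2, 0.3])
print(features)  # [0.5, 0.5, 0.5, 0.1, 0.2, 0.3]
```

With real models the same concatenation yields a vector whose length is the sum of the three embedding sizes (the standard USE models alone contribute 512 dimensions), and that vector is what each classifier is trained on.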

Dataset

You will need the SQL Injection dataset by SAJID576 from Kaggle. Download the CSV file and place it in the root directory of the project or specify the correct path in the notebook.

Sample Format:

Sentence	Label
`SELECT * FROM users WHERE id = 1`	0
`SELECT * FROM users WHERE id = 1 OR 1=1--`	1
...	...

0 = benign query
1 = malicious (SQL injection) query
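Word2Vec and FastText operate on token sequences, so each query in the Sentence column has to be tokenized first. The notebook's exact tokenizer is not described here; the following is one plausible, hypothetical SQL-aware tokenizer (`tokenize_sql` is an illustrative name) that keeps multi-character operators and comment markers such as `--` intact, since they appear in injection payloads like the sample above.

```python
import re
from typing import List

# Multi-character SQL operators and comment markers are listed first so the
# alternation matches them before falling back to single characters.
_TOKEN_RE = re.compile(r"--|/\*|\*/|<=|>=|<>|!=|\|\||\w+|[^\w\s]")


def tokenize_sql(query: str) -> List[str]:
    """Split a query into lower-cased words, numbers, operators and punctuation."""
    return _TOKEN_RE.findall(query.lower())


print(tokenize_sql("SELECT * FROM users WHERE id = 1 OR 1=1--"))
# ['select', '*', 'from', 'users', 'where', 'id', '=', '1', 'or', '1', '=', '1', '--']
```

Keeping `--` as one token (rather than two `-` tokens) lets the embedding models learn a vector for the comment marker itself.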

Project Structure

.
├── UniEmbed_SQLI_Detection.ipynb  # Main Jupyter notebook implementation
├── sqli_dataset.csv               # Place your Kaggle dataset here
├── models/                        # Trained embedding models (saved after run)
├── results/                       # Evaluation results and artifacts
└── README.md                      # You're reading this!
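The models/ and results/ folders are created as outputs. If you prefer to create them up front, for example before running cells out of order, a small idempotent helper such as the hypothetical `prepare_project_dirs` below does it; it is an assumption for convenience, not part of the notebook.

```python
from pathlib import Path
from typing import Dict


def prepare_project_dirs(root: str = ".") -> Dict[str, Path]:
    """Create the models/ and results/ output folders under `root` if they are missing."""
    base = Path(root)
    dirs = {"models": base / "models", "results": base / "results"}
    for path in dirs.values():
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Calling `prepare_project_dirs()` from the project root creates both folders next to the notebook.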

Requirements

Python 3.8 or newer
pandas, numpy, scikit-learn, gensim, matplotlib, seaborn
tensorflow, tensorflow-hub (for Universal Sentence Encoder)

Install them using:

pip install pandas numpy scikit-learn gensim matplotlib seaborn tensorflow tensorflow-hub
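To confirm the installation before opening the notebook, a short check like the one below can help. It is a hypothetical helper, not part of the repository; note that some import names differ from their pip package names (scikit-learn is imported as `sklearn`, tensorflow-hub as `tensorflow_hub`).

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Import names for the packages in the pip command above.
REQUIRED_MODULES: Tuple[str, ...] = ("pandas", "numpy", "sklearn", "gensim",
                                     "matplotlib", "seaborn", "tensorflow",
                                     "tensorflow_hub")


def missing_modules(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found by the current interpreter."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8 or newer is required")
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates modules without importing them, so the check is fast even for heavy packages such as TensorFlow.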

How To Use

Download the dataset:
Download the "SQL Injection" dataset from Kaggle and place it in the project root directory.
Configure paths:
If your dataset file is not named sqli_dataset.csv or is located elsewhere, change the path at the data loading step.
Open and run the notebook:
Open UniEmbed_SQLI_Detection.ipynb in Jupyter Notebook or JupyterLab, and run each cell in order.
Explore the results:
The notebook reports accuracy and F1 scores, confusion matrices, and ROC curves, and compares all feature extraction methods and classifiers.
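The data loading step boils down to reading (query, label) pairs from the CSV. The notebook relies on pandas (a `pandas.read_csv` call works equally well); this sketch uses the standard `csv` module so it runs without dependencies. `DATA_PATH`, `load_queries`, and the default column names (taken from the sample format above) are assumptions; adjust them to match your file.

```python
import csv
from pathlib import Path
from typing import List, Tuple

# Hypothetical default; change this if your CSV has another name or location.
DATA_PATH = Path("sqli_dataset.csv")


def load_queries(path: Path, text_col: str = "Sentence",
                 label_col: str = "Label") -> Tuple[List[str], List[int]]:
    """Read (query, label) pairs, skipping rows with an empty query or a label outside {0, 1}."""
    if not path.exists():
        raise FileNotFoundError(f"Dataset not found at {path.resolve()}; update DATA_PATH")
    texts, labels = [], []
    with path.open(newline="", encoding="utf-8", errors="replace") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames or []
        if text_col not in fields or label_col not in fields:
            raise ValueError(f"expected columns {text_col!r} and {label_col!r}, found {fields}")
        for row in reader:
            text = (row.get(text_col) or "").strip()
            label = (row.get(label_col) or "").strip()
            if text and label in ("0", "1"):
                texts.append(text)
                labels.append(int(label))
    return texts, labels


# texts, labels = load_queries(DATA_PATH)
```

Failing early with the resolved path makes a misplaced dataset obvious before any embedding model is trained.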

Key Results

UniEmbed Fusion: In the notebook's experiments, the fused representation outperforms each individual embedding technique (Word2Vec, FastText, USE) on the SQL injection detection metrics.
MLP & Voting Classifiers: Achieve very high accuracy and F1 scores, with almost no false positives or false negatives, in experiments that replicate the setup of the published paper.
Visualization: Provides immediate understanding of classifier performance with clear, publication-ready plots.
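For reference, the false positives and false negatives mentioned above come from the confusion matrix, and the headline metrics are derived from its four counts. scikit-learn computes these with `sklearn.metrics.confusion_matrix` and `sklearn.metrics.f1_score`; the hypothetical `binary_metrics` helper below spells out the arithmetic for 0/1 labels, treating 1 (malicious) as the positive class.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix counts plus accuracy, precision, recall and F1 for label 1 (malicious)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign query flagged as attack
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attack missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


m = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(m["fp"], m["fn"])  # 1 1
```

For SQL injection detection, false negatives (missed attacks) and false positives (blocked legitimate queries) carry different costs, which is why both are reported alongside accuracy.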

Reference

Bakır, R. (2025). UniEmbed: A Novel Approach to Detect XSS and SQL Injection Attacks Leveraging Multiple Feature Fusion with Machine Learning Techniques. Arabian Journal for Science and Engineering.

Acknowledgments

SAJID576 for the open SQL injection dataset.
TensorFlow and Gensim teams for state-of-the-art NLP tools.
The original author for describing the hybrid feature fusion approach in detail.

Enjoy exploring the frontier of secure web application research with UniEmbed!
