Anjihee/Sign2Text

Sign2Text: Real-time Sign Language Recognition with Subtitle Generation


📌 Overview

Sign2Text is a real-time sign language recognition system that captures webcam input and displays corresponding words as subtitles.
It is designed to assist communication by recognizing hand gestures using deep learning and computer vision.

🔧 Key Features

  • Auto prediction without button press
  • MediaPipe-based hand keypoint extraction
  • Conv1D + BiLSTM sequence classification
  • Top-3 predictions with confidence scores
  • Temperature scaling for stable outputs
  • Real-time PyQt5 GUI (no keyboard interaction)
  • Data augmentation & evaluation support

🧩 Development Environment

  • OS & Tools:

    • Windows 10 / macOS (Apple Silicon)
    • Anaconda (Python 3.8.20)
    • Visual Studio Code
  • Core Libraries:

    • numpy==1.24.3
    • pandas==2.0.3
    • opencv-python==4.11.0.86
    • mediapipe==0.10.11
    • scipy==1.10.1
    • tqdm==4.64.1
    • pillow==10.4.0
    • tensorflow==2.13.0
    • keras==2.13.1
    • PyQt5==5.15.9

👥 Team Members

  • 🧑‍💼 An Jihee (Team Lead): Modeling and the real-time inference system. Built the Conv1D + BiLSTM model and the PyQt5-based GUI for real-time sign recognition; conducted sequence-length testing.
  • 👩‍💻 Kim Minseo (Member): Data collection, preprocessing, and evaluation. Extracted raw keypoints, constructed labeled CSVs, and participated in testing.
  • 👩‍💻 Lee Jimin (Member): Data augmentation and evaluation. Generated augmented sequences and conducted hold-out testing.

📁 Project Structure

Sign2Text/
├── dataset/
│   ├── npy/
│   └── augmented_samples/
├── models/
│   └── L10/, L20/, ...
│       ├── sign_language_model_normalized.h5
│       ├── label_classes.npy
│       ├── X_mean.npy
│       └── X_std.npy
├── src/
│   ├── dataset_preprocessing/
│   │   ├── add_angles_to_merged.py
│   │   ├── batch_generate_csv.py
│   │   ├── create_total_seq.py
│   │   ├── merge_csv.py
│   │   └── zip_csv.py
│   ├── hold_out_test/
│   │   ├── holdout_test.py         # Run predictions on unseen samples
│   │   ├── auto_infer.py
│   │   ├── make_test_labels.py
│   │   ├── holdout_results.csv
│   │   └── test_labels.csv
│   ├── predict/
│   │   ├── predict_test_sample.py
│   │   ├── predict_test_sample_normalized.py
│   │   └── label_similarity_filter.py
│   ├── train/
│   │   ├── train_by_seq.py
│   │   └── train_by_seq_aug.py
│   ├── viz/
│   │   ├── merge_aug_origin_npy.py
│   │   ├── viz_confusion_top3.py
│   │   └── viz_history.py
│   └── webcam/
│       ├── webcam_test.py               # For data collection
│       ├── realtime_infer_test.py       # Lightweight prediction only
│       └── sign2text_gui.py             # PyQt5-based GUI app (Main App)
├── requirements.txt
└── README.md

🎮 GUI App (PyQt5)

▶️ Run the Real-time GUI App

python src/webcam/sign2text_gui.py
  • Webcam preview + Korean font rendering
  • Left panel: video feed
  • Right panel:
    • Status (대기 중 "idle" / 수집 중 "collecting")
    • Top-3 predictions (with confidence)
    • Result display (신뢰도 부족 "insufficient confidence" if below threshold)
  • Buttons:
    • 수집 시작 ("start collection"): toggles sample collection

🔥 Temperature scaling and confidence thresholding included
🔁 Sequence is auto-cleared after prediction
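The temperature scaling and confidence thresholding mentioned above can be sketched as follows. This is a minimal illustration, not the repository's code: the temperature value, threshold, and function names are assumptions.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.5):
    """Scale logits by a temperature before softmax; T > 1 flattens
    the distribution, damping overconfident predictions."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                      # subtract max for numerical stability
    exp = np.exp(z)
    return exp / exp.sum()

def top3_with_threshold(probs, labels, threshold=0.5):
    """Return the three most probable labels, plus a flag that is False
    when the best score falls below the threshold (the GUI would then
    show 신뢰도 부족, "insufficient confidence")."""
    order = np.argsort(probs)[::-1][:3]
    top3 = [(labels[i], float(probs[i])) for i in order]
    confident = top3[0][1] >= threshold
    return top3, confident
```

A higher temperature spreads probability mass across classes, which stabilizes the displayed top-3 scores frame to frame.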

🧠 Real-Time Inference Demo: "Treatment" (치료)

[GIF: 치료 (treatment) recognition example]

The following GIF demonstrates a successful real-time recognition of the sign language gesture for "treatment" (치료) using our desktop application built with PyQt5 and MediaPipe.


🧠 Model Architecture

  • Input Shape: (sequence_length, 114)
    • 84 keypoint features (21 points × 2 hands × x/y coordinates)
    • 30 joint angles
  • Layers:
    • Conv1D → BatchNorm → Dropout
    • Conv1D → BatchNorm → Dropout
    • BiLSTM → Dropout
    • Dense → Dropout → Softmax
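The layer stack above can be sketched in Keras roughly as follows. Layer widths, kernel sizes, and dropout rates are illustrative assumptions, not the repository's exact hyperparameters; only the input shape and class count come from this README.

```python
from tensorflow.keras import layers, models

def build_model(seq_len=20, n_features=114, n_classes=61):
    # Input: (sequence_length, 114) = 84 keypoint features + 30 joint angles
    model = models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Bidirectional(layers.LSTM(64)),   # BiLSTM over the time axis
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The Conv1D front end extracts short-range motion patterns before the BiLSTM models the full sequence in both directions.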

⚙️ Sequence Configuration

SEQ_NAME = "L20"
  • SEQ_NAME defines the window size
  • Supported values: "L10", "L20", etc.
  • Make sure the model and the X_mean.npy / X_std.npy files under models/L## match this name
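Keeping the model and its normalization stats in sync can be sketched like this. The file names come from the project structure above; the helper names and the epsilon guard are assumptions.

```python
import os
import numpy as np

SEQ_NAME = "L20"                       # window-size tag; must match a folder under models/
MODEL_DIR = os.path.join("models", SEQ_NAME)

def load_normalization_stats(model_dir=MODEL_DIR):
    """Load the train-time mean/std so inference normalizes inputs
    exactly the way training did."""
    mean = np.load(os.path.join(model_dir, "X_mean.npy"))
    std = np.load(os.path.join(model_dir, "X_std.npy"))
    return mean, std

def normalize(seq, mean, std, eps=1e-8):
    """Apply the stored per-feature standardization to one sequence."""
    return (np.asarray(seq) - mean) / (std + eps)
```

Loading stats from a different L## folder than the model would silently shift every input feature, so both should always be read from the same directory.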


📂 Learning Data Composition

The Sign2Text project was trained on a curated dataset containing 61 sign language labels, constructed from both the original AI Hub data and newly augmented samples. The distribution of labels is as follows:

  • 📦 Original only (31 labels): labels that exist only in the original dataset (.npy without augmentation)
  • 🔁 Common / shared (19 labels): labels included in both the original and augmented data
  • ➕ Augmented only (11 labels): newly added labels from webcam-based augmentation

✅ Label Lists by Type

📦 Original-only (31)

밥솥, 출근, 퇴사, 포켓, 여아, 학업, 여학교, 백수, 채팅, 신학,
뉴질랜드, 남아, 독서실, 유학, 국어학, 다과, 의학, 위스키, 울산, 구직,
학교연혁, 문학, 예습, 사직, 친아들, 벌꿀, 배드민턴, 버스값, 식당, 월세

🔁 Common (Original + Augmented, 19)

감기, 개학, 경찰서, 독서, 독일어, 라면, 병문안, 보건소, 수면제, 술,
슬프다, 싫어하다, 커피, 콜라, 퇴원, 치료, 학교, 입원, 월세

➕ Augmented-only (11)

꿀, 나(1인칭), 너(2인칭), 딸, 아들, 안녕하세요, 영어, 운동, 입사, 좋다


🖼️ Visual Summary

A Venn diagram showing the overlap between original and augmented label sets.

✅ Actually Recognized Labels in Real-Time Inference (30)

🔁 Common (19)

감기, 개학, 경찰서, 독서, 독일어, 라면, 병문안, 보건소, 수면제, 술,
슬프다, 싫어하다, 커피, 콜라, 퇴원, 치료, 학교, 입원, 월세

➕ Augmented-only (11)

꿀, 나(1인칭), 너(2인칭), 딸, 아들, 안녕하세요, 영어, 운동, 입사, 좋다


🧠 Key Findings

  • Only the 30 labels backed by augmented data (common + augmented-only) were recognized reliably in real-time testing.
  • Original-only labels were not recognized, even though they were included during training.
  • This suggests that data recency and augmentation quality have a stronger impact on performance than mere label presence in the training set.
  • Labels with freshly collected webcam samples showed significantly higher prediction confidence.

🥕 Data Augmentation Workflow

  1. Run:
python src/webcam/webcam_test.py
  2. Press s → show the gesture
  3. Press w → save raw_seq_*.npy and norm_seq_*.npy
  4. Data is saved at: dataset/augmented_samples/<label>/
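The 84 keypoint features described in the architecture section (21 landmarks × 2 hands × x/y) can be assembled from MediaPipe's per-hand landmarks roughly like this. This is a sketch of the feature layout only; the function name is hypothetical, and webcam_test.py's actual extraction code may differ.

```python
import numpy as np

def hands_to_features(hand_landmark_lists):
    """Flatten up to two hands' 21 (x, y) landmarks into an 84-dim
    vector, zero-padding missing hands so the shape stays fixed.

    `hand_landmark_lists` is a list of per-hand landmark lists, each
    landmark an (x, y) pair -- the values one would pull from MediaPipe's
    results.multi_hand_landmarks via each landmark's .x/.y fields.
    """
    features = np.zeros(84, dtype=np.float32)   # 21 points * 2 hands * (x, y)
    for h, hand in enumerate(hand_landmark_lists[:2]):
        coords = np.asarray(hand, dtype=np.float32).reshape(-1)[:42]
        features[h * 42 : h * 42 + coords.size] = coords
    return features
```

Zero-padding when only one hand is visible keeps every frame at the fixed width the model expects; the 30 joint angles would be appended afterwards to reach 114 features.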

🧪 Train with Augmented Data

python src/train/train_by_seq_aug.py
  • Merges raw + augmented samples
  • Saves model and normalization stats to models/L##/

To train without augmentation:

python src/train/train_by_seq.py
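The merge-and-normalize step performed before training can be sketched along these lines. The directory layout follows the project structure above, but the function names and exact statistics pipeline are assumptions, not train_by_seq_aug.py's verbatim code.

```python
import glob
import os
import numpy as np

def load_sequences(root):
    """Load every .npy sequence under root/<label>/ and pair it with
    its label (taken from the directory name)."""
    X, y = [], []
    for path in sorted(glob.glob(os.path.join(root, "*", "*.npy"))):
        X.append(np.load(path))
        y.append(os.path.basename(os.path.dirname(path)))
    return X, y

def merge_with_stats(original, augmented):
    """Concatenate original and augmented samples and compute the
    per-feature mean/std that would be saved as X_mean.npy / X_std.npy."""
    X = np.stack(original + augmented)          # (n_samples, seq_len, n_features)
    flat = X.reshape(-1, X.shape[-1])           # pool all frames per feature
    return X, flat.mean(axis=0), flat.std(axis=0)
```

Computing the stats over the merged set (rather than the original set alone) keeps the augmented samples from being systematically off-scale at inference time.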

📈 Evaluate on Hold-out Set

python src/hold_out_test/holdout_test.py
  • Loads samples from videos/, uses test_labels.csv
  • Outputs to holdout_results.csv
  • Visualize results with:
python src/viz/viz_confusion_top3.py
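Since the model reports top-3 predictions, the hold-out results can be scored with a top-k accuracy along these lines. The column names are assumptions (holdout_results.csv's actual schema is not shown here); adapt them to the real file.

```python
import pandas as pd

def topk_accuracy(df, k=3):
    """Fraction of rows whose true label appears among the first k
    predicted-label columns (pred_1 .. pred_k, names assumed)."""
    pred_cols = [f"pred_{i}" for i in range(1, k + 1)]
    hits = df.apply(lambda r: r["true_label"] in {r[c] for c in pred_cols},
                    axis=1)
    return float(hits.mean())
```

Comparing top-1 against top-3 accuracy shows how often the correct sign is merely out-ranked rather than missed entirely, which is what the confusion visualization below highlights.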

🔍 Label Similarity Analysis

python src/predict/label_similarity_filter.py
  • Computes cosine similarity between mean label vectors
  • Useful to identify confusing signs
  • Input: merged dataset with angles
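The core computation (cosine similarity between per-label mean vectors) can be sketched as follows; the helper names are illustrative, not the script's actual API.

```python
import numpy as np

def mean_label_vectors(X, labels):
    """Average all feature vectors belonging to each label."""
    out = {}
    for lab in set(labels):
        idx = [i for i, l in enumerate(labels) if l == lab]
        out[lab] = np.mean([X[i] for i in idx], axis=0)
    return out

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Label pairs whose mean vectors score close to 1 are the signs most likely to be confused by the classifier.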

📊 Visualization Tools

  • viz_history.py: plot training history
  • viz_confusion_top3.py: visualize confusion matrix
  • merge_aug_origin_npy.py: merge original/augmented samples for comparison

📝 Notes

  • On macOS, use cv2.VideoCapture(1) if 0 doesn't work
  • Use a Korean font for readable text: AppleGothic (macOS) or malgun.ttf (Windows)
  • Recommended: collect 30+ samples per label for robust accuracy

📎 License

This project is part of the Open Source Programming course at Sookmyung Women's University.
It uses MediaPipe and TensorFlow under the Apache 2.0 License.

About

2025 OpenSource Programming final Project
