This package is a video interpreter that can detect 59 distinct characters and symbols from American Sign Language.

Example: an attempt to sign "Hi Artem".

Complete the following steps to install the package:

- Download the package.
- Install the required Python packages from the command line using `pip install -r requirements.txt`.
- Install the package from the command line using `pip install dist/American_Sign_Language_Reader-1.0-py3-none-any.whl`.

Select a video file that you would like to have interpreted, navigate to the folder containing the file, and run `aslread <path to video>` from the command line.

Videos are processed in the following manner:

- A video is first fed into the MediaPipe Hand Landmark Detector. The detector maps 21 distinct landmark points, in the form of (x, y, z) coordinates, onto each hand in a given frame. More information on the MediaPipe Hand Landmark Detector can be found here. A minimal extraction sketch appears after this list.
  Note: The z-coordinate (depth) is often inaccurate.
- The landmark data is transferred into TensorFlow. The datapoints are rescaled, centered so that Hand Landmark point 0 (the point on the wrist) sits at the origin, and rotated into a fixed position using an orthogonal transformation (see the normalization sketch after this list).
- The processed datapoints are fed into a TensorFlow CNN model trained on data from the American Sign Language competition on Kaggle. The model makes frame-by-frame predictions in the form of probability distributions representing the likelihood that each of the 59 characters is the sign in a given frame, e.g. P(sign in frame i = 'u') = 0.5, P(sign in frame i = 'v') = 0.3, and so on (see the prediction sketch after this list).
- A phrase is assigned to the entire video. This is done in the following manner: we consider all labelings of the frames with the 59 distinct characters and assign each labeling a score based on the aggregate sum of the model's probabilities for the chosen character in each frame, with a slight weighting bonus for consecutive frames labeled with the same character. The highest-scoring labeling is converted to a phrase for the video (a dynamic-programming sketch appears after this list).
- An annotated video is then created with the following information:
  - Annotated original video with hand landmarks added
  - Normalized hand landmarks
  - Letter predictions
  - Phrase predictions
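
As a rough illustration of the first step, the following is a minimal sketch of per-frame landmark extraction with MediaPipe and OpenCV. It is not the package's actual code; the video path and detection threshold are placeholders.

```python
# Minimal sketch: per-frame hand landmark extraction with MediaPipe.
# Not the package's actual code; the path and threshold are illustrative.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    static_image_mode=False,
    max_num_hands=2,
    min_detection_confidence=0.5,  # assumed threshold
)

cap = cv2.VideoCapture("example.mp4")  # placeholder path
all_frames = []
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV reads frames as BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    landmarks = []
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            # 21 (x, y, z) points per detected hand; z is often unreliable.
            landmarks.append([(p.x, p.y, p.z) for p in hand.landmark])
    all_frames.append(landmarks)
cap.release()
hands.close()
```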
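
The normalization step can be sketched with NumPy as below, assuming `points` is a (21, 3) array for one hand. The reference landmarks chosen for the scale and rotation (the middle-finger and index-finger bases) are assumptions for illustration; the package may pick its axes differently.

```python
# Minimal normalization sketch (not the package's actual code).
import numpy as np

def normalize_hand(points: np.ndarray) -> np.ndarray:
    """points: (21, 3) array of MediaPipe hand landmarks for one hand."""
    # Center: move landmark 0 (the wrist) to the origin.
    pts = points - points[0]
    # Rescale: unit distance from the wrist to the middle-finger base
    # (landmark 9 is an assumed reference point).
    pts = pts / np.linalg.norm(pts[9])
    # Rotate: build an orthonormal basis from two hand axes and express
    # the points in it; R is orthogonal (R @ R.T = I).
    u = pts[9] / np.linalg.norm(pts[9])    # wrist -> middle-finger base
    v = pts[5] - np.dot(pts[5], u) * u     # index base, made orthogonal to u
    v = v / np.linalg.norm(v)
    w = np.cross(u, v)                     # completes the right-handed basis
    R = np.stack([u, v, w])
    return pts @ R.T
```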
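
Applying the model frame by frame can be pictured as below. This is a sketch assuming a saved Keras model and a (num_frames, 21, 3) array of normalized landmarks; the model filename and input shape are assumptions.

```python
# Sketch: frame-by-frame character probabilities from the CNN.
# The model path and input shape are assumptions for illustration.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("asl_cnn.h5")   # placeholder path
frames = np.zeros((100, 21, 3), dtype=np.float32)  # normalized landmarks per frame
probs = model.predict(frames)                      # shape (100, 59); each row is a distribution
print(probs[0].argmax())                           # most likely character in frame 0
```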
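
The labeling search in the phrase step amounts to a simple dynamic program: for each frame and character, keep the best score achievable so far, adding the model's probability plus a bonus when the previous frame carries the same character. Below is a minimal sketch, assuming `probs` is a (num_frames, 59) array of per-frame probabilities and `BONUS` is an assumed weighting constant.

```python
# Sketch: best-scoring frame labeling via dynamic programming.
# BONUS is an assumed value for the consecutive-frame weighting bonus.
import numpy as np

BONUS = 0.1

def best_labeling(probs: np.ndarray) -> list[int]:
    n_frames, n_chars = probs.shape
    score = probs[0].copy()                # best score ending in each character
    back = np.zeros((n_frames, n_chars), dtype=int)
    for i in range(1, n_frames):
        prev_best = int(score.argmax())
        new_score = np.empty(n_chars)
        for c in range(n_chars):
            stay = score[c] + BONUS        # keep the same character (bonus applies)
            switch = score[prev_best]      # switch from the best previous character
            if stay >= switch:
                new_score[c] = probs[i, c] + stay
                back[i, c] = c
            else:
                new_score[c] = probs[i, c] + switch
                back[i, c] = prev_best
        score = new_score
    # Trace the highest-scoring labeling back through the table.
    labels = [int(score.argmax())]
    for i in range(n_frames - 1, 0, -1):
        labels.append(int(back[i, labels[-1]]))
    return labels[::-1]
```

Collapsing runs of the same character in the returned labels then yields the phrase for the video.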

The TensorFlow CNN model was trained on data from the Kaggle American Sign Language competition. Information can be found here.

I would like to thank my mentor Artem Yankov for guiding me through this project.
