Abstract:
Recent reports show that there are over 1.5 billion people with hearing impairment worldwide; in Ethiopia, their number is estimated at over 1.2 million. These people communicate through Sign Language, which combines manual and non-manual signs. However, Sign Language is understood only by the deaf community and some of their families, creating a communication gap between them and the rest of the world. Although interpreters try to fill this gap, their number is not enough to meet the communication demand. Hence, Automatic Sign Language Recognition (ASLR) is being studied for various Sign Languages around the world to fill the communication gap. ASLR methods range from traditional machine learning to modern deep learning. For Ethiopian Sign Language (EthSL), a few attempts have been made to automate recognition; however, they were found to be environment- and signer-dependent. These gaps hinder the path toward fully automated, commercial Sign Language Recognition products. Consequently, this study proposes an environment- and signer-invariant Sign Language Recognition model.

The model first extracts skeletal key-points from the signer using MediaPipe, Google's cross-platform pipeline framework for detecting and tracking human pose, face landmarks, and hands. After the skeletal key-point information is preprocessed, feature extraction and learning are performed with deep learning architectures: a Convolutional Neural Network followed by Long Short-Term Memory (CNN-LSTM), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Gated Recurrent Units (GRU). In this study, the models were trained to classify twenty (20) isolated dynamic Ethiopian Sign Language signs. A total of 5600 video samples were collected from volunteer students at Haramaya University and used to train and test the deep learning-based models.

First, all the models were trained and tested in signer-dependent mode, where GRU outperformed the other deep learning algorithms with 94% recognition accuracy. The best-performing GRU-based model was then tested in signer-independent mode and attained 73% recognition accuracy. The outcome of this study shows that Ethiopian Sign Language can be recognized in real time within dynamic environments, and it suggests that signer independence is achievable. This study advanced the signer independence of ASLR models to some degree; however, further studies are required to recognize continuous signs in a fully open environment. Therefore, the technique implemented to detect and track key-points in this study should be further investigated for recognizing continuous EthSL.
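As a concrete illustration of the key-point extraction step described above, the following Python sketch uses MediaPipe Holistic to pull pose, face, and hand landmarks from a single video frame and flatten them into one feature vector. The landmark counts follow MediaPipe's published models (33 pose, 468 face, 21 per hand); the helper name extract_keypoints, the sample file name, and the zero-padding of missing landmarks are illustrative assumptions, not necessarily the exact preprocessing used in this study.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    """Flatten MediaPipe Holistic landmarks into one feature vector.
    Missing detections are zero-padded so every frame has equal length
    (an assumption; the study may handle missing landmarks differently)."""
    pose = (np.array([[p.x, p.y, p.z, p.visibility]
                      for p in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[p.x, p.y, p.z]
                      for p in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    left = (np.array([[p.x, p.y, p.z]
                      for p in results.left_hand_landmarks.landmark]).flatten()
            if results.left_hand_landmarks else np.zeros(21 * 3))
    right = (np.array([[p.x, p.y, p.z]
                       for p in results.right_hand_landmarks.landmark]).flatten()
             if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, left, right])  # 1662 values per frame

# Read one frame from a sample video and extract its key-points.
cap = cv2.VideoCapture("sample_sign.mp4")  # hypothetical file name
ok, frame = cap.read()
cap.release()
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
print(extract_keypoints(results).shape)  # (1662,)
```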
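Similarly, a minimal sketch of a GRU-based classifier like the one that performed best in the signer-dependent experiments could look like the following Keras model. Only the choice of GRU and the 20 output classes come from the abstract; the layer widths, the 30-frame sequence length, and the optimizer settings are assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 30      # assumed number of key-point frames sampled per clip
NUM_FEATURES = 1662  # per-frame vector from the extraction sketch above
NUM_CLASSES = 20     # twenty isolated dynamic EthSL signs (from the study)

# Stacked GRU layers read the key-point sequence; a softmax head
# produces a probability distribution over the 20 sign classes.
model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, NUM_FEATURES)),
    layers.GRU(64, return_sequences=True),
    layers.GRU(128),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```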