Final Year Research Project

This IET paper was written for the IET-YP ATC in Sri Lanka. It is the report of our university final year project, a group project that includes me. (Use of any information below is subject to written permission from all of the group members, which won't be hard to get :-))

Annual Technical Conference 2010 of IET-YMS Sri Lanka, 16th October 2010
Department of Electrical and Information Engineering, University of Ruhuna

Edirisinghe E.A.M.M, Gamage K.G.C.P, Kumara M.P.H.N, Rathnayake R.M.M.H
Supervised by: Dr. N.D Jayasundere
Department of Electrical and Information Engineering - University of Ruhuna - Sri Lanka


For many deaf people, sign language is the principal means of communication. Because few hearing people understand it, this increases the isolation of hearing impaired people. This paper presents a system prototype that automatically recognizes sign language to help hearing people communicate more effectively with hearing or speech impaired people. The Sign to Voice system prototype was developed using a feed forward neural network, with a purpose-designed sensor glove as the gesture input device. The system recognizes both static and dynamic gestures of American Sign Language, including the alphabet and a subset of its words, phrases and sentences. Recognized gestures are translated into voice in real time, and in the other direction voice input is rendered on a 3D model so that the signer can follow it.

I. Introduction
This system is a computerized sign language recognition system for vocally disabled people who use sign language for communication. The disabled person wears specially designed sensor gloves connected to a computer while making the signs. The computer analyzes these gestures and synthesizes the sound of the corresponding word or letter so that hearing people can understand it. In the other direction, it translates the voice of the speaking person and renders it on a 3D model as a visual representation for the signer.
In this project only single-handed gestures are considered. It was therefore necessary to select a subset of American Sign Language (ASL) for the implementation of the AllTalk Wireless Sign Language Interpreter.
Traditionally, gesture recognition technology has been divided into two categories: vision-based and glove-based methods. In vision-based methods, a computer camera is the input device that observes the hands or fingers; [1] and [2] present such vision-based systems. In glove-based systems, data gloves capture hand gestures accurately because the positions are measured directly. Such glove-based systems are presented in [3], [4] and [5].
Data gloves are a better approach than a camera here because the user can move the hand around freely, whereas a camera requires the user to stay in position in front of it. Light, electric or magnetic fields, and other disturbances do not affect the performance of the glove. The generated data is therefore accurate, although wearing a glove can be cumbersome and intrusive. We therefore chose the glove-based technique as the more practical one for gesture recognition, and designed our own sensor glove for gesture input rather than using the data gloves currently available on the market.

II. System Architecture
Hand gestures are input to the system through an electronic sensor glove, and the gesture patterns are identified by a neural network. The identified sign is converted to text and then translated to voice output. The AllTalk interface also supports communication in the opposite direction: it converts voice into text and then into a 3D model animation in real time, as shown in Figure 1.

Figure 1: System Architecture
The basic components of the AllTalk Wireless Sign Language Interpreter are given below:
i. Modules for Gesture Input – Get the state of the hand (position of fingers, orientation of hand) from the glove and other sensors and convey it to the main software.
ii. Data Transmission Module – Transmit the data received from all sensors to the gesture processing module.
iii. Gesture Preprocessing Module – Convert raw input into a processable format for use in pattern matching; in this case, scaled integer values ranging from 0 to 255.
iv. Gesture Recognition Engine – Examine the input gesture for a match with a known gesture in the gesture database.
v. Text to Speech Converter Module – Convert the recognized gestures into text and then into speech.
vi. Speech to Text Converter Module – Recognize speech and convert it into text.
vii. Aviator Module – A separate module that can recognize human speech and convert it to sign language.

III. Gesture Input
The position and orientation of the hand are obtained from two main parts: the data glove and the sensor arm cover. The data glove consists of five flex sensors and one accelerometer, as shown in Figure 2, and the arm cover consists of three accelerometers and one bend sensor. The accelerometer captures the tilt of the palm, and the flex sensors measure the bend of the five fingers.

Figure 2: Orientation and Position of sensors
The sensors are interfaced to a PIC18F452 programmable microcontroller, which is programmed in MikroC. The A/D, Capture, I2C and UART modules of the microcontroller are used. The firmware builds data packets containing the sensor values, which are passed to the Bluetooth module.
IV. Data Transmission Module
All sensor readings are converted to digital format within the microcontroller, and the digitized data is reformatted for transmission over the virtual serial port using the UART protocol. After analog to digital conversion, each analog sensor output takes a value between 0 and 1023 (10-bit). These values are then rescaled to between 0 and 255 (8-bit). Each 8-bit value returned from a sensor is buffered at the microcontroller, and the values are sent one by one in packet format via the UART protocol [6].
The data packets are sent to the PC via Bluetooth [7].
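The rescaling and byte handling described above can be sketched as follows. This is only an illustration in Java of the arithmetic, not the MikroC firmware itself. The class and method names are hypothetical, the packet layout (one byte per sensor, in sensor order, with no framing) is an assumption because the paper does not specify it, and dropping the two low bits is one possible way to rescale a 10-bit value to 8 bits.

```java
// Hypothetical sketch: 10-bit to 8-bit rescaling and a simple one-byte-per-sensor packet.
final class SensorPacket {
    private SensorPacket() {}

    /** Rescale a 10-bit ADC reading (0..1023) to 8 bits (0..255) by dropping the two low bits. */
    static int rescale(int raw10) {
        if (raw10 < 0 || raw10 > 1023) {
            throw new IllegalArgumentException("expected a 10-bit value, got " + raw10);
        }
        return raw10 >> 2;
    }

    /** Pack one rescaled byte per sensor, in sensor order (assumed layout). */
    static byte[] pack(int[] rawReadings) {
        byte[] packet = new byte[rawReadings.length];
        for (int i = 0; i < rawReadings.length; i++) {
            packet[i] = (byte) rescale(rawReadings[i]);
        }
        return packet;
    }

    /** Recover unsigned 0..255 values on the PC side; Java bytes are signed, so mask each one. */
    static int[] unpack(byte[] packet) {
        int[] values = new int[packet.length];
        for (int i = 0; i < packet.length; i++) {
            values[i] = packet[i] & 0xFF;
        }
        return values;
    }
}
```

On the PC side the masking step matters: without `& 0xFF`, any sensor value above 127 would be read back as a negative number.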
V. Neural Network Approach
A neural network is used to distinguish the different signs, in both static and dynamic gesture recognition. The gesture recognition module combines the outputs of multiple feed forward neural networks, called experts, to produce a classification. Each network is trained for a particular sign, to give a positive response for that sign and a negative one for all others.
A voting mechanism then takes the outputs of all the experts as its input and identifies the resulting gesture by selecting the expert with a positive result. Each expert was trained using more than 2500 samples. Java Neuroph [8] was used to implement the gesture recognition module.
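A minimal sketch of such a voting step is given below. It is written in plain Java without the Neuroph library and assumes each expert has already produced a single output value. The class name, the threshold parameter and the tie-breaking rule (the strongest positive output wins) are assumptions for illustration; the paper only states that the expert with a positive result is selected.

```java
import java.util.Optional;

// Hypothetical sketch of the voting step over per-sign expert outputs.
final class ExpertVote {
    private ExpertVote() {}

    /**
     * Pick the sign whose expert gives the strongest positive response.
     * signs[i] is the sign that expert i was trained for; outputs[i] is that expert's output.
     * An output counts as positive when it is at least the given threshold.
     * Returns an empty Optional when no expert responds positively.
     */
    static Optional<String> vote(String[] signs, double[] outputs, double threshold) {
        if (signs.length != outputs.length) {
            throw new IllegalArgumentException("one output is required per expert");
        }
        int best = -1;
        for (int i = 0; i < outputs.length; i++) {
            boolean positive = outputs[i] >= threshold;
            if (positive && (best < 0 || outputs[i] > outputs[best])) {
                best = i;
            }
        }
        return best < 0 ? Optional.empty() : Optional.of(signs[best]);
    }
}
```

Returning an empty result when no expert is positive lets the caller ignore an unrecognized gesture instead of forcing a wrong classification.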

VI. Text to Speech Converter Module
The gesture identified by the neural network is converted into text. The text is then converted to speech with an open source speech synthesis engine called FreeTTS [9].

VII. Speech to Text Converter Module
Speech to text is used by the end user who can talk and listen. When this user speaks, the speech is converted to text. The system then checks whether the converted text is in the database. If it is, the visual corresponding to the spoken text is shown. Speech is recognized and converted into text using Java Sphinx [10].
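The database check described above might look like the following sketch. It uses an in-memory map in place of the project's database, and the class name, the animation identifiers and the text normalization (trimming, lower-casing, collapsing spaces) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: look up recognized text and return the matching sign visual, if any.
final class SignLookup {
    private final Map<String, String> animations = new HashMap<>();

    /** Register the visual (an assumed animation identifier) for a word or phrase. */
    void register(String phrase, String animationId) {
        animations.put(normalize(phrase), animationId);
    }

    /** Return the visual for the recognized text, or empty if the text is not in the database. */
    Optional<String> find(String recognizedText) {
        return Optional.ofNullable(animations.get(normalize(recognizedText)));
    }

    /** Normalize so that case and spacing differences in recognizer output do not break the match. */
    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }
}
```

When `find` returns an empty result, the application would simply show nothing for that utterance, which matches the behaviour described in the text.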

VIII. 3D Model
The 3D model was created using 3ds Max 9. Java3D is used to draw the model object in the application. The model is loaded into Java3D and every part is mapped to a separate 'TransformGroup'. Each TransformGroup is modified using 'Transform3D' so that its pivot point is set to the required location.
A separate class was added to map real-time input data to rotational parameters from database tables. Motion constraints are stored in the database so that the motions can be fully customized.
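One way to map an 8-bit input value to a constrained joint rotation is sketched below. The linear mapping, the class and method names, and the representation of a motion constraint as a minimum and maximum angle in degrees are assumptions; the paper does not describe the actual mapping or the database schema.

```java
// Hypothetical sketch: map an 8-bit sensor value onto a joint's allowed rotation range.
final class JointMapper {
    private JointMapper() {}

    /**
     * Linearly map a sensor value (0..255) to an angle between minDeg and maxDeg,
     * and return it in radians. Values outside 0..255 are clamped, so the joint
     * never leaves the range given by its motion constraint.
     */
    static double toAngleRadians(int sensorValue, double minDeg, double maxDeg) {
        if (minDeg > maxDeg) {
            throw new IllegalArgumentException("minDeg must not exceed maxDeg");
        }
        int clamped = Math.max(0, Math.min(255, sensorValue));
        double degrees = minDeg + (clamped / 255.0) * (maxDeg - minDeg);
        return Math.toRadians(degrees);
    }
}
```

The resulting angle could then be applied to the Transform3D of the corresponding TransformGroup, for example through a rotation about one axis; which axis each joint uses would come from the model itself.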

IX. Conclusion
Deaf and speech impaired people rely on sign language interpreters for communication. However, they cannot depend on interpreters in everyday life, mainly because of the high cost and the difficulty of finding and scheduling qualified interpreters. This system is intended to help them improve their quality of life.
X. References
[1] Lee, J. & Kunii, T. L. Model-based analysis of hand posture. IEEE Journal, 1995, 15, 77-86.
[2] Isaac Garcia Incertis, Jaime Gómez García-Bermejo, Eduardo Zalama Casanova. "Hand Gesture Recognition for Deaf People Interfacing," Pattern Recognition, IEEE International Conference on, vol. 2, pp. 100-103.
[3] Fels, S. S. & Hinton, G. E. Glove-TalkII - a neural-network interface which maps gestures to parallel formant speech synthesizer controls. IEEE Journal, 1998, 9, 205-212.
[4] T. Takahashi and F. Kishino. Hand gesture coding based on experiments using a hand gesture interface device. SIGCHI Bulletin, 23(2):67–73, 1991.
[5] Boltay haath
[6] MikroC user manual