Thursday, June 24, 2010

3D Avatar - animated model

Final Year Project Report - Chapter 10

10. Virtual 3D Hand Modeling

10.1 3D Avatar – the Signing Human
To facilitate communication from a hearing person to the Deaf/hearing impaired, a 3D avatar is used. It presents sign language visually to the Deaf/hearing impaired and also serves as a rapid way of learning both sign language and verbal language [1].
10.2 Bones and Joints of a Human Hand
The skeleton of a real hand is shown in Figure 10.1. There are a total of 27 bones which are joined together to form the hand. The 21 degrees of freedom [DOF] of the hand's bone joints are controlled by 29 muscles [2], which enable us to grasp and hold objects properly.
Figure 10.1 Bones of the right hand (dorsal view) [4]

The constraints on these DOF are of utmost importance, because they determine the dynamic behavior of the hand. As shown in Table 10.1, most joints have natural ranges of rotation.
Joint       | Range X    | Range Y   | Range Z
CMC 1       | -180 ~ 20  | -180 ~ 0  | -90 ~ 120
CMC 4       | -10 ~ 0    | 0         | 0 ~ 5
CMC 5       | -20 ~ 0    | 0         | 0 ~ 10
MCP 1       | -90 ~ 0    | 0         | 0
MCP 2-5     | -90 ~ 30   | 0         | -20 ~ 20
PIP 2-5     | -120 ~ 0   | 0         | 0
DIP 2-5     | -80 ~ 0    | 0         | 0
IP (thumb)  | -90 ~ 20   | 0         | 0
Table 10.1 DOF of the Joints in Degrees (°) [3].
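As a hedged illustration of how these limits can be enforced in the animation code (the class, constants and the example value below are ours, written for illustration; only the PIP range itself comes from Table 10.1), a requested joint angle can simply be clamped to its range before it is applied:

// Hypothetical helper: clamps a requested joint rotation (in degrees)
// to the ranges listed in Table 10.1 before it is applied to the model.
public class JointLimits {

    // Example limits for the PIP joints of fingers 2-5 (degrees, Table 10.1).
    static final double PIP_X_MIN = -120, PIP_X_MAX = 0;

    // Returns the value clamped into [min, max].
    static double clamp(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }

    public static void main(String[] args) {
        double requested = -150;                          // outside the PIP X range
        double applied = clamp(requested, PIP_X_MIN, PIP_X_MAX);
        System.out.println(applied);                      // prints -120.0
    }
}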
Human hands are complex articulated structures with multiple degrees of freedom. This makes the modeling and animation of high-quality, flexible virtual hands extremely difficult, especially for real-time interactive applications. In this project we employ virtual hands for real-time sign language visualization, for which they are of the utmost importance [5]. Figure 10.2 shows the skeleton of the 3D avatar used in this project, which contains 16 bones including the palm, plus 2 bones for the arms, for a total of 18 bones.


Figure 10.2 Skeleton of the Project’s 3D Model
10.3 Modeling and Skinning
The 3D object was modeled using 3D Studio Max (3ds Max 9), as shown in Figure 10.3. It is easy-to-use, innovative and professional software for modeling 3-dimensional characters [6], and it allows a variety of virtual characters to be modeled easily by manipulating parameters through its graphical interface. One of the goals of modeling is to develop an anatomically correct model that has only the necessary number of vertices and is optimized for animation.

Figure 10.3 Modeling on 3DsMax9
When modeling the 3ds Max objects, it is very important to give the different parts of the model meaningful names so that each part can be uniquely partitioned and identified in Java. The names listed in Figure 10.4 allow the different parts of the full model to be uniquely identified in Java 3D.
Figure 10.4 Named 3D objects on 3dsMax

After modeling, the mesh was exported in the Wavefront (.obj) object file format.
10.4. Java 3D
Java 3D is a scene graph-based 3D application programming interface (API) for the Java platform. It runs on top of either OpenGL or Direct3D. Since version 1.2, Java 3D has been developed under the Java Community Process [7] and is freely available as an open-source API.
10.4.1. Java 3D Initialization for Animation
A few essential pieces of code are required to initialize Java 3D; a consolidated sketch follows the list below.
·       Simple Universe
private SimpleUniverse universe;
private GraphicsConfiguration config;
config = SimpleUniverse.getPreferredConfiguration();
·       Creating Java3D canvas
private Canvas3D canvas;
canvas = new Canvas3D(config);
·       Bounding Sphere
private Bounds influenceRegion;
·       Camera View Point
ViewingPlatform viewPlatform = universe.getViewingPlatform();
·       Setting Background
TextureLoader textureLoader =
new TextureLoader(bgPic, canvas);
Background background = new Background(textureLoader.getImage());
background.setApplicationBounds(influenceRegion);
·       Lighting
Color3f lightColor = new Color3f(Color.WHITE);
Vector3f lightDirection = new Vector3f(0.0f, 20f, -25f);
DirectionalLight light = new DirectionalLight(lightColor, lightDirection);
light.setInfluencingBounds(influenceRegion);
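A minimal consolidated sketch of this initialization is shown below; the bounding-sphere radius and the background file name ("bg.jpg") are illustrative values, not the project's own settings.

import java.awt.Color;
import java.awt.GraphicsConfiguration;
import javax.media.j3d.*;
import javax.vecmath.Color3f;
import javax.vecmath.Point3d;
import javax.vecmath.Vector3f;
import com.sun.j3d.utils.image.TextureLoader;
import com.sun.j3d.utils.universe.SimpleUniverse;
import com.sun.j3d.utils.universe.ViewingPlatform;

public class AvatarInit {
    public static void main(String[] args) {
        // Canvas and universe
        GraphicsConfiguration config = SimpleUniverse.getPreferredConfiguration();
        Canvas3D canvas = new Canvas3D(config);
        SimpleUniverse universe = new SimpleUniverse(canvas);

        // Region of influence for the background and lights (radius is illustrative)
        Bounds influenceRegion = new BoundingSphere(new Point3d(0, 0, 0), 100);

        // Default camera view point
        ViewingPlatform viewPlatform = universe.getViewingPlatform();
        viewPlatform.setNominalViewingTransform();

        BranchGroup root = new BranchGroup();

        // Background texture ("bg.jpg" is a placeholder file name)
        TextureLoader textureLoader = new TextureLoader("bg.jpg", canvas);
        Background background = new Background(textureLoader.getImage());
        background.setApplicationBounds(influenceRegion);
        root.addChild(background);

        // Directional white light
        Color3f lightColor = new Color3f(Color.WHITE);
        Vector3f lightDirection = new Vector3f(0.0f, 20f, -25f);
        DirectionalLight light = new DirectionalLight(lightColor, lightDirection);
        light.setInfluencingBounds(influenceRegion);
        root.addChild(light);

        root.compile();
        universe.addBranchGraph(root);
    }
}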
10.4.2. Adding 3D Model to Canvas
The Wavefront '.obj' file is loaded into the Java scene, a material is applied to the object, and the 3D model is scaled as needed.
TransformGroup model = getHuman(scene, influenceRegion);
Transform3D scaleModel = new Transform3D();
scaleModel.setScale(.8);

Color brown = new Color(255, 128, 64);
Appearance brownAppearance = getAppearance(brown);
namemap.get("p11").setAppearance(brownAppearance);

Separate parts of the 3D model are referenced in Java as follows:
arm2t = new TransformGroup();
arm2t.addChild(namemap.get("arm2"));
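For completeness, a hedged sketch of how the .obj file can be loaded and the name map assembled is shown below; getHuman is the project's own helper, so the loader-based version here is an assumption built on the standard Java 3D ObjectFile loader.

import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Hashtable;
import javax.media.j3d.Shape3D;
import com.sun.j3d.loaders.Scene;
import com.sun.j3d.loaders.objectfile.ObjectFile;

public class ModelLoader {
    // Hypothetical helper: loads the Wavefront file and maps each part name
    // (as assigned in 3ds Max, Figure 10.4) to its Shape3D node.
    public static HashMap<String, Shape3D> loadParts(String objPath)
            throws FileNotFoundException {
        ObjectFile loader = new ObjectFile(ObjectFile.RESIZE);
        Scene scene = loader.load(objPath);

        HashMap<String, Shape3D> namemap = new HashMap<String, Shape3D>();
        Hashtable namedObjects = scene.getNamedObjects();
        for (Object name : namedObjects.keySet()) {
            namemap.put((String) name, (Shape3D) namedObjects.get(name));
        }
        return namemap;
    }
}

Each entry of the returned map can then be wrapped in its own TransformGroup, as done for 'arm2' above.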
10.4.3. Setting Basic Transformations on Particles

Every particle of the loaded model needs to be positioned and rotated to form a meaningful model. Java 3D provides X, Y and Z axes for making the required adjustments, as shown in Figure 10.5.

Figure 10.5 Java 3D axes
One of the most difficult and essential parts of the work is to transform and align the model particles to their correct reference positions. Two properties must be set correctly for the model to operate properly:
·       Pivot point
o      The point around which a particle rotates
·       Particle location
o      The position of the particle relative to its parent particle
If a particle is misaligned, the model will show deformations during real-time animation, as shown in Figure 10.6. The extra sphere in the figure shows the current pivot point of the 'arm1' particle. To move the pivot point to the elbow joint and place the particle at the correct position, a Transform3D object is used to apply the matrix transformation.
Figure 10.6 Misplaced pivot point of ‘arm1’
modifyPartPivot(arm2x, arm2t, arm2, new Vector3f(0.218f, 0.347f, 0.340f), new Vector3f(-0.241f, -0.326f, -0.333f));
The first Vector3f provides the pivot-point adjustment and the second provides the particle-position adjustment.
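Since modifyPartPivot is the project's own helper, the following is only a hedged sketch of the underlying idea: one TransformGroup shifts the geometry so that the desired pivot (e.g. the elbow) lies at the rotation origin, and another places the whole joint relative to its parent.

import javax.media.j3d.Node;
import javax.media.j3d.Transform3D;
import javax.media.j3d.TransformGroup;
import javax.vecmath.Vector3f;

public class PivotUtil {
    // Hypothetical layout: parent -> positionGroup -> rotationGroup -> offsetGroup -> part.
    // Rotations applied later to rotationGroup then occur about the chosen pivot.
    public static TransformGroup attachWithPivot(TransformGroup rotationGroup, Node part,
                                                 Vector3f pivotShift, Vector3f position) {
        // Shift the geometry so its pivot point lies at the rotation origin.
        Transform3D offset = new Transform3D();
        offset.setTranslation(pivotShift);
        TransformGroup offsetGroup = new TransformGroup(offset);
        offsetGroup.addChild(part);

        // Allow the joint rotation to be changed at run time.
        rotationGroup.setCapability(TransformGroup.ALLOW_TRANSFORM_WRITE);
        rotationGroup.addChild(offsetGroup);

        // Place the whole joint relative to its parent particle.
        Transform3D place = new Transform3D();
        place.setTranslation(position);
        TransformGroup positionGroup = new TransformGroup(place);
        positionGroup.addChild(rotationGroup);
        return positionGroup;
    }
}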
Separated parts are joined in Java as parent and child nodes of type TransformGroup.
modifyPart(arm1, palm);

Finally, all the particles are added to the root object, which is then compiled and added to the universe. Figure 10.7 shows the 3D model in a Java 3D JFrame.
root.addChild(background);
root.addChild(light);
human.addChild(body);
root.compile();
universe.addBranchGraph(root);

Figure 10.7 3D Avatar Model on Java3D
Finally, a set of public methods (Figure 10.8) is written to support integration of this module with other modules. The rotation angles of each of the 18 joints about the X, Y and Z axes are set from external parameters in real time.
temp.rotX((float) Math.toRadians((rotation - jointRotX)));
temp.rotY((float) Math.toRadians((rotation - jointRotY)));
temp.rotZ((float) Math.toRadians((rotation - jointRotZ)));
Figure 10.8 Public methods
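As a hedged illustration (the method and parameter names are ours, not the project's), such a public method can apply the three rotations to a joint's TransformGroup; note that rotX, rotY and rotZ each overwrite the whole matrix, so the three rotations have to be combined with mul before the transform is set.

import javax.media.j3d.Transform3D;
import javax.media.j3d.TransformGroup;

public class JointControl {
    // Hypothetical public method: rotates one joint to the given angles (degrees).
    public static void setJointRotation(TransformGroup joint,
                                        double angleX, double angleY, double angleZ) {
        Transform3D rx = new Transform3D();
        Transform3D ry = new Transform3D();
        Transform3D rz = new Transform3D();
        rx.rotX(Math.toRadians(angleX));
        ry.rotY(Math.toRadians(angleY));
        rz.rotZ(Math.toRadians(angleZ));

        // Combine the three axis rotations into one transform.
        Transform3D combined = new Transform3D();
        combined.mul(rx);
        combined.mul(ry);
        combined.mul(rz);
        joint.setTransform(combined);
    }
}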
Figures 10.9 through 10.11 show the rotation of the hand about the X, Y and Z axes.
Figure 10.9 Rotation on X- axis
Figure 10.10 Rotation on Y- axis

Figure 10.11 Rotation on Z- axis
Figure 10.12 shows a combined transformation of rotations about all three axes.
Figure 10.12 Combinational rotations on 3 axes
The values of the above rotation angles are captured during configuration/training of the system and stored in the database connected to the system. A simplified diagram of the overall system is shown in Figure 10.13.
Figure 10.13 System Diagram
References
[1] Science News: "Animated 3-D Boosts Deaf Education; 'Andy' the Avatar Interprets by Signing." <http://www.sciencedaily.com/releases/2001/03/010307071110.htm>
[2] L. A. Jones and S. J. Lederman, Human Hand Function, Oxford University Press US, 2006.
[3] D. E. van Wyk and J. Connan, "High Quality Flexible H-Anim Hands for Sign Language Visualization," Department of Computer Science, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa.
[4] The Visual Dictionary, Volume 3: Human Body, "Bones of the Hand (dorsal view)," http://www.infovisual.info/03/027 en.html
[5] D. E. van Wyk and J. Connan, "High Quality Flexible H-Anim Hands for Sign Language Visualization," Department of Computer Science, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa.
[6] http://en.wikipedia.org/wiki/3D_studio_max
[7] http://www.stmuc.com/moray/


Abbreviations
Degree of Freedom [DOF]
Application Programming Interface [API]


Annual Technical Conference 2006 of IET-YMS Sri Lanka, 16th October 2010
Department of Electrical and Information Engineering, University of Ruhuna

ALLTALK WIRELESS SIGN LANGUAGE INTERPRETER
Edirisinghe E.A.M.M, Gamage K.G.C.P, Kumara M.P.H.N, Rathnayake R.M.M.H
Supervised by: Dr. N.D Jayasundere
Department of Electrical and Information Engineering - University of Ruhuna - Sri Lanka




Abstract
For many deaf people, sign language is the principal means of communication, and because most hearing people do not understand it, hearing-impaired people can become isolated. This paper presents a system prototype that automatically recognizes sign language to help hearing people communicate more effectively with hearing- or speech-impaired people. The sign-to-voice prototype was developed using a feed-forward neural network, with a purpose-designed sensor glove as the gesture input device. The system recognizes both static and dynamic gestures of American Sign Language, including the alphabet and a subset of its words, phrases and sentences. Recognized gestures are translated into voice in real time, and voice input is rendered on a 3D model so that the signer can understand it.

I.                  Introduction
This system is a computerized sign language recognition system for the vocally disabled who use sign language for communication. The basic concept is that specially designed sensor gloves, connected to a computer, are worn by the signer while making signs. The computer analyzes these gestures and synthesizes the sound of the corresponding word or letter for hearing people to understand; in the other direction, it translates the voice of the speaking person and renders it on a 3D model as a visual representation for the signer.
In this project only single-handed gestures have been considered. Hence it was necessary to select a subset of American Sign Language (ASL) for the implementation of the AllTalk Wireless Sign Language Interpreter.
Traditionally, gesture recognition technology has been divided into two categories: vision-based and glove-based methods. In vision-based methods a camera is the input device for observing the information of the hands or fingers; [1] and [2] present such vision-based systems. In glove-based systems, data gloves are used, which can acquire accurate hand-gesture positions because the positions are directly measured. Such glove-based systems are presented in [3], [4] and [5].
Using data gloves is a better approach than a camera, as the user has the flexibility to move the hand around freely, unlike with a camera, where the user has to stay in position in front of it. Light, electric or magnetic fields and other disturbances do not affect the performance of the glove, so the generated data is accurate, although the glove has the disadvantage that the clothing can be cumbersome and intrusive. We have therefore adopted a glove-based technique, which is more practical for gesture recognition, with a purpose-designed sensor glove for gesture input rather than the data gloves currently available in the market.

II.               System Architecture
The designed system inputs hand gestures through an electronic sensor glove and identifies the gesture patterns via a neural network. The identified sign is converted to text and then translated to voice output. The AllTalk interface also provides communication in the opposite direction, converting voice into text and then into a 3D model in real time, as shown in Figure 1.

Figure 1: System Architecture
The basic components of the AllTalk Wireless Sign Language Interpreter are given below:
i.             Modules for Gesture Input – Get the state of the hand (position of the fingers, orientation of the hand) from the glove and other sensors and convey it to the main software.
ii.           Data Transmission Module – Transmit all the data received from the sensors to the gesture processing module.
iii.        Gesture Preprocessing Module – Convert the raw input into a processable format for use in pattern matching; in this case, scaled integer values ranging from 0 to 255.
iv.         Gesture Recognition Engine – Examines the input gestures for a match with a known gesture in the gesture database.
v.           Text to Speech Converter Module – Examines the recognized gestures and converts them into text as well as speech.
vi.         Speech to Text Converter Module – Recognizes speech and converts it into text.
vii.      Avatar Module – A separate module that recognizes human speech and converts it to sign language.


III.             Gesture Input
The position and orientation of the hand are obtained from two main parts: the data glove and the sensor arm cover. The data glove consists of 5 flex resistors and one accelerometer, as shown in Figure 2, and the arm cover consists of 3 accelerometers and one bend sensor. The tilt of the palm is captured by the accelerometer, and the bend of the five fingers is measured by the flex sensors.

Figure 2: Orientation and position of the sensors (flex sensors and accelerometer)
The sensors are interfaced to a PIC18F452 programmable microcontroller, which is programmed in MikroC. The A/D, Capture, I2C and UART modules of the microcontroller are used. The data-packet structure, consisting of the sensor values to be served to the Bluetooth module, is created in the microcontroller.
IV.             Data Transmission Module
All sensor readings are converted to digital format within the microcontroller, and the digitized data is reformatted for transmission over a virtual serial port using the UART protocol. Each analog sensor output is given a value between 0 and 1023 by the analog-to-digital conversion and is then rescaled to a value between 0 and 255 (8 bits). Each 8-bit value returned from a sensor is buffered in the microcontroller, and the values are sent one by one in packet format via the UART protocol [6].
All sensor values are thus converted to digital format within the microcontroller and sent as data packets to the PC via Bluetooth [7].
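As an illustration only (the project performs this step in the microcontroller firmware, not in Java), the rescaling from a 10-bit sample to an 8-bit packet value amounts to:

// Illustrative arithmetic: maps a 10-bit A/D sample (0-1023) to 8 bits (0-255),
// matching the packet format described above.
static int rescaleTo8Bit(int adcValue) {
    return (adcValue * 255) / 1023;
}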
V.               Neural Network Approach
A neural network is used to distinctly identify the different signs, for both static and dynamic gesture recognition. The design of the gesture recognition module combines the outputs of multiple feed-forward neural networks, called experts, to produce a classification. Each network is trained for a particular sign to give a positive response for that sign and a negative response for all the others.
A voting mechanism then takes the outputs of all the experts as its input and identifies the resulting gesture by selecting the expert with a positive result. Each expert was trained using more than 2500 samples. Java Neuroph [8] was used for the implementation of the gesture recognition module.
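A hedged sketch of this expert/voting arrangement is given below, assuming each trained expert has been saved as a Neuroph network file; the class, file handling and 0.5 threshold are our own illustrative choices, not the project code.

import org.neuroph.core.NeuralNetwork;

public class GestureVoter {
    // Hypothetical: each expert network was trained for one sign and saved to disk.
    private final NeuralNetwork[] experts;
    private final String[] signs;

    public GestureVoter(String[] signs, String[] nnetFiles) {
        this.signs = signs;
        this.experts = new NeuralNetwork[nnetFiles.length];
        for (int i = 0; i < nnetFiles.length; i++) {
            experts[i] = NeuralNetwork.load(nnetFiles[i]);
        }
    }

    // Feeds the scaled sensor values to every expert and returns the sign whose
    // expert gives the strongest positive response, or null if none fires.
    public String classify(double[] sensorValues) {
        int best = -1;
        double bestOutput = 0.5;                 // threshold for a "positive" vote
        for (int i = 0; i < experts.length; i++) {
            experts[i].setInput(sensorValues);
            experts[i].calculate();
            double output = experts[i].getOutput()[0];
            if (output > bestOutput) {
                bestOutput = output;
                best = i;
            }
        }
        return best >= 0 ? signs[best] : null;
    }
}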

VI.             Text to Speech Converter Module
The gesture identified by the neural network is converted into text, and the text is then converted to speech with an open-source speech synthesis engine called FreeTTS [9].
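A hedged example of this call with FreeTTS is shown below; "kevin16" is one of the general-purpose voices distributed with the engine, and the wrapper class is our own.

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class TextToSpeech {
    public static void speak(String text) {
        // "kevin16" is a 16 kHz voice bundled with FreeTTS.
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        if (voice != null) {
            voice.allocate();       // load the voice resources
            voice.speak(text);      // synthesize and play the text
            voice.deallocate();     // release the resources
        }
    }

    public static void main(String[] args) {
        speak("hello");
    }
}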

VII.          Speech to Text Converter Module
Speech-to-text is used by the end user who can talk and hear. When he speaks, the speech is recognized and converted into text using Java Sphinx [10]. The converted text is then checked against the database; if it is found, the visual representation corresponding to the spoken text is shown.
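A hedged sketch of the Sphinx-4 recognition loop is shown below; it assumes a standard Sphinx-4 XML configuration file ("config.xml" is a placeholder) defining the usual "recognizer" and "microphone" components.

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SpeechToText {
    public static void main(String[] args) {
        // Load the Sphinx-4 configuration (placeholder resource name).
        ConfigurationManager cm = new ConfigurationManager(
                SpeechToText.class.getResource("config.xml"));

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (microphone.startRecording()) {
            Result result = recognizer.recognize();
            if (result != null) {
                // The text is then matched against the database of known phrases.
                System.out.println(result.getBestFinalResultNoFiller());
            }
        }
        recognizer.deallocate();
    }
}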

VIII.        3D Model
The 3D model was created using 3ds Max 9, and Java 3D is used to draw the model object in the application. The model is loaded into Java 3D and every part is mapped into a different TransformGroup. All TransformGroups are modified so that their pivot points are at the required locations, using Transform3D. A separate class was added to map real-time input data to the rotational parameters stored in the database tables. Motion constraints are kept in the database so that the motions can be fully customized.

IX.             Conclusion
Deaf and mute people rely on sign language interpreters for communication. However, they cannot depend on interpreters in everyday life, mainly due to the high cost and the difficulty of finding and scheduling qualified interpreters. This system will help them improve their quality of life significantly.
X.               References
[1]              Lee, J. & Kunii, T. L. Model-based analysis of hand posture IEEE Journal, 1995, 15, 77-86
[2]              Isaac Garcia Incertis, Jaime Gómez García-Bermejo, Eduardo Zalama Casanova, "Hand Gesture Recognition for Deaf People Interfacing," Pattern Recognition, IEEE International Conference on, vol. 2, pp. 100-103
[3]              Fels, S. S. & Hinton, G. E. Glove-TalkII-a neural-network interface which maps gestures to parallel formant speech synthesizer controls, IEEE Journal,1998, 9, 205-212
[4]              T. Takahashi and F. Kishino. Hand gesture coding based on experiments using a hand gesture interface device. SIGCHI Bulletin, 23(2):67–73, 1991.
[5]              http://www.slideshare.net/asharahmed/boltay-haath# Boltay haath
[6] MikroC user manual
[7] http://www.bluetomorrow.com/
[8] http://neuroph.sourceforge.net/documentation.html
[9] http://freetts.sourceforge.net/docs
[10] http://cmusphinx.sourceforge.net/sphinx4/#what_is_sphinx4