Gesture Corpus
FORM's primary goal is to assemble a large corpus of kinematic gesture data integrated with speech transcription, syntactic transcription, pitch contour, and spectrograms. Currently we have 41 minutes of gesture data fully annotated and corrected. Once this corpus is built, we can apply standard statistical NLP techniques to partition natural gesture space into clusters of similar gestures with respect to concurrent speech phenomena. FORM encodes gestures using two different annotation schemes. The first scheme, FORM1, annotates the joint angles of the arm over time as well as higher-level descriptions of physical movement. See the annotation codebook for a description of FORM1. The second scheme, FORM2, simply tracks the position of the hand in the space around the body and then uses inverse kinematics to compute the arm joint angles.
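FORM2's central idea is to record only where the hand is and let inverse kinematics recover the arm's joint angles. The sketch below shows that idea in its simplest form: a planar arm with two segments (upper arm and forearm) and the shoulder at the origin. The function names, segment lengths, and coordinates are hypothetical. A real arm model is three-dimensional and underdetermined (the elbow can swivel around the shoulder-to-hand axis), so this is not FORM2's actual solver.

```python
import math

# Illustrative sketch only: planar two-link inverse kinematics; names are hypothetical.


def two_link_ik(x, y, upper_len, fore_len):
    """Shoulder and elbow angles (radians) that put the hand at (x, y).

    Planar two-segment arm with the shoulder at the origin. The elbow angle
    comes from the law of cosines; of the two mirror-image solutions this
    returns the one with a non-negative elbow angle.
    """
    d2 = x * x + y * y
    cos_elbow = (d2 - upper_len ** 2 - fore_len ** 2) / (2 * upper_len * fore_len)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("hand position is out of reach for these segment lengths")
    elbow = math.acos(cos_elbow)
    # Direction to the hand, corrected for the angle that the bent elbow introduces.
    shoulder = math.atan2(y, x) - math.atan2(
        fore_len * math.sin(elbow), upper_len + fore_len * math.cos(elbow)
    )
    return shoulder, elbow


def hand_position(shoulder, elbow, upper_len, fore_len):
    """Forward kinematics: where the hand ends up for the given joint angles."""
    ex = upper_len * math.cos(shoulder)
    ey = upper_len * math.sin(shoulder)
    return (ex + fore_len * math.cos(shoulder + elbow),
            ey + fore_len * math.sin(shoulder + elbow))


# Hypothetical segment lengths (metres) and an annotated hand position.
shoulder, elbow = two_link_ik(0.35, 0.20, upper_len=0.30, fore_len=0.25)
print(math.degrees(shoulder), math.degrees(elbow))
print(hand_position(shoulder, elbow, 0.30, 0.25))  # (0.35, 0.20) up to rounding
```

Running the forward kinematics on the recovered angles is a cheap way to check that an inverse-kinematics step reproduces the annotated hand position.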

Visualization and Animation Tools
These tools will "play back" an annotation. This will allow the annotator to better judge how well he/she has captured the linguistic phenomenon in question. We are currently working on an implementation using the Jack animation toolkit with the EMOTE interface from Penn's Human Modeling and Simulation lab.
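Playing back an annotation amounts to turning annotated keyframes into a pose for every video frame. The following is a minimal sketch, assuming each keyframe stores joint angles in degrees and that linear interpolation between keyframes is acceptable. The joint names, frame rate, and function names are hypothetical, and none of this is the Jack or EMOTE API, which would handle the actual rendering.

```python
from bisect import bisect_right

# Illustrative sketch only: keyframe interpolation for playback; names are hypothetical.


def pose_at(keyframes, t):
    """Joint angles at time t, linearly interpolated between keyframes.

    keyframes: [(time_sec, {joint_name: angle_deg}), ...] sorted by time, with
    every keyframe listing the same joints. Times outside the range are clamped.
    """
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return dict(keyframes[0][1])
    if t >= times[-1]:
        return dict(keyframes[-1][1])
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    (t0, p0), (t1, p1) = keyframes[i - 1], keyframes[i]
    w = (t - t0) / (t1 - t0)
    return {joint: p0[joint] + w * (p1[joint] - p0[joint]) for joint in p0}


def frames(keyframes, fps=30):
    """One interpolated pose per video frame from the first to the last keyframe."""
    start, end = keyframes[0][0], keyframes[-1][0]
    n = int(round((end - start) * fps)) + 1
    return [pose_at(keyframes, start + k / fps) for k in range(n)]


# Hypothetical annotation: the elbow flexes while the shoulder abducts.
kf = [(0.0, {"elbow_flex": 0.0, "shoulder_abduct": 10.0}),
      (0.5, {"elbow_flex": 90.0, "shoulder_abduct": 40.0})]
poses = frames(kf, fps=30)  # 16 poses covering 0.0 to 0.5 s
```

Linear interpolation of individual angles is the simplest choice; a production animation system would more likely interpolate rotations (for example with quaternions) and apply timing curves.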

Annotation Graph plug-in for Anvil
FORM uses Annotation Graphs (AGs) as the logical structure for storing its data. AGs are a generalized format for storing annotations on one or more timelines. An Anvil-to-AG plug-in will allow FORM data to be exported in the standard AG format. Other types of annotation, such as word transcription and part-of-speech tagging, can then be added to our data, since they will share the same AG timeline. See: Steven Bird and Mark Liberman, "A Formal Framework for Linguistic Annotation."
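The key property of AGs for FORM is that every annotation layer hangs its arcs off the same timeline. The sketch below is a deliberately simplified, hypothetical data structure in that spirit, not the API of any AG toolkit. Nodes carry time offsets, arcs carry a layer type and a label, and the words accompanying a gesture are found by checking for time overlap. In Bird and Liberman's formalism, nodes need not all be anchored to times; here every node is anchored to keep the example short.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: a toy annotation graph; class and method names are hypothetical.


@dataclass(frozen=True)
class Arc:
    start: str  # id of the start node
    end: str    # id of the end node
    kind: str   # annotation layer, e.g. "word" or "gesture"
    label: str  # content of the annotation


@dataclass
class AnnotationGraph:
    nodes: dict = field(default_factory=dict)  # node id -> time offset (seconds)
    arcs: list = field(default_factory=list)

    def add_node(self, node_id, time):
        self.nodes[node_id] = time

    def add_arc(self, start, end, kind, label):
        arc = Arc(start, end, kind, label)
        self.arcs.append(arc)
        return arc

    def overlapping(self, arc, kind):
        """Arcs of the given kind whose time span overlaps that of `arc`."""
        s, e = self.nodes[arc.start], self.nodes[arc.end]
        return [a for a in self.arcs
                if a.kind == kind and a is not arc
                and self.nodes[a.start] < e and s < self.nodes[a.end]]


# Words and a gesture stroke share nodes, and therefore share one timeline.
ag = AnnotationGraph()
for node_id, t in [("n0", 0.0), ("n1", 0.4), ("n2", 0.9), ("n3", 1.5)]:
    ag.add_node(node_id, t)
ag.add_arc("n0", "n1", "word", "so")
ag.add_arc("n1", "n2", "word", "big")
stroke = ag.add_arc("n1", "n3", "gesture", "stroke")
print([a.label for a in ag.overlapping(stroke, "word")])  # ['big']
```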

FORM3
We are currently developing a new version of the FORM gesture annotation specification. FORM2 showed a significant improvement over FORM1 in both accuracy and annotation speed by using inverse kinematics to infer arm kinematics from the location of the hand relative to the body. FORM3 instead uses C.J. Taylor's VideoMoCap technology to capture the kinematics of the body, including the arms, head, torso, and legs. Annotators using the FORM3 specification simply step frame by frame through the video, clicking on the subject's key joints, such as the shoulders and elbows. VideoMoCap then calculates the 3D skeletal kinematics of the subject's body movement from the relative lengths of the body segments. FORM's standard annotation specifications for finer-grained kinematics, such as hand and wrist movement, are still included in FORM3.
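The reason clicked 2D joint positions plus known relative segment lengths can yield 3D structure can be sketched as follows. This is not VideoMoCap's code. It assumes a scaled orthographic camera (image coordinates equal the 3D X and Y multiplied by one known scale factor s), a common simplification for this problem. Under that assumption, a segment of length l whose endpoints are separated by (Δu, Δv) in the image has depth difference ΔZ = ±sqrt(l² − (Δu² + Δv²)/s²). The sign cannot be recovered from the image alone, so here it is supplied as input. All names and numbers are hypothetical.

```python
import math

# Illustrative sketch only: depth from segment lengths under an ASSUMED scaled
# orthographic camera; function names are hypothetical, not VideoMoCap's.


def depth_offset(p1, p2, seg_len, scale):
    """Unsigned depth difference |Z1 - Z2| between the two ends of a body segment.

    p1, p2: clicked image points (u, v); seg_len: relative 3D segment length;
    scale: image units per unit of body length (u = scale * X, v = scale * Y).
    """
    du = (p1[0] - p2[0]) / scale
    dv = (p1[1] - p2[1]) / scale
    rem = seg_len ** 2 - du ** 2 - dv ** 2
    if rem < -1e-9:
        raise ValueError("projected segment is longer than its 3D length; scale too small")
    return math.sqrt(max(rem, 0.0))  # clamp tiny negative rounding errors


def reconstruct_chain(points_2d, lengths, signs, scale):
    """3D positions for a chain of joints, e.g. shoulder -> elbow -> wrist.

    The first joint is placed at depth 0. signs[i] (+1 or -1) gives the
    direction of the depth change along segment i, which the image cannot tell us.
    """
    u0, v0 = points_2d[0]
    joints = [(u0 / scale, v0 / scale, 0.0)]
    for i, seg_len in enumerate(lengths):
        dz = signs[i] * depth_offset(points_2d[i], points_2d[i + 1], seg_len, scale)
        u, v = points_2d[i + 1]
        joints.append((u / scale, v / scale, joints[-1][2] + dz))
    return joints


# Hypothetical clicks (pixels) for shoulder, elbow, wrist; relative lengths 0.5 each.
arm = reconstruct_chain([(0, 0), (30, 0), (30, 40)], [0.5, 0.5], [+1, -1], scale=100)
print(arm)
```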

Prosody research
FORM has started a pilot project with Mark Liberman to explore the relationship between prosody and paralinguistic features of speech such as gesture. We have collected new video data of natural gesture in the context of a two-person conversation. This data will soon be annotated using a reduced portion of the FORM tag set and analyzed against pitch contour using the open-source speech analysis tool Wavesurfer.
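As a starting point for relating gesture to pitch, one could compute the mean F0 during each annotated gesture. The following is a minimal sketch, assuming the pitch contour has already been exported from Wavesurfer (or any pitch tracker) as one F0 value per fixed-length frame, with 0 marking unvoiced frames, and that gesture annotations are (label, start, end) intervals in seconds. The 10 ms frame step, the use of mean F0 as the measure, and the function name are illustrative assumptions.

```python
# Illustrative sketch only: the frame layout and function name are assumptions.


def mean_f0_per_gesture(f0_track, gestures, frame_step=0.01):
    """Mean voiced F0 (Hz) during each gesture.

    f0_track: one F0 value per analysis frame, frame i starting at
              i * frame_step seconds; 0 marks an unvoiced frame.
    gestures: (label, start_sec, end_sec) intervals from the gesture annotation.
    Returns [(label, mean_f0)], with None where no voiced frame falls inside.
    """
    results = []
    for label, start, end in gestures:
        voiced = [f0 for i, f0 in enumerate(f0_track)
                  if start <= i * frame_step < end and f0 > 0]
        results.append((label, sum(voiced) / len(voiced) if voiced else None))
    return results


# Hypothetical 10 ms pitch track and two gesture intervals.
track = [0, 0, 120, 130, 0, 140, 150, 0]
print(mean_f0_per_gesture(track, [("beat", 0.015, 0.045), ("stroke", 0.045, 0.075)]))
# [('beat', 125.0), ('stroke', 145.0)]
```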

Motion Capture comparison studies
We have 5.5 minutes of gesture data recorded in the HMS lab with both digital video and the "Flock of Birds" magnetic motion capture system. Once the visualization tools are complete, we will be able to compare the motion capture data with animations generated directly from FORM annotations. This study will indicate how accurately FORM annotations capture the recorded movement.
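Once both the motion capture recording and the FORM-driven animation yield positions for the same body point (for example, the wrist), a natural first measure of agreement is the root-mean-square distance between the two trajectories. The following is a minimal sketch, assuming both tracks are already in the same coordinate frame and units and are time-aligned. In practice, the magnetic sensor data would first need calibration and registration. The choice of RMSE, the function names, and the sample data are illustrative assumptions.

```python
import math
from bisect import bisect_right

# Illustrative sketch only: function names and data are hypothetical.


def position_at(track, t):
    """Linearly interpolate a track [(time_sec, (x, y, z)), ...] sorted by time."""
    times = [sample[0] for sample in track]
    if t <= times[0]:
        return track[0][1]
    if t >= times[-1]:
        return track[-1][1]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    (t0, p0), (t1, p1) = track[i - 1], track[i]
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


def trajectory_rmse(reference, estimate, times):
    """Root-mean-square 3D distance between two tracks sampled at the given times."""
    squared = [math.dist(position_at(reference, t), position_at(estimate, t)) ** 2
               for t in times]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical wrist tracks: motion capture vs. animation, offset by 2 cm in y.
mocap = [(0.0, (0.0, 0.0, 0.0)), (1.0, (0.3, 0.0, 0.1))]
animated = [(0.0, (0.0, 0.02, 0.0)), (0.5, (0.15, 0.02, 0.05)), (1.0, (0.3, 0.02, 0.1))]
sample_times = [k / 30 for k in range(31)]
print(trajectory_rmse(mocap, animated, sample_times))
```

Because the two tracks may be sampled at different rates, both are resampled at a shared set of times before comparison.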

New Metrics for Inter-Annotator Agreement
Our current Inter-Annotator Agreement (IAA) numbers are based on the bag-of-arcs technique. However, as those scores show, annotators often agree to a large degree on structure and differ only on an exact beginning or ending timestamp, or on the value of an attribute. Unfortunately, small differences in timestamp and value are judged incorrect to the same degree as large differences. Visual feedback from the playback tools described above will allow us to discover whether small differences in coding actually produce little visible difference. If this proves to be the case, we will need to experiment with more geometrically based measures of similarity, e.g., distance in n-dimensional space.
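To make the problem concrete: under exact matching, an arc whose start time is off by 50 ms counts as a complete disagreement, exactly like an arc that is off by two seconds. The sketch below contrasts an exact-match bag-of-arcs score with a simple Euclidean distance that treats each arc as a point (start, end, value) in n-dimensional space (here n = 3). The scoring formula (shared arcs divided by the multiset union), the arc representation, the numeric attribute, and the weighting parameter are assumptions for illustration. They are not FORM's published IAA computation.

```python
import math
from collections import Counter

# Illustrative sketch only: the scoring formula and names are assumptions.


def bag_of_arcs_agreement(arcs_a, arcs_b):
    """Exact-match agreement: arcs shared by both coders / arcs in their multiset union.

    Each arc is a tuple (start_sec, end_sec, value); arcs must match on every field.
    """
    a, b = Counter(arcs_a), Counter(arcs_b)
    shared = sum((a & b).values())  # multiset intersection (min of counts)
    union = sum((a | b).values())   # multiset union (max of counts)
    return shared / union if union else 1.0


def arc_distance(x, y, value_weight=1.0):
    """Euclidean distance between two arcs treated as points (start, end, value).

    value_weight trades attribute differences off against timestamp differences,
    since seconds and, say, degrees are not directly comparable.
    """
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
                     + (value_weight * (x[2] - y[2])) ** 2)


coder_a = (1.00, 1.50, 30.0)  # hypothetical arc: start, end, elbow angle in degrees
near = (1.05, 1.50, 30.0)     # start differs by 50 ms
far = (3.00, 4.00, 30.0)      # a genuinely different arc
print(bag_of_arcs_agreement([coder_a], [near]), bag_of_arcs_agreement([coder_a], [far]))
# 0.0 0.0: both count as total disagreement
print(arc_distance(coder_a, near), arc_distance(coder_a, far))
```

The distance does distinguish the near miss from the genuine disagreement. Categorical attributes would need a different per-dimension measure than subtraction.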

Augmented Search Algorithms for Annotation Graphs
The annotation-graph community has already begun researching the most efficient ways to search AG data. However, as we add richer information, we need to extend the search capabilities to give researchers fast access to this complex data. For example, a researcher might need to search for all gestures similar to the one given in our example, and then search those results for gestures that accompany certain syntactic or intonational structures.
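A two-stage query of this kind can be sketched with a brute-force nearest-neighbour search followed by a time-overlap filter. The first stage finds the gestures most similar to an example, and the second keeps those that co-occur with a given syntactic or intonational label. The feature vectors, field names, function names, and the example labels ("NP", "H*") are hypothetical. A real system over a large corpus would use an index rather than scanning every gesture.

```python
import math

# Illustrative sketch only: field and function names are hypothetical.


def most_similar(query, gestures, k):
    """The k gestures whose feature vectors are closest (Euclidean) to `query`."""
    return sorted(gestures, key=lambda g: math.dist(g["features"], query))[:k]


def cooccurring(gestures, spans, label):
    """Gestures overlapping in time with at least one span carrying `label`.

    spans: (label, start_sec, end_sec) tuples from another annotation layer,
    e.g. syntactic constituents or intonational events.
    """
    return [g for g in gestures
            if any(lab == label and s < g["end"] and g["start"] < e
                   for lab, s, e in spans)]


# Hypothetical gestures described by two kinematic features each.
gestures = [
    {"id": "g1", "features": (0.0, 0.0), "start": 0.0, "end": 1.0},
    {"id": "g2", "features": (0.1, 0.0), "start": 2.0, "end": 3.0},
    {"id": "g3", "features": (5.0, 5.0), "start": 4.0, "end": 5.0},
]
spans = [("NP", 0.5, 0.8), ("H*", 2.2, 2.4)]
similar = most_similar((0.0, 0.0), gestures, k=2)
print([g["id"] for g in cooccurring(similar, spans, "H*")])  # ['g2']
```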


© 1996-2003 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.
Contact ldc@ldc.upenn.edu