FORM's primary goal is to assemble a large corpus of
kinematic gesture data integrated with speech transcription, syntactic transcription,
pitch contour, and spectrograms. Currently we have 41 minutes of gesture data fully
annotated and corrected. Once this corpus is built, we can apply standard statistical
NLP techniques to partition natural gesture space into clusters of similar gestures with
respect to concurrent speech phenomena.
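As a rough sketch of the kind of clustering analysis we have in mind, the example below groups per-gesture feature vectors with scikit-learn's k-means. The feature choices (duration, mean hand height, extent, peak speed) and the random data are illustrative placeholders rather than quantities drawn from the corpus.

    # Illustrative sketch: cluster per-gesture feature vectors with k-means.
    # The features (duration, mean hand height, extent, peak speed) are
    # hypothetical stand-ins for features derived from FORM annotations.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Each row is one annotated gesture: [duration_s, mean_hand_height, extent, peak_speed]
    features = rng.random((200, 4))

    kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(features)
    for label in range(5):
        members = np.flatnonzero(kmeans.labels_ == label)
        print(f"cluster {label}: {len(members)} gestures")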
FORM encodes gestures using two different annotation schemes. The first scheme, FORM1, annotates joint
angles of the arm over time as well as higher-level descriptions of physical
movement. See the annotation codebook (link to codebook) for a description of
FORM1. The second scheme, FORM2, simply keeps track of the hand's position in
the space around the body and then uses inverse kinematics
to compute the arm joint angles.
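As a simplified illustration of how a hand position can determine joint angles, the sketch below solves inverse kinematics for a two-link planar arm. FORM2's actual solver works over the full 3D arm; the function name and the segment lengths here are placeholders.

    # Minimal 2D sketch of inverse kinematics for a two-link "arm":
    # given a hand position (x, y) relative to the shoulder, recover the
    # shoulder and elbow angles. FORM2's actual solver works over the full
    # 3D arm; the link lengths below are placeholders.
    import math

    def two_link_ik(x, y, upper=0.30, forearm=0.25):
        d2 = x * x + y * y
        # Law of cosines for the elbow angle (clamped for numerical safety).
        cos_elbow = (d2 - upper**2 - forearm**2) / (2 * upper * forearm)
        elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))
        shoulder = math.atan2(y, x) - math.atan2(
            forearm * math.sin(elbow), upper + forearm * math.cos(elbow)
        )
        return shoulder, elbow

    print(two_link_ik(0.35, 0.20))  # angles in radians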
Visualization and Animation Tools
These tools will "play back" an
annotation, allowing the annotator to better judge how well he/she has
captured the linguistic phenomenon in question. We are currently working on
an implementation using the Jack animation toolkit with the
EMOTE interface from Penn's
Human Modeling and Simulation lab.
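The playback idea can be sketched independently of Jack and EMOTE: annotated joint angles act as keyframes on a timeline, and playback interpolates between them at the video frame rate. The keyframe values below are invented for illustration, not real FORM data.

    # Sketch of the playback idea: treat annotated joint angles as keyframes
    # on a timeline and interpolate between them at the video frame rate.
    # The keyframe values below are placeholders, not real FORM data.
    import numpy as np

    keyframes = {  # time (s) -> elbow flexion (degrees), hypothetical values
        0.0: 10.0, 0.4: 65.0, 0.9: 80.0, 1.5: 20.0,
    }
    times = np.array(sorted(keyframes))
    angles = np.array([keyframes[t] for t in times])

    fps = 30
    playback_times = np.arange(times[0], times[-1], 1.0 / fps)
    playback_angles = np.interp(playback_times, times, angles)

    for t, a in zip(playback_times[:5], playback_angles[:5]):
        print(f"t={t:.3f}s  elbow={a:.1f} deg")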
Annotation Graph plug-in for Anvil
FORM uses Annotation Graphs (AGs) as a logical structure to
store its data. AGs are a generalized format for storing annotations on one
or more timelines. An Anvil->AG plug-in will allow FORM data to be exported
in the standard AG data type. Other types of annotation such as word
transcription and part-of-speech tagging can then be added to our data since
they will share the same AG timeline. See: Steven Bird and Mark Liberman, "A Formal Framework for Linguistic Annotation".
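The sketch below is a deliberately simplified stand-in for the AG idea, not the actual AG library API: labeled arcs between time-anchored points, with gesture and word tiers sharing one timeline and queried for overlap. The tier names and times are invented for illustration.

    # Simplified stand-in for the Annotation Graph idea (Bird & Liberman):
    # labeled arcs between time-anchored nodes, with different annotation
    # types (gesture, word) sharing one timeline. This is an illustration,
    # not the actual AG library API.
    from dataclasses import dataclass

    @dataclass
    class Arc:
        start: float   # seconds
        end: float     # seconds
        tier: str      # e.g. "gesture", "word"
        label: str

    arcs = [
        Arc(1.20, 2.05, "gesture", "right-hand stroke"),
        Arc(1.30, 1.55, "word", "over"),
        Arc(1.55, 1.90, "word", "there"),
    ]

    def overlapping(a, b):
        return a.start < b.end and b.start < a.end

    gesture = arcs[0]
    words = [a.label for a in arcs if a.tier == "word" and overlapping(a, gesture)]
    print("words co-occurring with the gesture:", words)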
We are currently developing a new version of the FORM gesture
annotation specification. FORM2 demonstrated a significant improvement
over FORM1 in both accuracy and annotation speed by using inverse
kinematics to infer the arm's kinematics from the location of the hand
relative to the body. FORM3 instead uses C.J. Taylor's VideoMoCap technology
to capture the kinematics of the body, including arms, head, torso, and legs.
Annotators using the FORM3 spec simply move frame-by-frame through the
video, clicking on the subject's key joints, such as the shoulders and elbows.
VideoMoCap then calculates the 3D skeletal kinematics of the subject's
body movement using relative lengths of body segments. FORM's standard
annotation specifications for finer-grained kinematics such as hand
and wrist movement are still included as part of FORM3.
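The core geometric step can be illustrated as follows: under scaled orthographic projection, the foreshortening of a body segment of known relative length determines the depth difference between its endpoints, up to a sign. The sketch below shows that calculation with placeholder values; it illustrates the principle rather than the VideoMoCap implementation itself.

    # Rough sketch of the geometric idea behind recovering depth from 2D joint
    # clicks: under scaled orthographic projection, the foreshortening of a
    # body segment of known relative length reveals the depth difference
    # between its endpoints (up to a sign). Scale and segment length below are
    # placeholders; this illustrates the principle, not VideoMoCap itself.
    import math

    def relative_depth(u1, v1, u2, v2, segment_length, scale):
        """Depth difference between two clicked joints (sign ambiguous)."""
        du, dv = u1 - u2, v1 - v2
        foreshortened = (du * du + dv * dv) / (scale * scale)
        return math.sqrt(max(0.0, segment_length**2 - foreshortened))

    # Shoulder and elbow clicked in image coordinates (pixels); the upper-arm
    # length is expressed in the same relative units as the scale factor.
    print(relative_depth(310, 220, 355, 290, segment_length=0.30, scale=300.0))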
FORM has started a pilot project with Mark Liberman to
explore the relationship between prosody and paralinguistic features of
speech such as gesture. We have collected new video data of natural
gesture in the context of a two-person conversation. This data will soon
be annotated using a reduced portion of the FORM tag set and analyzed against
pitch contour using an open-source speech analysis tool.
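As a sketch of the planned prosody/gesture comparison, the code below summarizes F0 during an annotated gesture interval. Both the pitch contour and the gesture interval are made-up placeholders, not data from the pilot recordings.

    # Illustrative sketch: given a pitch contour (time, F0) and an annotated
    # gesture interval, summarize F0 during the gesture. The contour and the
    # interval below are made-up placeholders.
    import numpy as np

    times = np.arange(0.0, 3.0, 0.01)                      # 10 ms pitch frames
    f0 = 120 + 30 * np.sin(2 * np.pi * times / 1.5)        # fake pitch contour (Hz)
    f0[times > 2.4] = np.nan                               # unvoiced region

    gesture_start, gesture_end = 0.80, 1.60                # annotated stroke (s)
    mask = (times >= gesture_start) & (times <= gesture_end)
    print("mean F0 during gesture: %.1f Hz" % np.nanmean(f0[mask]))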
Motion Capture Comparison Studies
There are 5.5 minutes of gesture data
recorded in the HMS lab with digital video as well as the "Flock of Birds"
magnetic motion capture system. Once the visualization tools are complete,
we will be able to compare the motion capture output with the
animations generated directly from FORM annotations. This study will give
us an indication of FORM's correctness.
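One way to run this comparison is to resample the FORM-derived joint-angle trajectory onto the motion-capture timestamps and report an error measure such as RMSE. The sketch below does this with synthetic placeholder trajectories; the sample rates and joint are illustrative.

    # Sketch of the planned comparison: resample a joint-angle trajectory
    # derived from FORM annotations onto the motion-capture timestamps and
    # report RMSE. Both trajectories below are synthetic placeholders.
    import numpy as np

    mocap_t = np.arange(0.0, 5.0, 1 / 120)                 # 120 Hz magnetic tracker
    mocap_elbow = 40 + 25 * np.sin(mocap_t)                # degrees

    form_t = np.arange(0.0, 5.0, 1 / 30)                   # annotation playback, 30 Hz
    form_elbow = 40 + 25 * np.sin(form_t) + np.random.default_rng(0).normal(0, 3, form_t.size)

    resampled = np.interp(mocap_t, form_t, form_elbow)
    rmse = np.sqrt(np.mean((resampled - mocap_elbow) ** 2))
    print(f"elbow-angle RMSE: {rmse:.2f} degrees")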
New Metrics for Inter-Annotator Agreement
Our current Inter-Annotator
Agreement (IAA) numbers are based on the bag-of-arcs technique. However, as those
scores indicate, annotators often agree to a large degree on structure
but differ only on the exact beginning or ending timestamp, or on the value of an
attribute. Unfortunately, small differences in timestamp and value are judged
incorrect to the same degree as large differences. Visual feedback, as just
described, will allow us to discover whether small differences in coding
actually make little difference visually. If this proves to be the case, then
we will need to experiment with more geometrically-based measures of
similarity, e.g., distance in n-dimensional space.
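One candidate measure, sketched below, represents each coded segment as a point in an n-dimensional space (start time, end time, and numeric attribute values) and grades agreement by distance, so that near-misses score better than gross disagreements. The tolerances, attribute, and values are illustrative assumptions, not a settled metric.

    # One possible geometric agreement measure (an illustration, not a settled
    # metric): represent each coded segment as a point in an n-dimensional
    # space and grade agreement by scaled distance between the two coders.
    import numpy as np

    def graded_agreement(seg_a, seg_b, tolerance):
        """Return a score in (0, 1]; 1.0 means identical coding."""
        a, b = np.asarray(seg_a, float), np.asarray(seg_b, float)
        distance = np.linalg.norm((a - b) / tolerance)     # per-dimension scaling
        return 1.0 / (1.0 + distance)

    # start (s), end (s), hand-height attribute: two annotators, small offsets.
    coder1 = [1.20, 2.05, 0.60]
    coder2 = [1.25, 2.00, 0.55]
    print(graded_agreement(coder1, coder2, tolerance=[0.1, 0.1, 0.1]))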
Augmented Search Algorithms for Annotation Graphs
The AG community has already begun research into the most efficient ways to search AG
data. But, as we add richer information, we need to extend the search
capabilities to allow researchers fast access to this complex data. An example
would be the need to search for all gestures similar to the one given in our
example. Further, the researcher might want to then search those results for
gestures which accompany certain syntactic or intonational structures.
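A sketch of the kind of augmented query we have in mind follows: rank gestures by similarity to an example gesture's feature vector, then keep only those that overlap an intonational boundary. All of the data below are random placeholders, and the feature layout is an assumption made for illustration.

    # Sketch of an augmented query: rank gestures by similarity to an example
    # gesture's feature vector, then keep only those overlapping an
    # intonational-phrase boundary. All data are placeholders.
    import numpy as np

    rng = np.random.default_rng(1)
    gesture_features = rng.random((500, 4))                       # one row per gesture
    gesture_times = np.sort(rng.random((500, 2)) * 600, axis=1)   # [start, end] in s
    boundary_times = np.sort(rng.random(80) * 600)                # boundary times (s)

    query = gesture_features[42]
    order = np.argsort(np.linalg.norm(gesture_features - query, axis=1))

    def contains_boundary(start, end):
        return np.any((boundary_times >= start) & (boundary_times <= end))

    hits = [i for i in order[:50] if contains_boundary(*gesture_times[i])]
    print("similar gestures overlapping a boundary:", hits[:10])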