Annotation Guide
TDT2000
Part 1: The Annotation Strategy
TDT2000 involves 60 topics and 3 months' worth of data (October through
December of 1998) from the following sources:
| | English | Mandarin |
| Newswire | NYT, APW | Zaobao, Xinhua |
| Radio | PRI The World, VOA English | VOA Mandarin |
| Television | CNN Headline News, ABC World News Tonight, MSNBC News With Brian Williams, NBC Nightly News | |
Your job is to search through the 3 months' worth of news to find stories that are related to the topic. You will have several resources at your disposal:
- the EZQuery Search Engine, which you will use to scan through the corpus
- a seed story (in either English or Chinese), which contains a reference to the topic's seminal event
- a topic explication, which describes the who/what/when/where of the topic and provides a brief summary
- a topic research document, which gives more detail on the topic
In addition, for some topics, you will also have
- a list of 1 or more additional on-topic stories
Using these tools, you will search for stories in the corpus that discuss your topic. When you find a likely story, you will read it and label it in the following way:
YES: this story discusses the topic in a substantial way.
BRIEF: this story makes passing reference to the topic (less than 10% of the story discusses the topic). NOTE: There is no such thing as too brief a reference. Any mention is worth noting with BRIEF.
NO: this story does not discuss the topic at all.
Your search for on-topic stories will be done in a number of stages, outlined below:
STAGE 1: Initial Query
-submit all known on-topic stories as a query
[If there are no known on-topic
stories in your language for the topic, then skip to Stage 3]
-search engine returns relevance-ranked list of stories
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list
until
you have found 5 additional on-topic stories
OR UNTIL
you have read at least 2 off-topic stories for every on-topic story found
and
the last 10 stories were off-topic
STAGE 2: Improved Query Based on Additional On-Topic Stories
-issue a new query using a concatenation of all known on-topic stories
-search engine returns relevance-ranked list of news stories (excluding
those already annotated)
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list
until you have read at least 2 off-topic stories for every on-topic story
found
and the last 10 stories were off-topic
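The bookkeeping behind Stage 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout, and the document IDs are hypothetical, and nothing here calls EZQuery, whose actual interface this guide does not describe. Known on-topic stories are represented as judgements with the label "yes".

```python
from typing import Dict, List


def build_stage2_query(texts: Dict[str, str], judgements: Dict[str, str]) -> str:
    """Concatenate the text of every story judged on-topic ('yes')."""
    on_topic_ids = [doc_id for doc_id, label in judgements.items() if label == "yes"]
    return "\n".join(texts[doc_id] for doc_id in on_topic_ids)


def drop_annotated(ranked_ids: List[str], judgements: Dict[str, str]) -> List[str]:
    """Keep the ranking order, but skip stories that already have a judgement."""
    return [doc_id for doc_id in ranked_ids if doc_id not in judgements]


# Hypothetical document IDs and texts, for illustration only.
texts = {
    "doc1": "seed story text",
    "doc2": "follow-up story text",
    "doc3": "unrelated story text",
}
judgements = {"doc1": "yes", "doc2": "yes", "doc3": "no"}

query = build_stage2_query(texts, judgements)
remaining = drop_annotated(["doc2", "doc4", "doc3", "doc5"], judgements)
```

Here `remaining` holds only the stories that have not yet been labeled, in their original ranked order, which mirrors the "excluding those already annotated" step.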
STAGE 3: Text-based Query
-issue a new query using the TOPIC RESEARCH DOCUMENT PLUS ANY ADDITIONAL
SEARCH TERMS APPROPRIATE (e.g., parts of the topic explication)
-search engine returns relevance-ranked list of news stories not already
seen
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list
until you have read at least 2 off-topic stories for every on-topic story
found
and the last 10 stories were off-topic
STAGE 4: Creative Searching
You are encouraged to use your specialized knowledge (drawn from topic research and the known on-topic stories) to conduct additional manual searches through the corpus. These additional searches will be based on keywords, names, particular on-topic stories, etc. Think creatively! If you come up with a novel way to search for additional on-topic stories, let us know.
If you find additional information (names, places, dates, events) about your topic, you should revise the topic research page for that topic. Then re-submit the topic research page as a text query to find additional on-topic stories.
Part 2: Using the Annotation Interface
((This section is under construction))
Getting Started
Start Netscape.
In an xterm, type do2000.
A window will open, containing the list of files that have been assigned to you.
Click on one of the file names (topic numbers) and choose a language.
Click the button to begin annotation.
Before you begin your search for additional on-topic stories, you must
- read all known on-topic stories
  - to display a story, click the DOCID and hit the "View" button in the interface
- read the topic definition
  - when you start up the annotation interface, the topic definition will open in a Netscape browser window
- read the topic research
  - follow the link on the topic explication page to see the topic research
- for topics without known on-topic stories in a particular language, be sure to read the seed story summary on the topic definition page
Once you've done this, you're ready to begin annotation!
Step-by-step guide to annotation
Following the annotation strategy outlined above, follow these steps IN ORDER.
STAGE 1: Expanding the Query
-submit all known on-topic stories as a query
When you click "Execute search on docnos", the search engine will scan
through the corpus for related stories. When the search engine
has completed its search, the interface will prepare a judgement file
containing a relevance-ranked list of stories for you to annotate.
When you click on Annotate, the main annotation window will disappear.
In its place will be the judgement file and an article window.
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list until
you have found A=5 additional on-topic stories
OR UNTIL
you have read at least B=2 off-topic stories for every on-topic story found
and
the last C=10 stories were off-topic
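The stopping rule above can be written out as a small check. The following Python sketch is an illustration, not part of the annotation interface: the function name and the list-of-labels input are hypothetical. It also makes one assumption the guide does not state: only YES counts as on-topic and only NO counts as off-topic, so a BRIEF story counts toward neither total and interrupts a run of off-topic stories.

```python
from typing import Optional, Sequence


def should_stop(labels: Sequence[str], a: Optional[int] = 5, b: int = 2, c: int = 10) -> bool:
    """Decide whether to stop reading the ranked list.

    labels: judgements in the order the stories were read,
            each one of 'yes', 'no', or 'brief'.
    a:      stop once this many on-topic stories are found (None disables it).
    b:      required number of off-topic stories per on-topic story.
    c:      required length of the trailing run of off-topic stories.
    """
    # Assumption: BRIEF is counted as neither on-topic nor off-topic.
    on_topic = sum(1 for label in labels if label == "yes")
    off_topic = sum(1 for label in labels if label == "no")

    # Condition A: enough additional on-topic stories were found.
    if a is not None and on_topic >= a:
        return True

    # Conditions B and C must both hold.
    enough_off_topic = off_topic >= b * on_topic
    last_c_off_topic = len(labels) >= c and all(label == "no" for label in labels[-c:])
    return enough_off_topic and last_c_off_topic
```

For example, `should_stop(["yes"] + ["no"] * 10)` is true, because one on-topic story is balanced by at least two off-topic stories and the last ten stories were all off-topic. Passing `a=None` gives the rule used in Stages 2 and 3, which drop the first condition.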
STAGE 2: Find On-topic Stories based on Improved Query
-issue a new query using a concatenation of selected on-topic stories
-search engine returns relevance-ranked list of news stories not already
seen (excluding those already tagged either on-topic OR
off-topic)
-read and annotate (yes/no/brief) ALL stories in relevance-ranked
list until
you have read at least B=2 off-topic stories for every on-topic story found
and
the last C=10 stories were off-topic
STAGE 3: Find Stories that are on-topic but unlike those already seen.
-issue a new query using the TOPIC RESEARCH DOCUMENT PLUS ANY ADDITIONAL
SEARCH TERMS APPROPRIATE
-search engine returns relevance-ranked list of news stories not already
seen
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list
until
you have read at least B=2 off-topic stories for every on-topic story found
and
the last C=10 stories were off-topic
STAGE 4: Creative Searching
You are encouraged to use your specialized knowledge (drawn from topic research and the known on-topic stories) to conduct additional manual searches through the corpus. These additional searches will be based on keywords, names, particular on-topic stories, etc. Think creatively! If you come up with a novel way to search for additional on-topic stories, let us know.
If you find additional information (names, places, dates, events) about your topic, you should revise the topic research page for that topic. Then re-submit the topic research page as a text query to find additional on-topic stories.