Annotation Guide
TDT2000



Part 1: The Annotation Strategy

TDT2000 involves 60 topics and 3 months' worth of data (October through December of 1998) from the following sources:

            English                          Mandarin
Newswire    NYT                              Zaobao
            APW                              Xinhua
Radio       PRI The World                    VOA Mandarin
            VOA English
Television  CNN Headline News
            ABC World News Tonight
            MSNBC News With Brian Williams
            NBC Nightly News

Your job is to search through the 3 months' worth of news to find stories that are related to the topic.  You will have several resources at your disposal:

In addition, for some topics, you will also have
Using these tools, you will search for stories in the corpus that discuss your topic.  When you find a likely story, you will read it and label it in the following way:
YES: this story discusses the topic in a substantial way
BRIEF: this story makes passing reference to the topic (less than 10% of the story discusses the topic).  NOTE: There is no such thing as too brief a reference. Any mention is worth noting with BRIEF.
NO: this story does not discuss the topic at all.
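
The three labels can be read as a simple decision on how much of the story is about the topic. The sketch below is only an illustration: the function name label_story and the idea of passing an estimated fraction are hypothetical, and treating 10% or more as YES is an assumption, since the guide states the 10% figure only for BRIEF. In practice the annotator makes this judgement by reading the story.

```python
def label_story(fraction_on_topic):
    """Map an annotator's estimate of on-topic content to a label.

    fraction_on_topic: estimated share of the story (0.0 to 1.0) that
    discusses the topic. Hypothetical input; the guide asks for a
    human judgement, not a measured value.
    """
    if not 0.0 <= fraction_on_topic <= 1.0:
        raise ValueError("fraction_on_topic must be between 0.0 and 1.0")
    if fraction_on_topic == 0.0:
        return "NO"      # the story does not discuss the topic at all
    if fraction_on_topic < 0.10:
        return "BRIEF"   # any passing mention, however short
    return "YES"         # assumption: 10% or more counts as substantial
```

Note that there is no lower cutoff for BRIEF: any non-zero mention is labelled BRIEF rather than NO.
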
Your search for on-topic stories will be done in a number of stages, outlined below:

STAGE 1: Initial Query

-submit all known on-topic stories as a query
    [If there are no known on-topic stories in your language for the topic, then skip to Stage 3]
-search engine returns relevance-ranked list of stories
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list until

              you have found 5 additional on-topic stories
OR UNTIL
              you have read at least 2 off-topic stories for every on-topic story found and the last 10 stories were off-topic
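
The Stage 1 stopping rule can be sketched as a small check over the judgements made so far. This is a minimal sketch, not part of the annotation tools: the function name should_stop and its parameter names are hypothetical, and the guide does not say here whether BRIEF counts as on-topic for this rule, so the caller decides how to map each label to True (on-topic) or False (off-topic).

```python
def should_stop(judgements, max_new_on_topic=5, off_per_on=2, trailing_off=10):
    """Return True when the annotator may stop reading the ranked list.

    judgements: list of booleans in reading order, True = on-topic.
    max_new_on_topic: stop once this many new on-topic stories are found;
        pass None to disable this condition.
    off_per_on: off-topic stories required per on-topic story found.
    trailing_off: number of most recent stories that must all be off-topic.
    """
    on_topic = sum(judgements)
    off_topic = len(judgements) - on_topic

    # Condition 1: enough additional on-topic stories have been found.
    if max_new_on_topic is not None and on_topic >= max_new_on_topic:
        return True

    # Condition 2: the off-topic ratio is met AND the most recent
    # `trailing_off` stories were all off-topic.
    tail = judgements[-trailing_off:]
    tail_all_off = len(tail) == trailing_off and not any(tail)
    return off_topic >= off_per_on * on_topic and tail_all_off
```

For example, after one on-topic story followed by seven off-topic ones the check still returns False, because ten consecutive off-topic stories have not yet been read.
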
 

STAGE 2: Improved Query Based on Additional On-Topic Stories

-issue a new query using a concatenation of all known on-topic stories
-search engine returns relevance-ranked list of news stories (excluding those already annotated)
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list until you have read at least 2 off-topic stories for every on-topic story found and the last 10 stories were off-topic
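
The two mechanical parts of Stage 2, building the concatenated query and leaving out stories that were already annotated, can be sketched as follows. This is an illustration under assumptions: the interface does this for you, the function names build_query and drop_annotated are hypothetical, and the story identifiers in the usage lines are made up.

```python
def build_query(on_topic_stories):
    """Concatenate the text of every known on-topic story into one query.

    on_topic_stories: list of (docno, text) pairs.
    """
    return "\n".join(text for _docno, text in on_topic_stories)


def drop_annotated(ranked_docnos, annotated_docnos):
    """Keep the ranked order but remove stories that already have a label."""
    seen = set(annotated_docnos)
    return [docno for docno in ranked_docnos if docno not in seen]


# Usage with made-up identifiers:
query = build_query([("DOC-A", "first story text"), ("DOC-B", "second story text")])
to_read = drop_annotated(["DOC-C", "DOC-A", "DOC-D"], {"DOC-A", "DOC-B"})
```

Keeping the original ranking when filtering matters, because the stopping rule depends on the order in which stories are read.
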
 

STAGE 3: Text-based Query

-issue a new query using the TOPIC RESEARCH DOCUMENT PLUS ANY ADDITIONAL SEARCH TERMS APPROPRIATE (e.g., parts of the topic explication)
-search engine returns relevance-ranked list of news stories not already seen
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list until you have read at least 2 off-topic stories for every on-topic story found and the last 10 stories were off-topic

STAGE 4: Creative Searching

You are encouraged to use your specialized knowledge (drawn from topic research and the known on-topic stories) to conduct additional manual searches through the corpus.  These additional searches will be based on keywords, names, particular on-topic stories, etc.   Think creatively!  If you come up with a novel way to search for additional on-topic stories, let us know.

If you find additional information (names, places, dates, events) about your topic, you should revise the topic research page for that topic.   Then re-submit the topic research page as a text query to find additional on-topic stories.



Part 2: Using the Annotation Interface
((This section is under construction))

Getting Started

Start Netscape.

In an xterm, type do2000.

This window will open, containing the list of files that have been assigned to you.

Click on one of the file names (topic numbers) and choose a language.

Click  to begin annotation.

Before you begin your search for additional on-topic stories, you must

Once you've done this, you're ready to begin annotation!

Step-by-step guide to annotation

Using the annotation strategy outlined above, complete these steps IN ORDER.

STAGE 1: Expanding the Query
-submit all known on-topic stories as a query
 
 

When you click "Execute search on docnos", the search engine will scan through the corpus for related stories.   When the search engine has completed its search, the interface will prepare a judgement file containing a relevance-ranked list of stories for you to annotate.
 
 
 
 

When you click on Annotate, the main annotation window will disappear.  In its place will be the judgement file and an article window:

 
 

-read and annotate (yes/no/brief) ALL stories in relevance-ranked list until

              you have found A=5 additional on-topic stories
OR UNTIL
              you have read at least B=2 off-topic stories for every on-topic story found and the last C=10 stories were off-topic
 

STAGE 2: Find On-topic Stories based on Improved Query

-issue a new query using a concatenation of selected on-topic stories
-search engine returns relevance-ranked list of news stories not already seen (excluding those already tagged either on-topic OR off-topic)
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list until

                you have read at least B=2 off-topic stories for every on-topic story found and the last C=10 stories were off-topic
 

STAGE 3: Find Stories that are on-topic but unlike those already seen.

-issue a new query using the TOPIC RESEARCH DOCUMENT PLUS ANY ADDITIONAL SEARCH TERMS APPROPRIATE
-search engine returns relevance-ranked list of news stories not already seen
-read and annotate (yes/no/brief) ALL stories in relevance-ranked list until

               you have read at least B=2 off-topic stories for every on-topic story found and the last C=10 stories were off-topic

STAGE 4: Creative Searching

You are encouraged to use your specialized knowledge (drawn from topic research and the known on-topic stories) to conduct additional manual searches through the corpus.  These additional searches will be based on keywords, names, particular on-topic stories, etc.   Think creatively!  If you come up with a novel way to search for additional on-topic stories, let us know.

If you find additional information (names, places, dates, events) about your topic, you should revise the topic research page for that topic.   Then re-submit the topic research page as a text query to find additional on-topic stories.




strassel@ldc.upenn.edu
7/14/2000