Novelty Tagging Instructions

Quick Hints

Steps to follow for Novelty Tagging
 

1) Open Netscape
2) In an xterm, type novtag
3) Select a file to work on by typing its topic ID number
4) In Emacs window, mark each sentence as relevant/irrelevant/other
5) In Emacs window, mark each relevant sentence (in red) as new/old
6) Save your work and quit


Novelty Tagging Commands



Overview

Novelty tagging involves reading a set of articles that have been identified as related to a single topic, determining what information contained in those articles is relevant to the topic, and then determining which part of that relevant information is new. The topics are numbered 1-100, and correspond to TDT2 English topics. The on-topic articles are presented to the annotator in chronological order. Starting with the earliest article, the annotator first identifies each sentence as belonging to one of the following categories:

Relevant: This sentence contains some information about the topic
Irrelevant: This sentence contains no information about the topic

All topic relevance judgements occur at the level of the sentence. If a sentence contains any information about the given topic, no matter how little, it should be considered relevant. Please refer back to the TDT2 Topic Definitions frequently to refresh your memory about what kind of information is relevant to the topic. In addition to judging relevance vs. irrelevance, the annotator should exclude sentences that are not news, such as chit-chat between reporters, reporter identifications and the like. (If the chit-chat makes a reference to the given topic, the sentence should be marked as relevant.)

Other: This sentence is not news

After all stories for a given topic have been read, and each sentence has been marked as relevant, irrelevant or other, the sentences selected as topic-relevant are then examined again. This time, the annotator is paying attention to the novelty of the information contained in the sentence: am I seeing any of this information for the first time among the articles I've just read? If so, the sentence is new. On the other hand, if the sentence contains only information that has already been discussed in previous on-topic stories (or sentences from the same story), then the sentence is old. If a sentence contains ANY new information about the topic, no matter how little, it should be considered new.

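The annotator assigns one of the following tags to each topic-relevant sentence:

New: This sentence contains some information about the topic that has not been seen earlier in the file
Old: This sentence contains only information that has already been seen earlier in the file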

The novelty tagging process is cumulative. This means that the annotator is comparing the information contained in the current sentence to all previously-read sentences on the given topic, including those earlier in the same story. However, when considering what information is new versus old, the annotator should be conscious of the difference between one's personal knowledge and what was discussed earlier in the stream of on-topic stories used for this exercise.  That is, a sentence is old only if it presents information you have already seen discussed earlier in the same file.  If a sentence presents information you knew from personal experience or from TDT annotation but which was not discussed earlier in the file, that sentence is new.
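
The cumulative rule described above can be sketched in a few lines of Python. This is only an illustration and not part of the novtag tool: it assumes that each relevant sentence has already been reduced to a set of information units (the hypothetical `facts` sets below), which in practice is the annotator's own judgment. Note that the `seen` set starts empty, so anything the annotator happens to know from outside the file does not count as previously seen.

```python
def tag_novelty(sentences):
    """Tag each relevant sentence as 'new' or 'old', in reading order.

    `sentences` is a list of sets; each set holds the pieces of
    information one sentence conveys (a hypothetical representation).
    """
    seen = set()  # information discussed earlier in the same file only
    tags = []
    for facts in sentences:
        # Any unseen piece of information, however small, makes the sentence new.
        tags.append("new" if facts - seen else "old")
        seen |= facts
    return tags


# Made-up example stream of three relevant sentences.
stream = [
    {"crash occurred"},
    {"crash occurred", "two injured"},
    {"two injured"},
]
print(tag_novelty(stream))  # ['new', 'new', 'old']
```

The second sentence repeats one fact but adds another, so it is new; the third adds nothing and is old.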


Using the Novelty Tagging Interface

The novelty tagging interface runs in Emacs. To access the interface, the annotator must first start up Netscape. Then the annotator types the following in an xterm window:

novtag
This instruction page will automatically appear in the Netscape window. In addition, a second Netscape window will open displaying the topic description.

In the xterm window, the annotator will see a list of commands:

h: Show tag command.        Topic (1-100) for production.

l: Show file list.          Topic (1000-1004) for testing.

q: Quit.

Enter your choice now:

By typing "h", the annotator can see the list of commands used in tagging:
Novelty tag command summary:
Left, right, Up, Down arrows: move cursor to previous or next sentence.

Enter: Mark the sentence to new.

n: Mark the sentence to new.

o: Mark the sentence to old.

i: Mark the sentence to irrelevant.

r: Mark the sentence to relevant.

t: Mark the sentence to other.

q: Save the file and Quit Emacs.

g: Quit Emacs without saving.
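
For quick reference, the tagging keys in the summary above can be written as a lookup table. The Python sketch below is illustrative only; the names `TAG_KEYS` and `tag_for_key` are hypothetical and do not come from the novtag tool itself.

```python
# Keys from the command summary, mapped to the tag each one assigns.
TAG_KEYS = {
    "r": "relevant",    # first stage
    "i": "irrelevant",  # first stage
    "t": "other",       # first stage
    "n": "new",         # second stage, relevant sentences only
    "o": "old",         # second stage, relevant sentences only
    "Enter": "new",     # listed in the summary as a second way to mark new
}


def tag_for_key(key):
    """Return the tag assigned by `key`, or None if it is not a tagging key."""
    return TAG_KEYS.get(key)


print(tag_for_key("t"))  # other
print(tag_for_key("q"))  # None: q saves and quits, it does not tag
```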
 

At this stage, the annotator can type "l" to display a list of the available files. There are a total of 100 files. Each file contains a set of "on-topic" stories for a given topic. The number of stories varies from topic to topic. A file that appears with the label "AVAIL" is available for tagging. A file with the label "PROG" means that it is currently in progress, and the annotator working on that file is identified to the right of the file name. Files listed as "DONE" have been completed.

To select a file for annotation, the annotator simply types the topic number of an available file at the prompt (N.B. Type only the topic number, not the entire file name.):

Enter your choice now: 11
The Emacs window that opens up is split into two sections. The bottom section displays all the articles for this topic, and the top section contains only document IDs, marked in blue text. In the article section, the individual stories are also separated by a line of blue text containing the document ID. The annotator marks sentences as relevant, irrelevant or other in the following way:

1) use the up/down or left/right arrow keys to position the cursor somewhere within the current sentence.

2) hit one of the following keys to tag a sentence:

r: sentence contains some relevant information

i: sentence contains only irrelevant information

t: sentence contains other, non-news information

As the annotator works through the articles tagging sentences in this way, each tagged sentence changes color to show whether it has been marked relevant, irrelevant or other. Relevant sentences are shown in red.

After the annotator has tagged every sentence in all the articles as relevant, irrelevant or other, they move on to the second stage of tagging: determining whether the relevant sentences contain new or redundant information.

 The annotator now looks only at those sentences identified as relevant, which have been marked in red. For each sentence, the annotator judges whether the information contained in the sentence is new or redundant. Remember, these sentences are being compared to the information contained in all previous sentences. If a sentence presents any new information at all, tag it as new. If all the information contained in a sentence has been reported previously, tag it as old.

n: this relevant sentence is new
o: this relevant sentence is old

As a sentence is tagged as new, it is copied into the upper part of the Emacs window, under the appropriate story ID. This will help the annotator keep track of his/her decisions. This window contains a running list of all the sentences containing new information about a topic.
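
The running list in the upper part of the window can be pictured as a mapping from document ID to the new sentences found in that story. The Python sketch below is a rough model under stated assumptions: the `(doc_id, sentence, tag)` tuple layout and the document IDs are made up for illustration and are not the tool's actual file format.

```python
def new_sentence_summary(tagged):
    """Group sentences tagged 'new' under their document ID, in reading order.

    `tagged` is a list of (doc_id, sentence, tag) tuples (hypothetical layout).
    """
    summary = {}
    for doc_id, sentence, tag in tagged:
        # Every document ID is listed, even if it has no new sentences yet.
        summary.setdefault(doc_id, [])
        if tag == "new":
            summary[doc_id].append(sentence)
    return summary


# Made-up example data.
tagged = [
    ("doc-1", "First report of the event.", "new"),
    ("doc-1", "Same facts restated.", "old"),
    ("doc-2", "Reporter sign-off.", "other"),
]
for doc_id, sentences in new_sentence_summary(tagged).items():
    print(doc_id)
    for sentence in sentences:
        print("  " + sentence)
```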

After all sentences in a file have been marked as relevant/irrelevant/other, and then the relevant sentences marked as presenting either old or new information, the annotator should save her/his work and quit Emacs, by typing the following command:

q: Save work and quit Emacs
 The file will only receive the status "DONE" after all sentences have been tagged.
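
The completion condition can be stated as a small check. How novtag itself decides that a file is "DONE" is not described here, so the Python sketch below is an assumption based on the two tagging stages: every sentence needs a first-stage tag, and every relevant sentence also needs a new/old tag. The dictionary keys `relevance` and `novelty` are hypothetical.

```python
def is_done(sentences):
    """Return True when every sentence in the file has been fully tagged.

    `sentences` is a list of dicts with hypothetical keys:
      'relevance': 'relevant', 'irrelevant', 'other', or None if untagged
      'novelty':   'new', 'old', or None if untagged
    """
    for s in sentences:
        if s["relevance"] not in ("relevant", "irrelevant", "other"):
            return False  # first-stage tag still missing
        if s["relevance"] == "relevant" and s["novelty"] not in ("new", "old"):
            return False  # relevant sentence lacks a new/old tag
    return True


# Made-up example: the second sentence is relevant but has no new/old tag yet.
example = [
    {"relevance": "irrelevant", "novelty": None},
    {"relevance": "relevant", "novelty": None},
]
print(is_done(example))  # False
```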



 

strassel@ldc.upenn.edu
Updated June 30, 1999