Quick Hints
Steps to follow for Novelty Tagging
Novelty Tagging Commands
n: This relevant sentence is new (new)
o: This relevant sentence is old (old)
q: Save my work and quit Emacs (quit)
l: See a list of files (list)
h: See this list of commands (help)
Overview
Novelty tagging involves reading a set of articles that have been identified as related to a single topic, determining which information in those articles is relevant to the topic, and then determining which part of that relevant information is new. The topics are numbered 1-100 and correspond to TDT2 English topics. The on-topic articles are presented to the annotator in chronological order. Starting with the earliest article, the annotator first identifies each sentence as belonging to one of the following categories:
Relevant: This sentence contains some information about the topic
Irrelevant: This sentence contains no information about the topic
Other: This sentence is not news
All topic relevance judgements occur at the level of the sentence. If a sentence contains any information about the given topic, no matter how little, it should be considered relevant. Please refer back to the TDT2 Topic Definitions frequently to refresh your memory about what kind of information is relevant to the topic. In addition to judging relevance vs. irrelevance, the annotator should exclude sentences that are not news, such as chit-chat between reporters, reporter identifications and the like. (If the chit-chat makes a reference to the given topic, the sentence should be marked as relevant.)
After all stories for a given topic have been read, and each sentence has been marked as relevant, irrelevant or other, the sentences selected as topic-relevant are then examined again. This time, the annotator is paying attention to the novelty of the information contained in the sentence: am I seeing any of this information for the first time among the articles I've just read? If so, the sentence is new. On the other hand, if the sentence contains only information that has already been discussed in previous on-topic stories (or sentences from the same story), then the sentence is old. If a sentence contains ANY new information about the topic, no matter how little, it should be considered new.
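The annotator assigns one of the following tags to each topic-relevant sentence:
New: This sentence contains at least some new information about the topic
Old: This sentence contains only information that has already been reported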
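The two-pass procedure described above can be illustrated with a short sketch. It is not part of the tagging tool. It assumes, purely for illustration, that the information in each sentence can be written down as a set of fact strings, which in practice is the annotator's own judgement. The names `Sentence`, `facts` and `tag_novelty` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sentence:
    text: str
    category: str                              # first pass: "relevant", "irrelevant" or "other"
    facts: set = field(default_factory=set)    # hypothetical stand-in for the annotator's reading
    novelty: Optional[str] = None              # second pass: "new" or "old"


def tag_novelty(sentences):
    """Second pass: label each relevant sentence as new or old, in chronological order."""
    seen = set()
    for s in sentences:
        if s.category != "relevant":
            continue                           # irrelevant and other sentences get no novelty tag
        # ANY previously unseen information makes the sentence new.
        s.novelty = "new" if s.facts - seen else "old"
        seen |= s.facts
    return sentences


story = [
    Sentence("A", "relevant", {"quake hit", "magnitude 6"}),
    Sentence("B", "other"),
    Sentence("C", "relevant", {"quake hit"}),
    Sentence("D", "relevant", {"quake hit", "three injured"}),
]
print([s.novelty for s in tag_novelty(story)])  # ['new', None, 'old', 'new']
```

Sentence D is tagged new even though it repeats an earlier fact, because one of its facts has not been seen before.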
Using the Novelty Tagging Interface
The novelty tagging interface runs in Emacs. To access the interface, the annotator must first start up Netscape. Then the annotator types the following in an xterm window:
novtag
This instruction page will automatically appear in the Netscape window. In addition, a second Netscape window will open displaying the topic description.
In the xterm window, the annotator will see a list of commands:
l: Show file list. Topic (1000-1004) for testing.
q: Quit.
Enter your choice now:
Novelty tag command summary:
Left, right, Up, Down arrows: move cursor to previous or next sentence.
Enter: Mark the sentence to new.
n: Mark the sentence to new.
o: Mark the sentence to old.
i: Mark the sentence to irrelevant.
r: Mark the sentence to relevant.
t: Mark the sentence to other.
q: Save the file and Quit Emacs.
g: Quit Emacs without saving.
At this stage, the annotator can type "l" to display a list of the available files. There are a total of 100 files. Each file contains a set of "on-topic" stories for a given topic. The number of stories varies from topic to topic. A file labeled "AVAIL" is available for tagging. A file labeled "PROG" is currently in progress, and the annotator working on that file is identified to the right of the file name. Files listed as "DONE" have been completed.
To select a file for annotation, the annotator simply types the topic number of an available file at the prompt (N.B. Type only the topic number, not the entire file name.):
Enter your choice now: 11
The Emacs window that opens is split into two sections. The bottom section displays all the articles for this topic, and the top section contains only document IDs, marked in blue text. In the article window, the individual stories are also separated by a line of blue text containing the document ID. To view an image of the pre-annotation Emacs window, click here. The annotator marks sentences as relevant, irrelevant or other in the following way:
1) Use the up/down or left/right arrow keys to position the cursor somewhere within the current sentence.
2) Hit one of the following keys to tag a sentence:
r: sentence contains relevant information
i: sentence contains only irrelevant information
t: sentence contains other, non-news information
After the annotator has tagged every sentence in all the articles either as relevant, irrelevant or other, they move on to the second stage of tagging: determining whether the relevant sentences contain new or redundant information.
The annotator now looks only at those sentences identified as relevant, which have been marked in red. For each sentence, the annotator judges whether the information contained in the sentence is new or redundant. Remember, these sentences are being compared to the information contained in all previous sentences. If a sentence presents any new information at all, tag it as new. If all the information contained in a sentence has been reported previously, tag it as old.
n: this relevant sentence is new
o: this relevant sentence is old
As a sentence is tagged as new, it is copied into the upper part of the Emacs window, under the appropriate story ID. This will help the annotator keep track of his/her decisions. This window contains a running list of all the sentences containing new information about a topic. To see an image of a post-annotation Emacs file, click here.
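As a rough model of the letter keys and of the running list in the upper window, the following sketch maps each key to a tag and records new sentences under their story ID. It is an illustration only, not the interface's actual Emacs implementation. The names `KEY_TAGS` and `apply_key`, the dictionary-based sentence record, the story ID "DOC-1", and the choice to reject n/o on sentences not yet marked relevant are all assumptions. The sketch also does not handle retagging a sentence.

```python
from collections import defaultdict

# Letter keys as listed in the command summary.
KEY_TAGS = {"r": "relevant", "i": "irrelevant", "t": "other", "n": "new", "o": "old"}


def apply_key(key, sentence, new_by_story):
    """Apply one tagging key to a sentence dict with 'story_id', 'text' and 'tags' entries."""
    tag = KEY_TAGS[key]
    if tag in ("new", "old"):
        # Assumption: novelty tags are only meaningful on relevant sentences.
        if sentence["tags"].get("relevance") != "relevant":
            raise ValueError("only relevant sentences can be tagged new or old")
        sentence["tags"]["novelty"] = tag
        if tag == "new":
            # Mirror the upper window: new sentences are listed under their story ID.
            new_by_story[sentence["story_id"]].append(sentence["text"])
    else:
        sentence["tags"]["relevance"] = tag
    return sentence


new_by_story = defaultdict(list)
s = {"story_id": "DOC-1", "text": "Example sentence.", "tags": {}}
apply_key("r", s, new_by_story)   # first stage: relevance
apply_key("n", s, new_by_story)   # second stage: novelty
print(dict(new_by_story))         # {'DOC-1': ['Example sentence.']}
```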
After all sentences in a file have been marked as relevant/irrelevant/other, and then the relevant sentences marked as presenting either old or new information, the annotator should save her/his work and quit Emacs, by typing the following command:
q: Save work and quit Emacs
The file will receive the status "DONE" only after all sentences have been tagged.
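The AVAIL, PROG and DONE labels shown in the file list can be thought of as a function of tagging progress. The sketch below is one plausible reading of the rules stated in this document, not the tool's actual bookkeeping. In particular, treating a file as "PROG" as soon as an annotator is assigned to it is an assumption, as are the names `is_fully_tagged` and `file_status` and the (relevance, novelty) pair representation.

```python
def is_fully_tagged(sentence_tags):
    """A sentence is finished once it is irrelevant/other, or relevant with a new/old tag."""
    relevance, novelty = sentence_tags
    if relevance in ("irrelevant", "other"):
        return True
    return relevance == "relevant" and novelty in ("new", "old")


def file_status(sentences, annotator=None):
    """Return 'DONE', 'PROG' or 'AVAIL' for a list of (relevance, novelty) pairs."""
    if sentences and all(is_fully_tagged(s) for s in sentences):
        return "DONE"
    # Assumption: an unfinished file is in progress once someone has claimed it.
    return "PROG" if annotator else "AVAIL"


untagged = [(None, None), (None, None)]
partial = [("relevant", None), ("other", None)]
complete = [("relevant", "new"), ("other", None)]
print(file_status(untagged))                          # AVAIL
print(file_status(partial, annotator="annotator1"))   # PROG
print(file_status(complete, annotator="annotator1"))  # DONE
```

A relevant sentence without a new/old tag keeps the file out of "DONE", which matches the requirement that both tagging stages be finished.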
strassel@ldc.upenn.edu
Updated June 30, 1999