High Accuracy Retrieval from Documents (HARD)
Story Assessment Task
1.0 Background & Task Overview
In July we delivered to NIST 50 topics drawn from the AQUAINT corpus plus the Congressional Record and Federal Register for 1999. These are broadly defined, theme-based topics rather than event-based, TDT-style topics. For each topic we created a topic statement file containing a title, description, narrative, and a series of metadata values. Research sites were given this data as training material in two batches. First, they received only the topic title, description, and narrative; based on that material they developed systems to identify all on-topic stories in the corpus. Second, sites were given the full topic statements, including the metadata, and revised their systems to evaluate their performance with the added information. Sites then submitted the lists of on-topic stories (for both runs) to NIST for scoring. NIST compiled a list of the top n stories per topic from each site; that compiled list is what we assess.
2.0 Our Task
We have a total of 41,700 documents to assess for this year's evaluation. Each story must be read in comparison with the topic statement and description, and given a label of YES, NO, or M. Following the initial relevance assessment task, we will label relevant passages in documents for the topics whose metadata specifies anything other than "Document".
2.1 Topic labeling
Before performing any task involving your HARD topic, first read the topic description once again, paying close attention to the metadata fields. The topic description should be in front of you at all times during the labeling process. We will borrow the TDT definitions of YES and NO for this task:
- YES: this story discusses the topic in a substantial way. Stories that you label as YES should give some information about the topic and should answer the topic query. Documents do not have to contain new information about the topic -- a story that summarizes a topic's history or gives a snippet of information that you've read about before still counts as a YES. Even if the document contains a relatively small amount of information about a topic, it should be considered a YES.
- NO: this story does not discuss the topic at all, mentions the topic in passing without giving any information about it, or fails to address the specific query or angle of the topic. If a document names a topic or makes reference to it but does not provide any information about it, then that document should be considered a NO.
- M: Because the HARD topics consist not only of topic descriptions but also of metadata, we must determine whether the document being assessed satisfies that metadata. We have therefore established the label M as the in-between "YES": a document that is relevant to the basic topic description but does not meet the demands of the metadata; in other words, it does not satisfy the Purpose, Genre, or Familiarity items (the other metadata items are not document-level).
An "M" situation might occur, for example, if the "Genre" portion of the topic metadata were "Administrative", indicating that the query issuer considers only government documents to be appropriate. In this case, a New York Times article that addresses the topic would receive an "M" judgment, because even though the content of the story is relevant, the type or genre of the story is not administrative. Or if the "Purpose" of the topic is "Details", and you find a document that is on-topic but only answers the query in the most general way, then that document would receive an "M" judgment.
NOTE: The Genre metadata values are source-based. This means that only CR and FR documents may fulfill the "Administrative" value. In future tracks, this interpretation will most certainly change to include source and content, which will make annotation more interesting but perhaps a little trickier.
Making the decision
The decision between YES and NO stories is usually clear, but some cases will be tricky. For the purposes of HARD, a topic is defined by a question that targets a theme or a trend. Therefore, documents may address the general subject without zeroing in on the proper area of that topic. For example, if a topic is "SARS" and the Description asks, "How has the SARS epidemic impacted tourism in affected areas?", then articles addressing the symptoms of SARS and the alleged source of the disease would NOT be on-topic.
If you're having trouble deciding between YES and NO, ask yourself whether you learned anything about the topic by reading the story, and whether you can answer the topic query based on that information, no matter how small and no matter whether you've seen that same information before. If you learn something about the topic by reading the story, then it should count as YES.
Additionally, the decision between YES and M may be quite difficult. In these cases, ask yourself if the document strictly abides by the appropriate Metadata.
If you have a particularly hard time making a decision between YES and NO or YES and M, please indicate as much by checking the "Difficult decision" button. The documents labeled "difficult" will receive closer attention during quality control passes.
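To make the decision procedure concrete, the sketch below (in Python, purely illustrative) expresses the document-level decision tree described above. The function and its boolean inputs are hypothetical stand-ins for the assessor's judgment; they are not part of any assessment tool.

```python
def document_label(on_topic: bool, meets_metadata: bool) -> str:
    """Document-level judgment following the YES / NO / M definitions above.

    on_topic: does the story discuss the topic in a substantial way and help
        answer the topic query, however briefly?
    meets_metadata: does the story satisfy the document-level metadata items
        (Purpose, Genre, Familiarity)?
    """
    if not on_topic:
        # Off-topic, a passing mention, or fails the query's specific angle.
        return "NO"
    if not meets_metadata:
        # Relevant to the topic description but violates the metadata:
        # the in-between "YES".
        return "M"
    return "YES"


# Example: an on-topic New York Times story under a topic whose Genre is
# "Administrative" (source-based, so only CR and FR documents qualify):
print(document_label(on_topic=True, meets_metadata=False))  # -> M
```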
2.1.1 Document-Level Assessment
The first thing we will do is perform document-level assessment. This task entails reading documents and evaluating whether or not they are relevant to a topic and to the topic's metadata. The decision you make will be based on the parameters outlined above.
2.1.2 A note on Document format
The corpus contains documents from the New York Times (NYT), Associated Press Worldwide (APW), Xinhua English (XIE), the Congressional Record (CR), and the Federal Register (FR). The latter two document sources may be quite difficult to annotate.
The content of CR and FR documents can range from brief announcements in the House of Representatives and the Senate to bill proposals, lobbying efforts for new legislation, arguments between Congress-people, and Senate hearings. Often a Congressperson will repeatedly propose a certain kind of legislation; in this case, you may read nearly identical proposals several times over. The CR tends to be more user-friendly than the FR, which will contain, for example, the official emissions standards document for four-passenger vehicles in the United States, or instructions on how to obtain certain rules and regulations.
Please pay close attention to documents of this kind, because they are less predictable, and often less clear, than news documents.
Additionally, the corpus contains some documents (primarily from the APW source) that are nearly identical, if not strictly identical, in content. These stories are released on the same day, sometimes only minutes apart, and MUST receive the same document-level judgment. Likewise, during passage-level assessment, make sure that you identify the same relevant passages in each identical story.
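To see what this consistency constraint implies, the sketch below groups stories by a crude text normalization and flags groups whose judgments disagree. It is a minimal illustration, assuming judgments are held as (doc_id, text, label) tuples; the whitespace normalization is a simplistic stand-in for whatever near-duplicate detection is actually used.

```python
from collections import defaultdict


def check_duplicate_consistency(judgments):
    """Flag (nearly) identical stories whose document-level labels disagree.

    judgments: list of (doc_id, text, label) tuples -- a hypothetical record
    format. Whitespace-normalized, lowercased text is a simplistic stand-in
    for real near-duplicate detection.
    """
    groups = defaultdict(list)
    for doc_id, text, label in judgments:
        key = " ".join(text.split()).lower()
        groups[key].append((doc_id, label))

    conflicts = []
    for docs in groups.values():
        if len(docs) > 1 and len({label for _, label in docs}) > 1:
            conflicts.append(docs)  # same story, different judgments
    return conflicts
```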
2.2 Passage-Level Assessment
For the topics whose "Granularity" metadata fields specify anything other than "Document" -- that is, "Passage", "Phrase", "Sentence", or "Any" -- we will perform a passage-level relevance assessment. First, all topics will be judged at the document level. Then, taking all the documents that were determined both to be on-topic and to fulfill the metadata (i.e., all YESes), we shall proceed to single out the relevant passages, sentences, and phrases.
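Restated as a sketch, the workflow is a simple filter: only topics whose granularity is anything other than "Document" get a passage-level pass, and only over the documents already judged YES. The data structures here are hypothetical, chosen just to make the filter explicit.

```python
PASSAGE_GRANULARITIES = {"Passage", "Phrase", "Sentence", "Any"}


def passage_assessment_queue(topic_granularity, doc_labels):
    """Yield (topic_id, doc_id) pairs that need passage-level assessment.

    topic_granularity: dict mapping topic_id -> Granularity metadata value.
    doc_labels: dict mapping (topic_id, doc_id) -> document-level label.
    Both are hypothetical structures used only for illustration.
    """
    for (topic_id, doc_id), label in doc_labels.items():
        if label == "YES" and topic_granularity[topic_id] in PASSAGE_GRANULARITIES:
            yield topic_id, doc_id
```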
2.2.1 Annotation boundaries
During annotation, you will probably find adjacent passages that are relevant to the topic for different reasons. While it is tempting to select each one separately because you might think of them as unique units of relevance, you should keep the entire set of passages together as one passage. This means that you could feasibly select an entire document as the relevant passage, were the whole story to be on-topic.
This same concept applies to the sentence- and phrase-level assessment task: you should select the entire relevant portion of the document in question. The metadata value represents what the query issuer desired, but you are not forced to abide strictly by those parameters. For example, if the metadata were "Phrase" and you found a paragraph of highly relevant material, then you should select the entire segment. Likewise, if the granularity were "Passage" and you found an on-topic sentence or phrase, but not a block of text, you should isolate only the portion that is relevant.
The metadata values target what the query issuer thought would be appropriate; as you label each document, you are demonstrating which granularity value is appropriate.
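The boundary rule above amounts to merging overlapping or adjacent relevant spans into a single extent. Below is a minimal sketch, assuming spans are recorded as (start, end) character offsets into the document; the `max_gap` tolerance for what counts as "adjacent" is a hypothetical parameter.

```python
def merge_spans(spans, max_gap=1):
    """Merge overlapping or adjacent (start, end) character-offset spans,
    per the rule above: adjacent relevant passages are kept together as a
    single passage. `max_gap` is a hypothetical tolerance for adjacency.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + max_gap:
            # Extend the previous passage instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(span) for span in merged]


# Two touching relevant passages collapse into one selection:
print(merge_spans([(0, 120), (121, 300)]))  # -> [(0, 300)]
```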
Definition of terms:
Passage - a section of medium length; a block of text that includes more than one sentence.
Sentence - a grammatical unit that is syntactically independent, having a subject that is expressed (or, as in imperative sentences, understood) and a predicate that contains at least one finite verb.
Phrase - A brief expression, sometimes a single word, but usually two or more words forming an expression by themselves, or being a portion of a sentence.