TDT3
First Story Detection



Quick Guide

Setup

Stage 1: Develop a topic
Stage 2: Search for more on-topic material
Stage 3: Identify earliest on-topic story

Detailed Instructions

The First Story Detection interface runs in Netscape, so you need to have Netscape running before you begin.   Then open a morph window.  To do this, pull down the mouse tools menu on the left mouse button, go to 'hosts' and select 'morph'.  (Your searches will be completed much more quickly in morph than in unagi.)  In the morph window, type the following command:

/mnt/talk/TDT3/ts/bin/fs3

After a moment, the FSD interface will open. The interface contains several Netscape windows and an Emacs window. For now, you'll need to focus on the green "Untitled" window:
 

There are 120 seed stories for the First Story annotation task, divided into 10 sets of 12 stories each.  View your assigned seed story set by clicking on the down arrow in the window labeled "SEEDS":


 

The first line of text in each of your seed stories is displayed in the interface window.  To see a story in its entirety, click on the DOCNO for that story.  When you do this, the text of the story will be displayed in another Netscape window.  The first time you work with a seed list, be sure the story displayed matches the DOCID.  [If they do not match, use the arrow keys to scroll up and down the seed list, then return to the first seed story in the list.]

To begin searching for the first story on your seed story's topic, you must first register the seed story as a new topic. To do this, click on the "NEW TOPIC" icon.  You cannot continue the search for the first story until you have done this.

When you click on "New Topic", you'll see that the yellow topic window assigns a new topic number to your topic. This window also contains the text of the seed story.  As you annotate additional stories against this topic, their DOCIDs will be added to the list as well.

Scrolling down in this window, you'll see that you can enter a topic title and brief topic description for your new topic.   You must hit the "Submit Change" icon for your topic title and description to be recorded.

Once you've entered a title for your potential topic, click on the refresh button next to "Topics" in the interface window to refresh the topic list and add your topic to the list.  This pull-down list of topics keeps track of the topic ID number, number of on-topic stories, topic title, and annotator.

Now you can begin to search for additional stories on this topic. First, you need to conduct a search for stories similar to the seed story. You do this by selecting the seed story in the article list, and conducting an article search. To do an article search, make sure your search window is configured like this:

Hit "Submit", and the search engine will return a list of stories similar to the seed story. This list is rank-ordered, meaning that the stories most similar to the seed story appear at the top of the list. The seed story itself is at the very top of the list. Read through the stories, starting at the top of the list, until you've found one that is indeed on the same topic as the seed story.  (If you find no additional on-topic stories, then you'll need to do a keyword search.  If a keyword search also reveals no additional on-topic material, then your work with this topic is done:  the seed story is the first and only story on this topic.)
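To make "rank-ordered" concrete, here is a minimal sketch of how a list like this could be produced. It is not the interface's actual search engine, whose scoring method these instructions do not describe. It assumes a simple word-count cosine similarity, and the function names, DOCIDs and story texts are invented for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(seed_id, stories):
    """Return (doc_id, score) pairs, most similar to the seed first."""
    seed_vec = term_counts(stories[seed_id])
    scored = [(doc_id, cosine(seed_vec, term_counts(text)))
              for doc_id, text in stories.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stories; the DOCIDs and text are invented for illustration.
stories = {
    "DOC-A": "earthquake strikes coastal city, rescue teams deployed",
    "DOC-B": "rescue teams search rubble after earthquake in coastal city",
    "DOC-C": "central bank raises interest rates again",
}
ranking = rank_by_similarity("DOC-A", stories)
print([doc_id for doc_id, _ in ranking])  # ['DOC-A', 'DOC-B', 'DOC-C']
```

As in the interface, the seed story scores highest against itself and so heads the list. Unrelated stories sink to the bottom, which is why you read from the top down.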

When you find an additional related story, mark it as on-topic. To do this, hit the "Return" key on your keyboard while that story is selected. The status of the story in the "Topic" and "OT" columns will change to indicate that it is now on topic. If you read a story and decide it's not on topic, do nothing, and the "OT" status will display "n" for no.  (If a story contains a brief mention of the topic, you can hit the "b" key on your keyboard.  If the story is not a news story at all and should be rejected, type "r".  You can hit "Return" again to change a story's status from yes to no and vice versa.)  All stories you haven't read will be indicated with a dash in the "OT" column.  The "OR" column displays the numerical order in which the stories appear.  This helps you keep track of how many stories you've read.
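The key presses above can be summarized as a small status table. The sketch below is only a reading aid, not the interface's code. The "y" code for on-topic stories is an assumption (the instructions name only "n" and the dash), as is the behavior of "Return" on a story already marked "b" or "r".

```python
# Status codes: "-" unread, "n" read but not on topic, "b" brief mention,
# "r" rejected (not a news story). "y" for on topic is an assumed code.
UNREAD, NO, YES, BRIEF, REJECT = "-", "n", "y", "b", "r"


def apply_key(status, key):
    """Return the new OT status after a key press on the selected story."""
    if key == "Return":
        # Return toggles between yes and no; treating every other
        # status as "becomes yes" is an assumption.
        return NO if status == YES else YES
    if key == "b":
        return BRIEF
    if key == "r":
        return REJECT
    return status  # any other key leaves the status unchanged


def mark_read(status):
    """Reading a story without pressing a key leaves it as "n"."""
    return NO if status == UNREAD else status


print(apply_key(UNREAD, "Return"))  # y
print(apply_key(YES, "Return"))     # n
print(mark_read(UNREAD))            # n
```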


 

Once you've identified 4 or 5 additional on-topic stories for this topic, you can begin to refine your search. This time, search for stories similar to ALL the on-topic stories that you've already identified. First, refresh the list of on-topic stories for the current topic by clicking the refresh button next to "Articles". Then set up a search against all of these on-topic stories, making sure the search type is "on topic".  You'll also need to specify how many stories the interface returns to you.  You can use the pull-down menu, or type in a number.  For large topics, you'll want to set the number higher than for relatively small topics.  Use your judgement, but a threshold of 50 is probably about right for most cases.  Once you've set up the search parameters, hit "Submit" to execute the search.

This search will return a list of likely stories, arranged by rank, with the most likely stories at the top of the list. Scan through this list of likely stories to identify any on-topic material.  Remember that you can identify a story as on-topic by hitting RETURN on your keyboard.  Once you've identified some additional on-topic material, order these stories by date. To do this, click on the top of the "Date" column in the interface window. This will re-order your annotated stories chronologically.  If you want to see them in reverse date order, click the "Date" column again. This sorting allows you to find the earliest story that you've already identified as being on-topic.
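The date sort described above amounts to the following. This is an illustrative sketch only: the DOCIDs, dates and the "y"/"n" status codes are invented, and the interface does the sorting for you when you click the column header.

```python
from datetime import date

# Hypothetical judged stories: (DOCID, story date, OT status).
judged = [
    ("DOC-B", date(1998, 10, 14), "y"),
    ("DOC-A", date(1998, 10, 12), "y"),
    ("DOC-C", date(1998, 10, 3), "n"),
    ("DOC-D", date(1998, 10, 20), "y"),
]

# Chronological order, as after one click on the "Date" column.
by_date = sorted(judged, key=lambda story: story[1])

# Reverse date order, as after a second click.
by_date_desc = sorted(judged, key=lambda story: story[1], reverse=True)

# The earliest story marked on topic is the current first-story candidate.
on_topic = [story for story in by_date if story[2] == "y"]
earliest = on_topic[0]
print(earliest[0], earliest[1])  # DOC-A 1998-10-12
```

Note that the earliest story overall (DOC-C here) is not the candidate, because it was judged off topic. Only on-topic stories count.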


 

Now, you must conduct one last search for on-topic stories that might have occurred prior to this earliest story.  This final search is not as formulaic as the earlier searches and requires you to exercise some judgement.  Start by once again refreshing the on-topic story list by clicking the refresh button next to "Articles". Then you'll need to set some parameters for your final search.

First, you'll need to set the date range for your search to look only for stories occurring prior to the earliest on-topic story you've identified so far.  You'll also need to specify the number of stories to return. This number should be based on the total number of hits you've already found for the topic.  You've already scanned through some number of stories; how many of those were hits?  You should ask the search to return an appropriate number based on the on-topic/off-topic ratio you've seen so far.  If you had to scan through 50 stories to find a couple of hits, you can be pretty certain you've already found most of the on-topic material.  On the other hand, if you scanned through 50 stories and they were all hits, you'll have to scan through a larger number of date-sorted stories to be sure you've found the earliest one.
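One way to picture this reasoning is the sketch below. It is an assumption-laden illustration, not a rule from these guidelines: the function names, the floor of 20, the ceiling of 200 and the linear scaling are all invented. Only the direction of the relationship (a higher hit ratio calls for more returned stories) comes from the paragraph above.

```python
from datetime import date


def stories_before(stories, cutoff):
    """Keep only stories dated strictly before the cutoff date."""
    return [s for s in stories if s["date"] < cutoff]


def suggest_return_count(scanned, hits, floor=20, ceiling=200):
    """Scale the number of stories to request with the hit ratio so far.

    floor, ceiling and the linear scaling are illustrative assumptions.
    """
    if scanned <= 0:
        return floor
    ratio = hits / scanned
    return round(floor + ratio * (ceiling - floor))


# Hypothetical stories; the DOCIDs and dates are invented.
stories = [
    {"docid": "DOC-A", "date": date(1998, 10, 12)},
    {"docid": "DOC-C", "date": date(1998, 10, 3)},
    {"docid": "DOC-D", "date": date(1998, 10, 20)},
]
earlier = stories_before(stories, date(1998, 10, 12))
print([s["docid"] for s in earlier])  # ['DOC-C']

print(suggest_return_count(50, 2))    # few hits: a short list suffices
print(suggest_return_count(50, 50))   # all hits: ask for many more
```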

Set the date and number of returns by using the pull-down options and/or typing the appropriate date and number of returns in the search line.
 

Finally, you'll need to decide what you're searching against.  You can search against ALL the on-topic stories you've identified so far by choosing an "on topic" search.

Instead, you might prefer to search against only a subset of on-topic stories.  To do this, click the box in the "CS" column next to each article you want to search against:

Then set the search type to "on checked". (To clear the checked articles, you can hit the "C" button on the search line.)  Then hit "Submit" to execute the search.

However you decide to set up your final search, the end result will be the same: a rank-ordered list of potentially on-topic stories that occur before the date you've specified.  Scan through this final list of stories for any remaining on-topic stories.  Once you've checked through this list, you can be satisfied that you've identified the first on-topic story.  Be sure to update your topic description at this point if necessary.  You can then move on to the next topic, or quit by hitting the "Quit" button.  **Do not quit the interface by killing Netscape or deleting windows!  You must use the Quit button!**

**NOTE: It is acceptable, and even desirable, for you to use your knowledge about a topic to refine your searches for the first story.  If you know that the seminal event for a topic occurred on a particular day, you can focus your attention on news reports from that time period.  If you have other ideas about how to identify first stories that aren't covered here, please let us know!**
 


strassel@ldc.upenn.edu
9/7/99