Our collection of news comes from Agence France-Presse (AFP), which has news bureaus in many Arabic-speaking nations. The news coverage will be local, regional and international. The news collection consists of 383,872 stories from May 1994 through December 2000, or 2,337 days' worth of stories in all.
Your job will be to identify topics that
are discussed in this news collection. For this project, a topic
is general -- not a specific event (like a particular plane crash or particular
political event), but rather a broad category, like "energy resources"
or "computer hacking". Later, you'll be asked to review sets of stories
that automatic processes have flagged as "on-topic" and to judge
whether they are indeed related to the topics you've identified.
TASK 1
Come up with a list of 50 potential topics
that you believe are likely to be discussed in the news collection described
above. Each
topic should consist of a one-sentence
description (in Arabic) of the information you are seeking. Please
word this description as a
question. Some examples of topics
you might come up with:
-Have any computer hackers been charged with crimes in Jordan?
-What measures are being taken to relieve air pollution in Cairo?
Use your knowledge of the collection's
time period (1994-2000) and the regional focus (Jordan, Lebanon, Egypt,
Syria, etc.) to help you
formulate potential topics. Avoid
creating topics which are punctual; that is, topics that relate to an event
that occurred at a specific place
and time.
You might find it helpful to consult resources
like encyclopedias (which publish annual yearbooks covering major events
and issues in a given
year) to help you think of likely topics.
Also, please consult the list I've given you containing the 79 topics that
were developed for last year's project. Although these are Chinese
news topics, some of them might be adaptable for this year's Arabic-language
focus. In any case, this list will give you a sense of
what an appropriate topic looks like.
TASK 2
Once each of you has developed a list
of 50 potential topics, you will begin to search through the news collection
to identify stories
that discuss your topics. More details
on the exact nature of topic searching will be discussed the next time
we meet, but here is an
overview:
* Using a customized search engine, you'll submit a query consisting of a few keywords drawn from your topic description.
* The search engine will then display a list of 25 documents that are likely to be related to your topic.
* You will read each of these 25 documents and determine which ones are indeed related to your topic.
* Depending on the number of documents related to your topic, you'll either
  - discard the topic (too few or too many documents)
  OR
  - execute another search for this topic, and read the top 100 stories returned by the search engine
At this point, your work on this topic is complete and you'll move on to the next one. Throughout this process, you'll keep careful notes about the search terms you've used, how you are defining the topic, and so on. When we have identified & defined 25 topics that are suitable (neither too big nor too small), our work on the first phase is done.
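The screening step described above can be sketched as a small decision function. This is only an illustration: the instructions say a topic is discarded when it has "too few or too many" related documents but do not give numbers, so the thresholds `MIN_RELEVANT` and `MAX_RELEVANT` below are hypothetical, as is the function name `triage_topic`.

```python
FIRST_PASS_SIZE = 25    # documents shown by the first search (from the instructions)
SECOND_PASS_SIZE = 100  # stories read after the second search (from the instructions)

# Hypothetical cut-offs: the instructions only say "too few or too many".
MIN_RELEVANT = 3
MAX_RELEVANT = 20


def triage_topic(relevant_in_first_pass: int,
                 min_relevant: int = MIN_RELEVANT,
                 max_relevant: int = MAX_RELEVANT) -> str:
    """Decide what to do with a topic after reading the first 25 documents.

    Returns "discard" when the topic looks too small or too big,
    otherwise "search_again" (run a second search and read the top 100).
    """
    if not 0 <= relevant_in_first_pass <= FIRST_PASS_SIZE:
        raise ValueError("count must be between 0 and 25")
    if relevant_in_first_pass < min_relevant:
        return "discard"        # too few related documents
    if relevant_in_first_pass > max_relevant:
        return "discard"        # too many related documents
    return "search_again"
```

For example, with these assumed thresholds a topic with 10 related documents out of 25 would go on to the second, 100-story search, while a topic with only 1 would be dropped.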
TASK 3
You will be asked to review a set of stories
for each of the 25 topics. For each story, you'll decide whether
it is related to the topic or not. More details to come...
| Task | Description | Deadline |
| 1 | Compose preliminary topic lists | May 9, 2001 |
| 2 | Create topic description pages for each topic | June 1, 2001 |
| 3 | Identify 25 good topics from preliminary lists | June 1, 2001 |
| 4 | Assess relevance of documents | September 28, 2001 |
How to Type in Arabic
We are using a version of the text editor
emacs to create our Arabic language documents. The editor is called
armule (ar for Arabic). To open a file using this text editor, type
armule filename
Initially, the file is set up to enter Roman characters (to type in English, for example). If you want to switch to typing in Arabic, type
Meta-\ (hold down the Meta key while typing backslash)
The Meta key is next to the spacebar on your keyboard; it has a diamond shape on it. The Arabic keymap below shows you which Arabic characters correspond to which Roman characters. To switch back into Roman characters, type Meta-\ again.
You must be on a blank line without any text entered before switching back and forth between Arabic-English! Do not try to switch mid-sentence.
When typing in English, everything works as you might expect. Lines wrap automatically, and to start a new line you hit Return. When typing in Arabic, things are a little different. Here are some pointers:
- Don't hit "RETURN" to start a new line! Instead, use one of these two methods:
  - Hit the space bar then type Cntrl-j (hold down control key while typing j)
  - Hit Cntrl-e (hold down control key while typing e) to get to the absolute right margin of the line, and then hit <return>
- To undo what you've just done, type Cntrl-SHIFT-_ (hold down control key and shift key while typing dash)
- To delete the Arabic character preceding the cursor (to the left of the cursor), type Cntrl-d (hold down control key while typing d)
Arabic Keymap
Creating Topic Description Pages
Each topic requires the creation of a topic description page, containing the following information:
Topic Number
Topic Title: Short phrase
Topic Author: Who created this topic
Description (Arabic & English): Brief, one-sentence description of the information you are seeking (in the form of a question).
Keywords (Arabic & English): Set of keywords you use to retrieve relevant documents.
You will need to enter this information for each topic that you create. Additional information fields will be added as you work with each topic further.
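As a rough illustration of the fields listed above, a topic description page could be modelled as a small record. The class name `TopicDescription`, its attribute names, and the two helper methods are assumptions made for this sketch; the real pages are plain files edited in armule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopicDescription:
    """One topic description page (field names are illustrative)."""
    number: int
    title: str                      # short phrase
    author: str                     # who created this topic
    description_ar: str             # one-sentence question, in Arabic
    description_en: str             # the same question, in English
    keywords_ar: List[str] = field(default_factory=list)
    keywords_en: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        names = ("title", "author", "description_ar", "description_en",
                 "keywords_ar", "keywords_en")
        return [name for name in names if not getattr(self, name)]

    def descriptions_are_questions(self) -> bool:
        """Check that both descriptions end with a question mark.

        Accepts the Latin '?' and the Arabic question mark (U+061F).
        """
        marks = ("?", "\u061f")
        return (self.description_ar.rstrip().endswith(marks)
                and self.description_en.rstrip().endswith(marks))
```

A page with only the English side filled in would report `description_ar` and `keywords_ar` as missing, which is a quick way to see what is left to enter.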
To create the topic description pages, follow these steps:
1) Go to the Arabic directory by typing
cd trec-cl-ara/topics
2) Find the next available topic in your topic series
Fatima: Topics 100-150 (ar-100 through ar-150)
Gordon: Topics 200-250 (ar-200 through ar-250)
Diana: Topics 300-350 (ar-300 through ar-350)
Nabih: Topics 400-450 (ar-400 through ar-450)
You can find the initial topic lists that you created by following these links. Please begin with the topics in blue.
Fatima's preliminary topics
Gordon's preliminary topics
Diana's preliminary topics
Nabih's preliminary topics
3) Open your topic by typing armule followed by the file number, e.g.,
armule ar-100
4) A file will open which contains a blank set of fields. Your job is to enter information (in either English or Arabic) in each of these fields. Click here to see a sample topic description document (with nonsense Arabic language).
You'll use the arrow keys or the mouse to scroll from one field to another, but be sure to remember the tips about typing in Arabic when entering the data!
5) Once you have filled in each field, save your work by
pulling down File: Save Buffer
6) Close the topic window by
pulling down File: Exit Emacs
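The topic series in step 2 map directly onto file names, so finding the next available topic can be sketched as below. The number ranges come from the list above; the function names and the idea of passing in a list of already completed topic numbers are assumptions for illustration, since the instructions do not say how completion is tracked.

```python
from typing import Iterable, Optional

# Topic number series per annotator, inclusive of both endpoints.
TOPIC_SERIES = {
    "Fatima": range(100, 151),
    "Gordon": range(200, 251),
    "Diana": range(300, 351),
    "Nabih": range(400, 451),
}


def topic_filename(number: int) -> str:
    """Build the file name for a topic number, e.g. 100 -> 'ar-100'."""
    return f"ar-{number}"


def next_available_topic(annotator: str,
                         completed: Iterable[int]) -> Optional[str]:
    """Return the file name of the first topic not yet completed.

    `completed` is a hypothetical list of topic numbers the annotator
    has already filled in. Returns None when the series is exhausted.
    """
    done = set(completed)
    for number in TOPIC_SERIES[annotator]:
        if number not in done:
            return topic_filename(number)
    return None
```

The returned name is what would follow `armule` in step 3, for example `armule ar-202` for Gordon once topics 200 and 201 are done.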