HARD 2004 Clarification Form Submission Instructions and 2003 Feedback
Last updated July 15, 2004
Sites participating in the HARD 2004 evaluation may opt to submit clarification forms to LDC assessors in order to garner additional feedback from topic creators.
LDC is designing a user interface that will randomize the order of the forms for each topic. Every time an assessor accesses the interface, the order of sites' submissions is once again randomized. This method will greatly reduce the chances of assessors "learning" sites' clarification form styles. A database will track the exact order in which each form was viewed and answered. (Last year's approach was much less careful; annotators viewed sites' submissions in the same sequential order each time: UMAS1, UMAS2, ILUC1, ILUC2, etc., for each topic.)
Required information
You must include the following elements in your form:
- <form action="https://secure.ldc.upenn.edu/cgi-bin/Projects/HARD/cf_form_2004.cgi" method="post">
This specifies the CGI script that receives the submitted form and generates the output. You are welcome to use this CGI URL while developing your form, since all it does is output the selected information.
- <input type="hidden" name="site" value="XXXXn">
Here, "XXXX" is a 4-letter code designating your site and "n" is a run number. The site codes are below. The run numbers should reflect your priority order: XXXX1 will be processed first, then XXXX2, and so on. If you only have one set of forms, please use 1. For example, the first submission from UMass would be UMAS1.
- <input type="hidden" name="topicid" value="000">
This indicates the topic number. It should be a 3-digit code, zero-padded as needed: 001 rather than 01 or 1.
- <input type="submit" name="send" value="submit">
This is the submit button that should appear somewhere on your page.
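The formatting rules for the two hidden fields can be captured in a couple of small helpers. This is a minimal Python sketch, not an LDC tool: the function names are hypothetical, and it assumes that site codes are exactly four uppercase letters (as in UMAS and ILUC) and that run numbers start at 1.

```python
import re

# Assumption: a site code is exactly four uppercase letters (e.g. "UMAS"),
# following the "XXXXn" convention; run numbers are positive integers.
SITE_CODE_RE = re.compile(r"^[A-Z]{4}$")


def site_value(site_code: str, run: int) -> str:
    """Build the value of the hidden "site" field, e.g. ("UMAS", 1) -> "UMAS1"."""
    if not SITE_CODE_RE.match(site_code):
        raise ValueError(f"site code must be 4 uppercase letters: {site_code!r}")
    if run < 1:
        raise ValueError("run numbers start at 1")
    return f"{site_code}{run}"


def topic_id(topic: int) -> str:
    """Zero-pad a topic number to 3 digits for the hidden "topicid" field."""
    if not 0 <= topic <= 999:
        raise ValueError(f"topic number out of range: {topic}")
    return f"{topic:03d}"


print(site_value("UMAS", 1), topic_id(1), topic_id(42))  # prints "UMAS1 001 042"
```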
In addition, please include the topic number (e.g., "001") and the topic title somewhere on the page. This provides a sanity check that the annotators are indeed answering the correct questions.
Note that the annotators will have the topic description, narrative, and their metadata values available to them for reference. You do not need to put any of that information on the clarification form (except the title and topic number, as just mentioned).
Put the forms in separate files, one per topic, with the filename XXXXn_000.html, where, again, XXXX is your site code, n is the run number, and 000 is the topic number.
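The required pieces (form action, the two hidden fields, the submit button, the topic number and title, and the file name) can be assembled by a small generator. This sketch is illustrative only: render_form and write_form are hypothetical names, the body argument stands in for whatever questions your site actually asks, and the page layout is an assumption.

```python
import html
from pathlib import Path

# CGI URL and field names as given in the instructions above.
CGI_URL = "https://secure.ldc.upenn.edu/cgi-bin/Projects/HARD/cf_form_2004.cgi"


def render_form(site: str, topic: str, title: str, body: str = "") -> str:
    """Return a minimal clarification form page for one topic.

    `body` is a hypothetical placeholder for the site's own questions.
    The topic number and title appear on the page as a sanity check.
    """
    safe_title = html.escape(title)
    return f"""<html>
<head><title>{topic}: {safe_title}</title></head>
<body>
<h1>Topic {topic}: {safe_title}</h1>
<form action="{CGI_URL}" method="post">
<input type="hidden" name="site" value="{site}">
<input type="hidden" name="topicid" value="{topic}">
{body}
<input type="submit" name="send" value="submit">
</form>
</body>
</html>
"""


def write_form(out_dir: Path, site: str, topic: str, title: str, body: str = "") -> Path:
    """Write the form to XXXXn_000.html (e.g. UMAS1_001.html) and return its path."""
    path = out_dir / f"{site}_{topic}.html"
    path.write_text(render_form(site, topic, title, body), encoding="utf-8")
    return path
```

For example, write_form(Path("."), "UMAS1", "001", "Example topic title") would create UMAS1_001.html in the current directory, with the file name and hidden fields derived from the same two values so they cannot disagree.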
Instructions for submitting clarification forms
LDC guarantees that two clarification forms from each site will be filled out. If participation and time permit, the LDC will fill out additional forms from each site. The naming convention makes clear the order in which you want the forms to be filled out (e.g., XXXX1, then XXXX2, then, if time permits, XXXX3, and so on).
Create a tar file that contains all of the forms for a run, and optionally gzip it. The file should be named XXXXn.tar, where XXXX is your site code and n is the run number. It should contain exactly 50 files, one per topic, named XXXXn_000.html as described above.
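A run can be packaged with Python's standard tarfile module. This sketch assumes all forms for the run sit in one directory; package_run is a hypothetical name, and naming the compressed archive XXXXn.tar.gz is an assumption, since the instructions only say that gzipping is optional.

```python
import tarfile
from pathlib import Path

EXPECTED_FORMS = 50  # one form per topic, per the instructions


def package_run(form_dir: Path, site: str, gzip: bool = False) -> Path:
    """Bundle the XXXXn_000.html forms for one run into XXXXn.tar.

    With gzip=True the archive is written as XXXXn.tar.gz (assumed name).
    """
    forms = sorted(form_dir.glob(f"{site}_[0-9][0-9][0-9].html"))
    if len(forms) != EXPECTED_FORMS:
        raise ValueError(f"expected {EXPECTED_FORMS} forms for {site}, found {len(forms)}")
    archive = form_dir / (f"{site}.tar.gz" if gzip else f"{site}.tar")
    with tarfile.open(archive, "w:gz" if gzip else "w") as tar:
        for form in forms:
            # arcname stores the bare file name, with no directory prefix.
            tar.add(form, arcname=form.name)
    return archive
```

Refusing to build the archive unless exactly 50 correctly named forms are present catches a missing or misnamed topic before upload rather than after.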
Sites may now use the following process to upload clarification forms:
Upload your site's forms using "old-fashioned" ftp:
- Start the ftp session from a command line:
ftp ftp.ldc.upenn.edu
- Log in using "anonymous" as the user name, and type your email address as the password.
- Type "bin" to set the transfer mode to BINARY.
- Use "put" or "send" to transfer your local file to the ftp server, _AND_ make sure to specify the full path on the server where the file should go, as follows:
put your_file.name /pub/ldc/csr_incoming/your_file.name
- To exit, type
bye
NOTE: After you have uploaded the file, you will not be able to see it on the server. The incoming directory is write-only and does not allow you to list its contents, so do not be surprised that it is not readable.
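The same upload can be scripted with Python's standard ftplib, mirroring the interactive session: anonymous login with your email address as the password, a binary transfer, and a full remote path. This is a sketch that assumes the 2004 host and directory are still reachable; the upload function and its ftp_factory parameter (included so the function can be exercised without a network) are hypothetical.

```python
from ftplib import FTP
from pathlib import Path

# Host and incoming directory as given in the instructions above;
# these are 2004 values and may no longer be in service.
FTP_HOST = "ftp.ldc.upenn.edu"
INCOMING_DIR = "/pub/ldc/csr_incoming"


def upload(archive: Path, email: str, ftp_factory=FTP) -> None:
    """Anonymous, binary-mode upload of one run archive to the incoming directory."""
    with ftp_factory(FTP_HOST) as ftp:  # leaving the block sends QUIT ("bye")
        ftp.login(user="anonymous", passwd=email)  # email address as password
        remote = f"{INCOMING_DIR}/{archive.name}"  # full server path, as required
        with archive.open("rb") as fh:
            # storbinary switches to binary mode (TYPE I) before sending STOR,
            # which covers both the "bin" and "put" steps.
            ftp.storbinary(f"STOR {remote}", fh)
        # No directory listing afterwards: the incoming directory is write-only.
```

Remember that a successful upload still needs the confirmation email described below, since you cannot verify the file yourself.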
When done, you must send an email message to Meghan Glenn (mlglenn@ldc.upenn.edu) and Stephanie Strassel (strassel@ldc.upenn.edu) so that the LDC knows the transfer has happened.
If you have any problems (e.g., because of firewall restrictions at your site), please check with your local networking support people, if possible, and contact the LDC only if you are unable to resolve the problem locally.
Assessor feedback from HARD 2003
Positive feedback:
- Extra space: annotators preferred to have a text box in which to include additional relevant terms.
- Variety of choice: it was helpful to have both potentially relevant lists of terms and short text segments to choose from.
- Headlines: it was better to have a headline/title followed by a list of terms, rather than just a list of terms.
- "Tell us more": annotators enjoyed having a space in which to specify more of what they were looking for.
- Longer text/word selections: it was best to see the full word and/or phrase, so that selections could be made with more confidence.
- Color and design: it was more pleasant to fill out several dozen forms if the design and color were varied a little bit.
- Ranking: it was helpful to rank sources according to preference [if two (or three) sources are equally desired, can they receive the same rank?]
- Timeframe: some annotators enjoyed choosing a timeframe for the question.
- Negative option: in marking single terms or short phrases, "Neg." (meaning the annotator is not at all interested in a certain result) is a good option to have in addition to Yes/No.
- Clarity: Clear (and brief) instructions were best. For example, "select the terms that are relevant to your query." No need to add extra detail (like "don't select the terms that are not relevant").
- Run variation: annotators enjoyed reading differently styled forms from the same site.
Negative feedback:
- Scrolling boxes: they were time-consuming and cumbersome, often containing too much information to judge in each box. If sites choose to use scrolling boxes in the future, perhaps they could be a bit lighter in content.
- Partial words as terms.
- Large groups of terms that are judged together: more often than not, they are not all on topic.
Recommendations for the future:
- Check the text of the forms for spelling and syntactical errors. Some of the instructions were vague and difficult to understand because of the wording or other mistakes. Brevity and clarity are probably best for an exercise like this.
- Check the hidden input fields in each individual form to ensure that they match the intended site, run, and topic numbers. We experienced a number of confusing moments because a clarification form was explicitly named one thing, while the result file (i.e., the hidden input value) was actually named something else.
- When testing the forms before submitting them to the LDC, *especially during or immediately after the LDC's completion of the forms*, be sure to do one of these things:
  - Send an email warning us that your site is testing X number of forms
  - Change the hidden input value of the form to something that is obviously not a real HARD site code or topic number (e.g., XXX1 or TEST1). Some sites did this and it was very helpful.
  - Change the cgi script to something other than the one the LDC provided (probably not the most effective option)
It was quite confusing to differentiate "test" results from actual results. Again, some sites did choose to rename their test forms, which was extremely kind.
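The recommendation to check each form's hidden input fields against its file name can be automated before submission. The sketch below uses Python's standard html.parser; check_form and HiddenFieldParser are hypothetical names, and the file-name pattern assumes a four-uppercase-letter site code, as in the examples above.

```python
import re
from html.parser import HTMLParser

# Assumption: file names look like UMAS1_001.html (4 letters, run, 3-digit topic).
FILENAME_RE = re.compile(r"^(?P<site>[A-Z]{4}\d+)_(?P<topic>\d{3})\.html$")


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.hidden[a["name"]] = a.get("value") or ""


def check_form(filename: str, page: str) -> list[str]:
    """Return a list of problems; an empty list means the form looks consistent."""
    m = FILENAME_RE.match(filename)
    if not m:
        return [f"{filename}: name does not follow XXXXn_000.html"]
    parser = HiddenFieldParser()
    parser.feed(page)
    problems = []
    for field, expected in (("site", m["site"]), ("topicid", m["topic"])):
        actual = parser.hidden.get(field)
        if actual != expected:
            problems.append(f"{filename}: hidden {field}={actual!r}, expected {expected!r}")
    return problems
```

For example, calling check_form(path.name, path.read_text()) for every *.html file in your output directory and printing any non-empty results would flag a form whose hidden topicid disagrees with its file name before it reaches the LDC.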
All in all, this was an entertaining and enjoyable task. It showed the annotators a little bit of personality from each of the sites; most of the staff found it interesting to see how creatively the sites interpreted the clarification form guidelines. We hope this will remain a part of the HARD track in the future.