How to Compare Annotation Files
Guide for LDC Annotators


**Note: File assignments for comparison are located in /mnt/talk/ACE/PHASE2/comparison_tool/assignments**

1) Make sure that any (LDC) documents to be compared are in APF format.  (This should include annotation files and source files.)
You can do this by running the sgm2apf script, which is located in /mnt/talk/ACE/PHASE2/awb-4.48/tools/scripts
To run the script:

sgm2apf -sgm old_file.sgml -apf new_file.apf.xml
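
If you have many files to convert, a small helper can build the sgm2apf command for each one. The following is only a sketch: it uses the -sgm and -apf options shown above, but the function names and the rule for naming the output file (replace the extension with .apf.xml) are assumptions made for illustration, not part of the tool.

```python
from pathlib import Path


def apf_name(sgm_path):
    """Return an output name for an SGML file, e.g. foo.sgml -> foo.apf.xml.

    Assumption: the APF file sits next to the SGML file and keeps its
    base name. Adjust this if your files follow a different convention.
    """
    p = Path(sgm_path)
    return str(p.with_name(p.stem + ".apf.xml"))


def sgm2apf_command(sgm_path, script="sgm2apf"):
    """Build the argument list for one conversion (it is not run here)."""
    return [script, "-sgm", str(sgm_path), "-apf", apf_name(sgm_path)]


def print_commands(sgm_paths):
    """Print one sgm2apf command line per input file for review."""
    for path in sgm_paths:
        print(" ".join(sgm2apf_command(path)))
```

Calling print_commands(["doc1.sgml", "doc2.sgml"]) prints the two command lines so you can check them before running anything; each argument list can also be passed to subprocess.run if you prefer to run the conversions from Python.
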
2) Once all files to be compared have an APF version, you need to specify the files you want to compare in list files (these are the reference, test, and source lists passed to the comparison script in step 3). IMPORTANT NOTE: Be sure that there are no spaces or new lines after the file names in these files. If there are, the comparison tool will try to compare the file '' and will fail.
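
Stray whitespace is hard to spot by eye, so it may help to check each list file before running the comparison. The sketch below is not part of the comparison tool; the function name is hypothetical, and it assumes each list file holds one file name per line. This guide does not say whether a single newline ending the last line is acceptable, so the sketch does not flag it.

```python
def find_list_problems(text):
    """Return (line_number, problem) pairs for the contents of a list file.

    Flags blank lines and lines with trailing whitespace (spaces, tabs,
    or carriage returns). A single newline ending the last line is
    ignored; that choice is an assumption.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append((number, "blank line"))
        elif line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems


def check_list_file(path):
    """Read a list file and print any problems found in it."""
    # newline="" keeps carriage returns so they can be reported.
    with open(path, newline="") as handle:
        for number, problem in find_list_problems(handle.read()):
            print(f"{path}:{number}: {problem}")
```

For example, check_list_file("reference.lst") prints nothing when the file is clean and one line per problem otherwise.
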

3) Run the comparison script and specify where to send the output:

./edtscript.pl -r reference.lst -t test.lst -q source.lst -a > compare_results/REFERENCE_TEST_DATASET.out
You should name the output file so it's easy to figure out what it contains -- REFERENCE is the site whose annotation served as the reference; TEST is the site whose annotation served as the test; DATASET is the data being compared.  So a sample filename might be:
LDC_BBN_TEST1.out
Also note that the comparison tool does not cope well with an output file that already exists. Choose a new file name each time you run the above command.
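
The naming convention and the existing-file check can be combined in a small helper that you run before the comparison. This is a sketch under assumptions: the function names are hypothetical, and the default directory is taken from the compare_results path in the command above.

```python
import os


def output_name(reference, test, dataset, directory="compare_results"):
    """Build a path of the form REFERENCE_TEST_DATASET.out in directory."""
    return os.path.join(directory, f"{reference}_{test}_{dataset}.out")


def fresh_output_name(reference, test, dataset, directory="compare_results"):
    """Return the output path, refusing a file name that is already taken."""
    path = output_name(reference, test, dataset, directory)
    if os.path.exists(path):
        raise FileExistsError(f"{path} already exists; choose a new name")
    return path
```

For example, fresh_output_name("LDC", "BBN", "TEST1") returns the path for LDC_BBN_TEST1.out inside compare_results, or raises an error if a file by that name is already there.
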

4) The .out file contains results for all the files which have been compared.  Any differences in annotation are marked with >>> in the .out file.
You should examine each difference, decide which of the two annotations to accept, and write some commentary describing your decision.  (You should save your comments in an Emacs file.)  Please include the context of the example in your commentary.  You may find it helpful to open two instances of AWB, one with each of the files being compared, in order to understand the annotation differences.
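
To step through the differences in a long .out file, it can help to pull out each marked line along with a few lines around it. The sketch below relies only on the >>> marker described above; the function name and the amount of context are assumptions, and the layout of the rest of the .out file is not assumed.

```python
def marked_differences(out_text, marker=">>>", context=2):
    """Return (line_number, surrounding_lines) for each line with the marker.

    surrounding_lines holds up to `context` lines before and after the
    marked line, clipped at the start and end of the file.
    """
    lines = out_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, lines[start:end]))
    return hits


def print_differences(path, context=2):
    """Print each marked difference in an .out file with its context."""
    with open(path) as handle:
        for number, block in marked_differences(handle.read(), context=context):
            print(f"--- line {number} ---")
            print("\n".join(block))
```

The printed line numbers make it easier to find each difference again in the .out file when you write up your commentary.
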



strassel@ldc.upenn.edu
9/4/2001