1) Make sure that any (LDC) documents to be compared are in APF format.
(This should include annotation files and source files.)
You can convert files with the sgm2apf script, which can be found at
/mnt/talk/ACE/PHASE2/awb-4.48/tools/scripts
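To run the script:
sgm2apf -sgm old_file.sgml -apf new_file.apf.xml
2) Once all files to be compared have an APF version, you need to
specify the list of files you want to compare. The comparison command
in step 3 takes these as list files (reference.lst, test.lst and
source.lst in the example there).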
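
If you have many files to convert, steps 1 and 2 can be scripted. The
following is only a sketch, not part of the AWB tools. It assumes that
sgm2apf is on your PATH, that the source files end in .sgml, and that a
list file is plain text with one path per line (the list format is a
guess and should be checked against an existing .lst file). The
function name convert_dir is made up for this example.

```shell
#!/bin/sh
# Sketch: convert every .sgml file in a directory to APF and write the
# converted paths to a list file.
# Assumptions (not confirmed by the instructions above): sgm2apf is on
# PATH, inputs end in .sgml, and a list file is one path per line.

convert_dir() {
    src_dir=$1      # directory holding the .sgml files
    out_dir=$2      # directory to receive the .apf.xml files
    list_file=$3    # list file to create

    mkdir -p "$out_dir" || return 1
    : > "$list_file"            # start with an empty list

    for sgm in "$src_dir"/*.sgml; do
        [ -e "$sgm" ] || continue   # no matches: skip the literal pattern
        base=$(basename "$sgm" .sgml)
        apf="$out_dir/$base.apf.xml"
        # Same invocation as shown in step 1.
        sgm2apf -sgm "$sgm" -apf "$apf" || return 1
        printf '%s\n' "$apf" >> "$list_file"
    done
}
```

For example, "convert_dir ldc_sgml ldc_apf reference.lst" would convert
everything in ldc_sgml and list the results in reference.lst. The
directory names here are placeholders.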
3) Run the comparison script and specify where to send the output:
./edtscript.pl -r reference.lst -t test.lst -q source.lst -a > compare_results/REFERENCE_TEST_DATASET.out
You should name the output file so it's easy to figure out what it
contains: REFERENCE is the site whose annotation serves as the
reference, TEST is the site whose annotation serves as the test, and
DATASET is the data being compared. So a sample filename might be
LDC_BBN_TEST1.out
Also note that the comparison tool does not like writing to a file
that already exists. You must choose a new output file each time you
run the above command.
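
The naming convention and the "new file only" rule can be wrapped in a
small helper so that an existing results file is never reused by
accident. This is a sketch under assumptions: the function names
output_name and run_compare are invented here, and the list files are
assumed to be called reference.lst, test.lst and source.lst in the
current directory, as in the command above.

```shell
#!/bin/sh
# Sketch: build the REFERENCE_TEST_DATASET.out name and refuse to run
# the comparison if that file already exists.
# output_name and run_compare are hypothetical helpers, not AWB tools.

output_name() {
    # $1 = reference site, $2 = test site, $3 = dataset
    printf 'compare_results/%s_%s_%s.out\n' "$1" "$2" "$3"
}

run_compare() {
    out=$(output_name "$1" "$2" "$3")

    if [ -e "$out" ]; then
        echo "output file already exists, choose a new name: $out" >&2
        return 1
    fi

    mkdir -p compare_results || return 1
    # Same command line as in step 3, with the output name filled in.
    ./edtscript.pl -r reference.lst -t test.lst -q source.lst -a > "$out"
}
```

Calling "run_compare LDC BBN TEST1" from the directory containing
edtscript.pl would write to compare_results/LDC_BBN_TEST1.out, or stop
with a message if that file is already there.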
4) The .out file contains results from all the files which have been
compared. Any differences in annotation are marked with >>> in the
.out file.
You should examine each difference, decide which of the two annotations
to accept, and write some commentary describing your decision. (You
should save your comments in an Emacs file.) Please include the context
of the example in your commentary. You may find it helpful to open two
copies of AWB, one with each of the files being compared, in order to
understand the annotation differences.
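
To get an overview before going through the differences one by one, you
can search the .out file for the >>> marker. The sketch below uses
standard grep. It only assumes that >>> appears somewhere on each
difference line (the exact layout of the .out file is not described
here), and the function names are made up for this example. The -C
option is supported by GNU and BSD grep.

```shell
#!/bin/sh
# Sketch: find the differences marked with >>> in a comparison .out file.
# count_differences and show_differences are hypothetical helpers.

count_differences() {
    # Prints the number of lines that contain >>> in file $1.
    grep -c -- '>>>' "$1"
}

show_differences() {
    # Prints each >>> line in file $1 with its line number, plus
    # $2 lines of surrounding context (default 2).
    context=${2:-2}
    grep -n -C "$context" -- '>>>' "$1"
}
```

For example, "show_differences compare_results/LDC_BBN_TEST1.out 3"
prints every marked line with three lines of context on either side,
which can be pasted into your commentary alongside your decision.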