Frequently Asked Questions
1. How do I get GALE data?
There are two types of GALE data: corpora created specifically for
the program, and previously-released corpora that have been designated
as GALE-relevant. New and planned GALE releases (kickoff and quarterly)
are described on the GALE data matrix.
Authorized GALE sites will automatically receive copies of these
releases once their appointed data contact person has signed the GALE
user agreement and provided contact information.
Previously-released corpora that have been designated as GALE-relevant
are described on the GALE catalog query page. Authorized GALE sites
should follow the instructions on that page for requesting corpora.
2. How will data be distributed?
GALE kickoff and quarterly releases will be distributed in one of two
ways, depending on the type of data. Text and annotations
will be distributed via web download. On the release date, each
site's data contact person will receive an email containing a URL where
each corpus can be downloaded. Audio
will be released on physical media. On the release date, LDC will
prepare a shipment containing CDs, DVDs and/or hard drives, depending
on the size of the corpus. Packages will be shipped via DHL or
FedEx. All authorized GALE sites will receive text and annotation
corpora. Given the high cost of creating and shipping hard drives,
audio data will not be distributed to sites that are not participating
in GALE evaluations.
3. How do I know what data will be released in each quarter?
The GALE data matrix
reflects LDC's current plan for data to be distributed each
quarter. The matrix is updated on a regular basis to reflect
changes to the plan. As the details of each release are finalized,
that information is also added to the matrix and can be viewed by
clicking on the name of the release under the Deliveries column.
(For instance, see the details of the Kickoff1 Release.)
4. My site has annotated some LDC data and I want to share it with
other sites on my team. How can I do that?
GALE researchers can redistribute annotations in stand-off form without
involving LDC in any way. We understand that UIMA annotations are
stand-off in nature, that UIMA is the basis for integration of sites'
technology into the common platform, and that integration is supposed
to happen as the technology is developed; so this ought to mean there
are no impediments to sharing annotations within teams.
However, we do recognize that some legacy annotations may exist in
inline rather than stand-off form. Inline annotations carry the source
text along with them, and redistribution of copyrighted
data across sites is prohibited by GALE user agreements and by LDC's
agreements with our data providers.
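To illustrate why the two forms are treated differently, here is a minimal Python sketch of the distinction. It is not the UIMA representation or any LDC format; the class name, field names, document ID, labels and sample sentence are all hypothetical. A stand-off record holds only offsets and a label, so it contains none of the source text, while the inline rendering embeds the text itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffAnnotation:
    """A hypothetical stand-off record: offsets into a document, no text."""
    doc_id: str   # identifier of the source document (made up for this sketch)
    start: int    # character offset where the span begins (inclusive)
    end: int      # character offset where the span ends (exclusive)
    label: str    # annotation label


def resolve(annotation, source_text):
    """Recover the annotated string; requires a local copy of the source."""
    return source_text[annotation.start:annotation.end]


def render_inline(annotations, source_text):
    """Embed non-overlapping annotations in the text as inline tags."""
    out = source_text
    # Work from the end of the text so earlier offsets stay valid.
    for a in sorted(annotations, key=lambda a: a.start, reverse=True):
        span = out[a.start:a.end]
        out = out[:a.start] + f"<{a.label}>{span}</{a.label}>" + out[a.end:]
    return out


# Sample data, invented for illustration only.
source_text = "Alice visited Paris."
annotations = [
    StandoffAnnotation("doc001", 0, 5, "PERSON"),
    StandoffAnnotation("doc001", 14, 19, "LOCATION"),
]

print(resolve(annotations[1], source_text))
print(render_inline(annotations, source_text))
```

A site that already holds the source corpus can resolve the stand-off records locally, whereas the inline string cannot be passed along without passing the source text with it.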
To make it easier for sites to share this kind of data among their team
members, we are creating a "transshipment point" at LDC. We have
set up a local LDC machine to act as an scp server. When a GALE
participant wants to distribute something to other sites on the team,
they can upload the file(s) and inform other participants, including
responsible LDC parties, about what has been deposited and why.
The other parties can then download the deposit if they want to.
Each site will receive a user account per the Resource Distribution Task Specification.
These accounts will allow password-less sftp or WinSCP access.
Each account will be a
member of one of three user groups (corresponding to the three
teams). Each file deposited by a site will inherit group
permissions for that site. Sites will have read/write access to
the data deposited by other members of their group. Sites will
not have read or write access to data deposited/owned by groups other
than their own.
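The access rule described above can be sketched as a small Python model. This is only an illustration of the stated policy, not the server's actual configuration; the site names, team names and file name are hypothetical, since the text above says only that there are three teams.

```python
from dataclasses import dataclass

# Hypothetical mapping of site accounts to their team's user group.
SITE_GROUP = {
    "site_a": "team1",
    "site_b": "team1",
    "site_c": "team2",
    "site_d": "team3",
}


@dataclass(frozen=True)
class Deposit:
    """A file uploaded to the transshipment point by one site."""
    filename: str
    owner_site: str

    @property
    def group(self):
        # A deposited file inherits the group of the site that uploaded it.
        return SITE_GROUP[self.owner_site]


def can_access(site, deposit):
    """Read/write access is granted only within the depositing site's group."""
    return SITE_GROUP[site] == deposit.group


deposit = Deposit("annotations.tgz", "site_a")
print(can_access("site_b", deposit))  # same team as the depositor
print(can_access("site_c", deposit))  # different team
```

In this model a single check covers both reading and writing, which mirrors the statement that group members get read/write access and other groups get neither.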
We will make a general announcement to the GALE
Data Announcement List when this service is up and running (ETA:
early December 2005).