Specify DiGIR FAQ
- What is DiGIR?
- Why use DiGIR?
- How does DiGIR work?
- What is the Darwin Core?
- How does one participate in DiGIR?
- What is the Specify DiGIR Package?
- What is installed and configured with the Specify DiGIR Package?
- What should be done before I install the Specify DiGIR Package?
- What information should I have before installing Specify DiGIR Package?
- What are some examples of DiGIR based communities?
- Where was DiGIR developed?
- Can I install the Specify DiGIR Package on Linux?
- Can I install the Specify DiGIR Package on a Mac?
- What version of the Darwin Core does the Specify DiGIR Package support?
DiGIR stands for Distributed Generic Information Retrieval. The DiGIR search and retrieval protocol links (or federates) a community of distributed databases so that their data can be searched together as a unified or "virtual" database. The protocol is implemented in two kinds of software: "provider" software works with collection database installations to make data records searchable and accessible on the network, and "portal" software typically works with web server installations and uses web pages with query forms to issue search requests that retrieve matching data from DiGIR providers.
The source for DiGIR protocol information and software is: http://digir.sourceforge.net/.
A DiGIR portal lets web browser users query multiple databases through a single web query form. A portal can broadcast queries to a known set of providers, and each provider translates the standard syntax of the search request into an equivalent request that is compatible with the structure of a local database.
Each taxonomic discipline shares common “concepts” which are described in a database management system as data within fields within records. Some examples of shared data concepts are: people, say a particular kind of person--a collector--who typically has a surname with initials and/or a forename. Other common data concepts deal with collection locality, such as a nearest named place, or a geographical feature, or a distance down a road. And of course collection databases all represent information about things--a lot of fish or a specimen of a slime mold. Despite having many data concepts in common, the internal database structures and syntax for storing data describing these concepts vary from one database to the next.
Without a standard protocol for exchanging information, the process of issuing a single query to multiple databases with different internal schema would be prohibitively expensive due to software complexity. First, one would need to know how to connect to each database. Then a query (e.g. for matching specimen records) would need to be defined and executed in the language, syntax and database structure of each particular database. And finally, the structure and syntax of the various result sets would then need to be translated into one format with the data for presentation back to the user. DiGIR protocol software solves these challenges by providing an efficient way to request and retrieve information from multiple, distributed databases, each with their own unique internal data structures or schema.
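The idea can be sketched in a few lines of Python. This is only an illustration, not DiGIR code: the two collections, their field names, and the helpers make_provider and federated_search are all hypothetical. Each collection stores the same concepts under different local field names, and a small mapping lets one query phrased in shared concept names run against both.

```python
# Two hypothetical collections storing the same concepts under different field names.
FISH_DB_A = [
    {"cat_no": "F-101", "taxon": "Notropis lutrensis", "coll_by": "Cross, F."},
    {"cat_no": "F-102", "taxon": "Lepomis humilis", "coll_by": "Smith, J."},
]
FISH_DB_B = [
    {"Accession": "X-9", "SciName": "Notropis lutrensis", "CollectorName": "Doe, A."},
]


def make_provider(rows, field_map):
    """Wrap a local table so it answers queries phrased in shared concept names."""
    def search(concept, value):
        local_field = field_map[concept]  # shared concept -> local field name
        hits = [row for row in rows if row[local_field] == value]
        # Translate each matching row back into the shared concept names.
        return [{c: row[f] for c, f in field_map.items()} for row in hits]
    return search


PROVIDERS = [
    make_provider(FISH_DB_A, {"CatalogNumber": "cat_no",
                              "ScientificName": "taxon",
                              "Collector": "coll_by"}),
    make_provider(FISH_DB_B, {"CatalogNumber": "Accession",
                              "ScientificName": "SciName",
                              "Collector": "CollectorName"}),
]


def federated_search(providers, concept, value):
    """Send one query to every provider and merge the translated results."""
    results = []
    for search in providers:
        results.extend(search(concept, value))
    return results


if __name__ == "__main__":
    for record in federated_search(PROVIDERS, "ScientificName", "Notropis lutrensis"):
        print(record)
```

The caller never sees cat_no or Accession; only the shared concept names cross the boundary, which is the role the protocol and its content standard play between portals and providers.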
A DiGIR web portal has a list of network addresses for participating DiGIR providers. A user constructs a query on a portal web page (or perhaps through some other application) which then broadcasts it to one or more of the providers, as selected by the user. In the case of DiGIR and natural history collections, this query format or "query schema" is based on an information content standard called the "Darwin Core". The DiGIR protocol uses the Darwin Core standard to define the content or semantics of data fields and employs XML as the language for the exchange of queries and result sets between portals and providers. When a provider receives a DiGIR query, it translates the request from DiGIR XML format to the local query language of the target collection database and tells the database manager to run the query. The database returns any matching records to the DiGIR provider software which then translates the database response into the community standard Darwin Core XML format and returns the data to the requesting DiGIR portal.
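The provider side of this exchange can be sketched as follows. This is a minimal illustration under stated assumptions, not the Specify DiGIR provider (which is written in PHP): the request and response XML are simplified stand-ins rather than the actual DiGIR schema, and the specimen table, its columns, and the CONCEPT_TO_COLUMN mapping are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from Darwin Core concepts to local column names.
CONCEPT_TO_COLUMN = {
    "CatalogNumber": "cat_no",
    "ScientificName": "taxon_name",
    "Country": "country",
}


def demo_connection():
    """Build a small in-memory stand-in for a local collection database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimen (cat_no TEXT, taxon_name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?)",
        [("F-101", "Notropis lutrensis", "USA"),
         ("F-102", "Lepomis humilis", "USA")],
    )
    return conn


def handle_request(conn, request_xml):
    """Translate a simplified XML equality filter into SQL and return XML records."""
    root = ET.fromstring(request_xml)
    condition = root.find("./filter/equals")
    # Only column names from the mapping ever reach the SQL text;
    # the search value itself is passed as a bound parameter.
    column = CONCEPT_TO_COLUMN[condition.get("concept")]
    columns = ", ".join(CONCEPT_TO_COLUMN.values())
    rows = conn.execute(
        f"SELECT {columns} FROM specimen WHERE {column} = ? ORDER BY cat_no",
        (condition.text,),
    ).fetchall()

    # Re-label the local columns with the shared concept names for the reply.
    response = ET.Element("response")
    for row in rows:
        record = ET.SubElement(response, "record")
        for concept, value in zip(CONCEPT_TO_COLUMN, row):
            ET.SubElement(record, concept).text = value
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    request = ('<request><filter><equals concept="ScientificName">'
               'Lepomis humilis</equals></filter></request>')
    print(handle_request(demo_connection(), request))
```

The portal only needs to know the concept names; the mapping to local columns stays private to the provider.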
In short, the Darwin Core is a standard set of field definitions describing a natural history specimen. The source of information for Darwin Core is at: http://wiki.tdwg.org/twiki/bin/view/DarwinCore/WebHome. Natural history museums and herbaria house biological specimens collected across taxa, space and time. Each specimen has associated information that describes not only the organism but also the act of collecting it. For instance, specimen data typically include the collector of the specimen, where it was taken, the scientific name (at some taxonomic level) and the collection date. Historically, this information was recorded only in a collector's notebook and/or on a tag or label attached to the specimen. When collection information was accessed manually through paper catalogs or by direct inspection of specimens, local differences among institutions in how they described data elements such as those mentioned above were not an impediment. But as collections become computerized and accessible over the internet, those local variations in data element definitions prevent efficient data access, integration and synthesis.
There is a commonality in the content of biological collection databases due to the conservative way in which specimens and their data are conceptualized by researchers in each natural history discipline. The Darwin Core defines some of the most important data concepts for specimens, and when implemented as a network standard, it allows researchers to retrieve data without knowledge of the underlying local database mechanisms used for describing and storing specimen information.
Typically, a DiGIR provider is associated with a particular institution and one or more of its collection databases. A DiGIR portal is an interface for accessing a community of data (examples include DiGIR portals sponsored by MaNIS, HerpNET, ORNIS, OBIS, and GBIF; see below). An institution wishing to serve data to a community portal implements a DiGIR provider and registers that provider with one or more portals. An individual wishing to access and query a set of databases can use a local DiGIR portal, or one created by some other person, society, project or organization. The Specify Software Project provides an installation package and software to create a DiGIR provider and to associate it with a Specify database and with a web server; the DiGIR provider requires both in order to receive queries and serve data. The Specify DiGIR installer does not create a DiGIR portal. Each Specify DiGIR provider site needs to find an existing DiGIR portal with which to register, or must create its own DiGIR portal with software available elsewhere.
Prior to Specify 4.6, an institution wishing to provide data to a DiGIR community needed someone with the technical expertise to install and configure a DiGIR provider. The Specify DiGIR package takes a desktop PC running a Specify database and automates the process of turning the machine into a web-accessible DiGIR provider. It gives curators and collection staff an easy way to publish their data online as structured records using the DiGIR protocol. The Specify DiGIR installer is available from the Specify Project website as an MS Windows installer, separate from the Specify software installer. The DiGIR package must be installed in conjunction with an existing Specify database system. If no other web server installation (IIS or Apache) is detected, the package also installs the additional software components needed to run a DiGIR provider, including the Apache web server and the scripting language PHP.
The Specify DiGIR provider installation package is designed to operate on the Internet and it contains a fully functional Internet Web server. A web server is a network application with significant security issues and it requires more configuration and monitoring than an ordinary office desktop application. It is very important to discuss your plans for a DiGIR provider system with the IT staff within your organization. Your local network or system administrator would know best how to implement a DiGIR provider on a computer connected to your building or campus data network in order to minimize security risks.
The Specify DiGIR installer configures several programs. If no IIS or Apache installation is detected, the installer begins by installing the Apache web server and the PHP language. It then configures PHP to work with Apache. It also installs the core components for the DiGIR Provider (PHP code), and then configures the Apache web server and PHP to handle DiGIR related queries. If there is a web installation detected, then only the DiGIR provider code is installed. (New as of Specify 5.2) Please see the Specify 5.2 Installation and Setup Guide (PDF, 18.2 MB) for a step-by-step description of actual installation and setup procedures.
By default, the installer places Apache, PHP, and the DiGIR provider scripts on the same machine as the one running Specify. Some organizations may prefer to separate Specify from the DiGIR provider for performance or security reasons. Consult your IT staff to determine which design is best (and allowable) for your network. The guidelines below assume that all components of the DiGIR provider and Specify will reside on a single computer. If you want to try another installation configuration, we would be happy to advise on other installation options.
You will need to accomplish the following before installing the Specify DiGIR package:
- Install Specify—Specify is available from the Specify website.
- Install the latest patches and updates for your operating system from Microsoft. Very important.
- Be sure TCP/IP networking is installed and working properly. The DiGIR computer should have a fixed IP address and IP name registered with your organization’s DNS server.
- Be sure any firewalls restricting access to the DiGIR computer are configured to let HTTP traffic from the Internet reach the DiGIR system over the standard HTTP port (80).
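The last two items in the list above can be spot-checked with a short script. The following is a rough sketch, not part of the Specify DiGIR package; the function names are hypothetical. It only confirms that a host name resolves and that a TCP connection to the HTTP port succeeds. To say anything about the firewall, it has to be run from a machine outside your network, and the port check only succeeds once a web server is actually listening.

```python
import socket


def resolves(hostname):
    """Return True if the host name resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def port_open(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def preflight(hostname, port=80):
    """Run both checks and report each result by name."""
    return {"dns": resolves(hostname), "http": port_open(hostname, port)}
```

For example, preflight("myserver.mydomain.org") would return a dictionary with a True or False value for each check, where the host name is whatever name your DNS administrator registered for the DiGIR computer.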
The DiGIR installer program will ask for the following information. It is a good idea to have answers ready before you begin the installation. Consult with your IT staff if you have any questions about these settings:
- User Name and Company Name—The principal user for the system and the name of the company or organization. These fields are for informational purposes only. The default values are extracted from the local Windows configuration.
- DNS domain—The DNS domain for your DiGIR Apache web server (e.g., myserver.com).
- Full DNS name of web server—The fully qualified DNS name for the web server. If your web server is named myserver and it is in the domain mydomain.org, enter myserver.mydomain.org.
- An e-mail address for the Apache administrator
- Institution name
- Institution code—an acronym or other shorthand identifier for your institution, should be unique for each collection
- A URL for a website with information on your institution
- Contact information for one or more persons within your institution who are familiar with the DiGIR implementation.
MaNIS – the Mammal Networked Information System, a DiGIR network of mammal collections
GBIF – the Global Biodiversity Information Facility, a DiGIR portal and cache of almost everything online
OBIS – the Ocean Biogeographic Information System, marine organisms
HerpNET – a DiGIR network of herpetological databases, things that slither
ORNIS – the Ornithological Information System, a network of flying dinosaur databases
DiGIR was originally developed by The University of Kansas Natural History Museum and Biodiversity Research Center, The California Academy of Sciences, and The Museum of Vertebrate Zoology in Berkeley. It is an open source collaborative initiative and its endeavors can be tracked at the DiGIR website.
The Specify DiGIR Package shipped with Specify 5.0, 5.1 and 5.11 supports Darwin Core version 1.20.
It includes the following concepts:
DateLastModified, InstitutionCode, CollectionCode, CatalogNumber, ScientificName, BasisOfRecord, Kingdom, Phylum, Class, Order, Family, Genus, Species, Subspecies, ScientificNameAuthor, IdentifiedBy, YearIdentified, MonthIdentified, DayIdentified, TypeStatus, CollectorNumber, FieldNumber, Collector, YearCollected, MonthCollected, DayCollected, JulianDay, TimeOfDay, ContinentOcean, Country, StateProvince, County, Locality, Longitude, Latitude, CoordinatePrecision, MinimumElevation, MaximumElevation, MinimumDepth, MaximumDepth, Sex, PreparationType, IndividualCount, PreviousCatalogNumber, RelationshipType, RelatedCatalogItem, Notes
Specify 5.2 and the Specify 5.2 DiGIR installer support several Darwin Core versions.
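As an illustration of how the Darwin Core 1.20 concept list above might be used, the following sketch checks the field names of a record against it. The concept names are copied from the list; the function name and the sample record are hypothetical, and nothing here is part of the Specify DiGIR package.

```python
# Concept names copied from the Darwin Core 1.20 list above.
DARWIN_CORE_1_20 = frozenset("""
DateLastModified InstitutionCode CollectionCode CatalogNumber ScientificName
BasisOfRecord Kingdom Phylum Class Order Family Genus Species Subspecies
ScientificNameAuthor IdentifiedBy YearIdentified MonthIdentified DayIdentified
TypeStatus CollectorNumber FieldNumber Collector YearCollected MonthCollected
DayCollected JulianDay TimeOfDay ContinentOcean Country StateProvince County
Locality Longitude Latitude CoordinatePrecision MinimumElevation
MaximumElevation MinimumDepth MaximumDepth Sex PreparationType IndividualCount
PreviousCatalogNumber RelationshipType RelatedCatalogItem Notes
""".split())


def unknown_concepts(record):
    """Return the record's field names that are not in the concept list, sorted."""
    return sorted(name for name in record if name not in DARWIN_CORE_1_20)


if __name__ == "__main__":
    # Hypothetical record with one misspelled field name ("Colector").
    sample = {"CatalogNumber": "F-101", "Genus": "Notropis", "Colector": "Cross, F."}
    print(unknown_concepts(sample))
```

A check like this catches misspelled or locally invented field names before records are mapped for publication.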
Last modified 2007-07-06 10:54 AM