CDIP's metadata can be accessed from both static files and dynamic
pages. The static files are produced by a cron job at the start of
each week (early Monday morning) and are generated for all public
data sets. These files are in xml format, and can be viewed or
harvested from the following web-accessible folder:
The dynamic metadata is generated from the FGDC metadata links provided on
the station pages in the historic section of the website. This metadata
can be viewed in two formats: html and xml. The html metadata is displayed
on a standard CDIP web page; the xml output is placed on its own page.
B) IMPLEMENTATION DETAILS
All of the content of CDIP's FGDC metadata is generated by querying
our 'archive' MySQL database. A script in .root/php_lib - make_meta.cdip -
generates the content for a given station, stream, and setting of
the public flag (public/nonpub).
The output of the php script is in text format. This text is then passed
through the USGS utility 'mp':
The mp program verifies that the metadata is FGDC-compliant, and then
outputs it in the desired format, either html or xml.
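As a rough illustration of this step, the sketch below assembles an mp
call from Python. It is not CDIP's actual invocation: the file names are
hypothetical, and it assumes the 'mp' executable is on the PATH. The
switches used (-x for xml output, -h for html output, -e for an error
file) are mp's standard output options.

```python
import subprocess

# mp output switches: -x writes an xml file, -h writes an html file.
MP_FORMAT_FLAGS = {"xml": "-x", "html": "-h"}


def build_mp_command(text_file, fmt, output_file, error_file=None):
    """Return the argument list for a single mp run."""
    if fmt not in MP_FORMAT_FLAGS:
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["mp", str(text_file), MP_FORMAT_FLAGS[fmt], str(output_file)]
    if error_file is not None:
        # -e redirects mp's compliance messages from stderr to a file.
        cmd += ["-e", str(error_file)]
    return cmd


def run_mp(text_file, fmt, output_file, error_file=None):
    """Run mp and return its exit status (assumes 'mp' is installed)."""
    cmd = build_mp_command(text_file, fmt, output_file, error_file)
    return subprocess.run(cmd, check=False).returncode
```

Building the argument list separately from running it keeps the command
easy to inspect or log before mp is actually called.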
FGDC metadata consists of seven main sections, five of which do not
need to be included if they do not apply to the data set in question.
For CDIP metadata, two sections are omitted - Spatial_Reference_Information
and Spatial_Data_Organization_Information - because they only apply
to datasets that include spatial data. (Although CDIP's metadata contains
spatial info - deployment positions - the data sets themselves do not.)
Thus CDIP metadata consists of five sections:
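CDIP Metadata = Identification_Information +
                Data_Quality_Information +
                Entity_and_Attribute_Information +
                Distribution_Information +
                Metadata_Reference_Information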
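The section layout described above can be checked mechanically against
an xml record. The sketch below is only an illustration, not part of
CDIP's code: it assumes the xml uses the FGDC short tag names (idinfo,
dataqual, spdoinfo, spref, eainfo, distinfo, metainfo) directly under
the root element, and the helper name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Short xml tag names of the seven main FGDC sections.
ALL_SECTIONS = {
    "idinfo": "Identification_Information",
    "dataqual": "Data_Quality_Information",
    "spdoinfo": "Spatial_Data_Organization_Information",
    "spref": "Spatial_Reference_Information",
    "eainfo": "Entity_and_Attribute_Information",
    "distinfo": "Distribution_Information",
    "metainfo": "Metadata_Reference_Information",
}
# The two spatial sections that CDIP metadata leaves out.
OMITTED = {"spdoinfo", "spref"}
EXPECTED = [tag for tag in ALL_SECTIONS if tag not in OMITTED]


def missing_and_unexpected(xml_text):
    """Compare a record's top-level sections with the expected five."""
    root = ET.fromstring(xml_text)
    present = [child.tag for child in root]
    missing = [tag for tag in EXPECTED if tag not in present]
    unexpected = [tag for tag in present if tag not in EXPECTED]
    return missing, unexpected
```

A record with exactly the five expected sections yields two empty lists.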
Many of the fields in the content standard are defined as free text,
and can contain links to other resources. CDIP's metadata takes
full advantage of this fact, linking to relevant documents and pages
on the CDIP website wherever possible. This is the most efficient and
effective approach because CDIP's online documentation is extensive
and covers most of the topics addressed in the FGDC standard. By linking
directly to CDIP's web resources, we avoid redundant information and
ensure that the metadata is kept as up-to-date as possible.
This same approach is used in defining CDIP's entity and attribute
information. Although other groups are currently working to define wave
entities and attributes in fine detail, this seems unnecessary for CDIP.
According to the FGDC standard, entities can be defined by a 'detailed
description' or by an 'overview description'; overview descriptions
are acceptable as long as there is a 'detail citation' pointing to a
more extensive definition of the entity. Because CDIP produces a wide
range of products which already have web-available format descriptions,
using overview descriptions is by far the most efficient approach for
us. Thus in the metadata, overviews of our different entities (sp
files, pm files, xy files, etc.) are given, and the detail citation
field is used to give a link to the full product description on the web.
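To make the overview approach concrete, the sketch below builds one
overview description with its detail citation, using the FGDC short tag
names (overview, eaover, eadetcit) inside eainfo. The description text
and the URL are placeholders, not CDIP's actual metadata content, and
the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_overview(description, detail_citation):
    """Return an <overview> element holding a summary and a citation."""
    overview = ET.Element("overview")
    ET.SubElement(overview, "eaover").text = description
    ET.SubElement(overview, "eadetcit").text = detail_citation
    return overview


# Placeholder values: the real overview text and link come from CDIP.
eainfo = ET.Element("eainfo")
eainfo.append(
    build_overview(
        "sp files (placeholder overview text)",
        "http://example.org/sp_format (placeholder link)",
    )
)
xml_fragment = ET.tostring(eainfo, encoding="unicode")
```

Since the overview element may be repeated, each product type (sp, pm,
xy, etc.) can get its own overview and detail citation pair.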
D) FUTURE DEVELOPMENTS
Although there is work being done in the wave gauging community to define
attributes and entities in detail, there are other areas of CDIP's metadata
which have a more pressing need for standardization. It is likely that
search tools and other utilities will focus on section one of the metadata,
the Identification_Information. In that section, the content of the data set
is identified primarily by the use of keywords. Right now, the keywords
listed in our metadata are simply the default data type titles ('wave
energy', 'wave direction', etc.) found in our php library. These keywords
will be much more useful if we start taking them from a referenced thesaurus
that has been created or endorsed by the research community.
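As an illustration of how a search tool might read these keywords, the
sketch below groups the theme keywords of a record by the thesaurus
they cite. The sample record is made up for this example; it assumes the
FGDC short tag names (keywords, theme, themekt, themekey) and uses
'None' as the thesaurus name, the usual FGDC convention when keywords do
not come from a referenced thesaurus.

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment; the keywords are the default titles
# mentioned above, and 'None' marks the absence of a thesaurus.
SAMPLE = """<metadata><idinfo><keywords>
<theme>
  <themekt>None</themekt>
  <themekey>wave energy</themekey>
  <themekey>wave direction</themekey>
</theme>
</keywords></idinfo></metadata>"""


def theme_keywords(xml_text):
    """Map each theme thesaurus name to its list of keywords."""
    root = ET.fromstring(xml_text)
    result = {}
    for theme in root.findall("./idinfo/keywords/theme"):
        thesaurus = (theme.findtext("themekt") or "").strip()
        keys = [k.text.strip() for k in theme.findall("themekey") if k.text]
        result.setdefault(thesaurus, []).extend(keys)
    return result
```

With keywords drawn from a referenced thesaurus, the same lookup would
return them under that thesaurus's name instead of 'None', which is what
lets a harvester interpret them consistently across data providers.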
Another weakness of the CDIP metadata is its lack of station-specific
information about the purpose and distinctive characteristics of the data
sets. Right now, the 'purpose' of all CDIP's data sets is the same: one very
generalized statement. Many of our datasets were created in response to
very specific needs and special situations (e.g. the NCEX datasets); it
would be nice if a description of these contexts was also present in the