Data Management

One great benefit of CDIP’s longevity is that the program has been able to generate numerous long-term data sets. For many locations, wave information has been collected that spans three decades. All of this data - from the first observations made in 1975 to those being made at this very moment - is archived in a number of standardized, easily accessible formats. These archives form an invaluable resource for researchers and engineers, providing a unique record of wave climatology along our nation’s coasts.

Stations and Sets

At the broadest level, CDIP data is organized by station. But what, exactly, is a station? The term is used freely on the website and throughout this documentation without definition, largely because the implied meaning is generally correct: A station is a location where CDIP maintains sensors and collects wave and climatological data. Thus stations are named according to their geographic locales - Dana Point, Waimea Bay, etc.

CDIP’s station definition

Defining the term station more precisely is quite difficult. For the first couple of decades of CDIP’s existence, data was organized strictly according to shore station, i.e. according to the site at which the data were initially recorded and transmitted. So if data from sensors miles apart - say inside and outside of a harbor, as at Noyo (Station 030) - were recorded at the same site, they would fall under the same station. By contrast, sensors whose data were recorded at different sites would always be separate stations, even if the sensors were located just a few hundred meters apart. Essentially it was the logistics of setting up the shore station hardware that determined how sensors were grouped in stations, rather than any direct consideration of the wave climates or geographic characteristics of the locations involved.

image0

Mission Bay Harbor: Five sensors as one station (015)

image1

Oceanside Harbor: Three sensors as three stations (068, 069, 070)

When CDIP started relying more heavily on buoys in the 1990s, the situation became even more muddled. Unlike pressure sensors and arrays, buoys require frequent maintenance and redeployment. And redeployment sometimes leads to changes in location: a mile further offshore or inshore, atop a different bank or depth contour. When buoys are in continuous use for 10 years or more, it is not uncommon for them to be deployed at half a dozen distinct sites, maybe even more. Thus the use of wave buoys has made stations even more dynamic, and harder to get a handle on.

It is therefore important to remember that while each of CDIP’s stations does correspond to a general location, it does not correspond to a precise location or wave environment. Over the program’s history a wide range of influences - from funding sources to the logistics of laying cable - have affected how sensors are assigned to stations. As a result, it is possible that a single station covers several different wave environments, and that a single wave environment contains several different stations.

image2

The waters off of Point Arguello and Point Conception have been heavily studied, with sensor data recorded under several different station numbers.

Data sets

When a single station’s sensors have collected data from two or more distinct wave environments, it is clearly essential that data from these differing locales be kept distinct. To this end, station data can be organized into separate data sets. Within the processing archive - CDIP’s extensive database of processing information - there are detailed instructions on how to organize the data from each station. Which records from which sensors on which dates should be placed in which data sets? The processing archive has the answers.

Most data sets are set up to separate distinct wave environments, in situations where a station’s sensors cover a range of geographic locales. Data sets also have other uses, however, especially for keeping the results of different processing regimes distinct. For instance, pressure sensor data can undergo three types of energy processing: standard wave processing, surge processing, and basin processing. In some locations, data from a single sensor is run in more than one of these regimes. In these instances, the results from surge processing will be kept in a SURGE data set, the results from basin processing will be kept in a BASIN data set, etc. Each data set is assigned a name which indicates either the geographical locale to which it corresponds, or the processing regime which was applied.
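The way records are routed into data sets can be pictured with a small sketch. This is only an illustration, not the actual layout of the processing archive: the `Rule` class, the sensor names, the dates, and the OFFSHORE and NEARSHORE names are all hypothetical, while SURGE and BASIN are the data set names mentioned above. Each rule says which data set receives a given sensor's records over a given date range, and a record can match more than one rule when the same sensor is run through several processing regimes.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical archive entry: sensor + date range -> data set."""
    sensor: str
    start: date              # first day covered (inclusive)
    end: Optional[date]      # first day NOT covered; None means still open
    data_set: str

    def covers(self, sensor: str, when: date) -> bool:
        if sensor != self.sensor or when < self.start:
            return False
        return self.end is None or when < self.end


# Illustrative rules only; real sensor names and dates will differ.
RULES = [
    # A buoy that was redeployed in a different wave environment.
    Rule("buoy_1", date(1995, 1, 1), date(2001, 6, 1), "OFFSHORE"),
    Rule("buoy_1", date(2001, 6, 1), None, "NEARSHORE"),
    # A pressure sensor whose data is run through two processing regimes.
    Rule("pressure_1", date(1990, 1, 1), None, "SURGE"),
    Rule("pressure_1", date(1990, 1, 1), None, "BASIN"),
]


def data_sets_for(sensor: str, when: date, rules=RULES) -> List[str]:
    """Return every data set that a record from this sensor and date belongs to."""
    return [rule.data_set for rule in rules if rule.covers(sensor, when)]
```

Calling `data_sets_for("buoy_1", date(1999, 3, 1))` returns a single location-based data set, while the same call for `"pressure_1"` returns both processing-based data sets. Returning a list rather than a single name reflects the point that one sensor's data may be kept in several data sets at once.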

image3

The 12 data sets from Barbers Point are distinguished by location and/or processing type

Setting up data sets based on processing differences is straightforward - either surge processing was applied, or it was not; there is no middle ground. Data sets based on differences in location are more problematic, however. This is especially true in the case of buoys, with their frequent changes in position. How far can a sensor move before it requires a distinct data set? What, precisely, defines a wave environment, and what distinguishes one wave environment from another?

Grays Harbor, Station 036

Because CDIP has been involved in a wide range of research projects over the years, collecting data for numerous uses, there is no definitive answer to these questions. In some instances, sensors very close together have been assigned to separate data sets, so that researchers can investigate the subtle differences between them; in other instances, locations miles apart have been grouped together in a single data set to provide researchers with a single, long-term climatological record. Of course, the manner in which CDIP has set up its stations and data sets will not suit all users. But this is never an insurmountable problem, since all the data and products supplied by CDIP can be traced back to their precise origin in space and time, as will be explained below.

image4

Above: Data from dozens of buoy deployments - spanning nearly 25 nautical miles - are all grouped into one data set, providing a long-term (20+ year) climatological record.

Data Storage and Files

As of 2017, CDIP stores all of its data in netCDF-formatted data files, which are primarily accessed through the CDIP THREDDS server. All of CDIP’s buoy data and metadata can be accessed quite easily by browsing the THREDDS web interface. Descriptions of the various files you will encounter can be found in Data Access.
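Besides browsing the web interface, a THREDDS server can also be read programmatically over OPeNDAP. The following is a minimal sketch under several assumptions that should be checked against the THREDDS catalog and Data Access: the URL layout (`.../archive/<station>p1/<station>p1_historic.nc` and `.../realtime/<station>p1_rt.nc`), the variable name `waveHs`, and the helper names `station_url` and `max_wave_height` are not taken from this page. The second function needs the third-party `netCDF4` package and a network connection.

```python
# Assumed base path of the CDIP THREDDS OPeNDAP service; verify in the catalog.
THREDDS_BASE = "https://thredds.cdip.ucsd.edu/thredds/dodsC/cdip"


def station_url(station: str, realtime: bool = False) -> str:
    """Build an OPeNDAP URL for a buoy station number such as '100'.

    The path layout is an assumption, not something this page specifies.
    """
    stn = station.zfill(3)  # station numbers are three digits, e.g. '036'
    if realtime:
        return f"{THREDDS_BASE}/realtime/{stn}p1_rt.nc"
    return f"{THREDDS_BASE}/archive/{stn}p1/{stn}p1_historic.nc"


def max_wave_height(station: str) -> float:
    """Return the largest significant wave height in a station's archive file.

    Assumes the file holds a variable named 'waveHs'.
    """
    import netCDF4  # third-party; imported here so station_url works without it

    with netCDF4.Dataset(station_url(station)) as nc:
        hs = nc.variables["waveHs"][:]  # masked array of wave heights
        return float(hs.max())
```

Because the data are read remotely, only the variables actually requested are transferred, so there is no need to download an entire station file to inspect a single parameter.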

Although the non-buoy data is also stored in netCDF-formatted files, it is currently held in a non-public repository until we can complete the archiving process and provide those data with satisfactory metadata. Wave parameters and spectra for the pressure sensors, as well as statistical summaries of other non-buoy data types, can be served using ndar.cdip as described in Data Access. However, accessing the detailed second-by-second time series data from those sensors requires delving into our old system. Please direct requests for those data to the CDIP programmers.