Data Acquisition

From its inception, CDIP has considered automated data acquisition to be a core responsibility. Data is never merely recorded locally, at the sensor; it is always transferred back to a central archive. This is done at regular intervals, without any operator intervention. In years past, monitoring such a vast spread of locations - in 2017, for instance, CDIP stations ranged from Guam to Georgia - was a tremendous challenge. Thankfully, with modern communication methods, the task is quite manageable and allows us to efficiently monitor many stations all over the globe.

System Organization

The figure below shows how the CDIP data acquisition process works. The buoy has two possible communication channels, which can operate simultaneously. The primary channel uses the Iridium satellite system, which was designed to provide complete coverage of the globe at all times. (In practice, coverage has fallen short of complete, but the next generation of satellites, coming online in 2017, is expected to close the gaps.) Every 30 minutes the buoy’s Iridium transceiver turns on and connects, via Iridium satellites, to the Department of Defense Iridium gateway in Honolulu, Hawaii, as pictured below. The gateway forwards the data to a virtual computer in Amazon’s cloud infrastructure, where it is written to disk. From there, the data is forwarded to the CDIP data center at Scripps Institution of Oceanography (SIO) in La Jolla, California, where it is processed and archived.

If for some reason the Iridium channel is not working (e.g. kelp covering the Iridium antenna on the buoy), the backup HF radio transmission channel is used. Not every station has this backup channel, because it requires a land-based ‘shore station’ located within 10 miles of the buoy. If there is no communication channel at all, all is not lost: the data are continuously written to onboard compact flash, and when the buoy is recovered, the flash data can be used to fill in any data gaps.

There is also a failsafe communication channel if traffic cannot get to the Amazon server. In that case the data is sent directly to the servers at SIO.
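The gap-filling step described above can be illustrated with a short Python sketch. It assumes records keyed by their 30-minute timestamps; the function names and the dictionary layout are hypothetical and are not taken from CDIP’s software.

```python
from datetime import datetime, timedelta


def find_gaps(received, start, end, step=timedelta(minutes=30)):
    """Return the expected record times in [start, end] that are not in `received`."""
    have = set(received)
    missing = []
    t = start
    while t <= end:
        if t not in have:
            missing.append(t)
        t += step
    return missing


def fill_from_flash(archive, flash):
    """Copy records found on the recovered flash card but absent from the archive.

    Both arguments are dicts mapping a timestamp to a record (assumed layout).
    Returns the list of timestamps that were filled in.
    """
    filled = []
    for t in sorted(flash):
        if t not in archive:
            archive[t] = flash[t]
            filled.append(t)
    return filled
```

Records that already reached the archive over Iridium or HF radio are left untouched; only the missing timestamps are taken from the flash copy.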

image0

One key aim of this system is to provide completely reliable access to our buoy GPS location data and wave parameters in case of a severe disaster such as an earthquake. The new system makes use of virtual machines on Amazon’s cloud infrastructure as the primary landing platform because of the built-in reliability of such data centers. In addition, the virtual machine can be relocated to another data center almost instantly if ever the need arises.

Another key aim is reliability. CDIP ensures that measured data ultimately get processed and disseminated by creating multiple redundant copies of those data at each step of the way. The buoy, the shore station, the virtual computer and the CDIP servers all store copies of the data which in most cases can be accessed at any time.

When data first land on the Amazon virtual computer, whether from an Iridium buoy or a shore station, the key GPS data and wave parameters are extracted from each file and made available via a web server on that instance. Each file is then immediately transferred to the CDIP servers, where it is processed and archived on solid-state disks in ZFS RAID datasets. All data is synchronized with an offsite CDIP server daily and uploaded to Amazon’s Glacier data store quarterly. A backup data acquisition system is also maintained on CDIP servers at SIO in case data cannot reach the Amazon virtual machine.

Hardware

Although the overall setup and operation of CDIP’s data acquisition system has remained remarkably consistent since the mid-1970s, technological advances have resulted in numerous changes to the hardware used in the system, both at the shore stations and in the Lab.

Shore stations

In the 1970s and 1980s, shore stations were composed of custom-designed circuit boards that operated exclusively under hardware control. Holding a very limited amount of data, these boards would upload their contents to the Lab when contacted via modem. In the early 1990s, more control and flexibility were achieved with the introduction of the PC-based ‘smartstation’. Unlike its hardwired predecessors, the smartstation was based on a generalized architecture, allowing it to bring in data from any type of sensor. Equally important, the smartstations opened up two-way communications between the Lab and the shore stations. Instead of simply dumping data back to the Lab, the smartstations sent back diagnostic information and could receive and interpret a range of instructions. This allowed numerous sensor parameters - sample rate, record length, etc. - to be controlled remotely from the Lab. Another major advance introduced by the smartstation was the real-time display: at shore stations where interested users wanted to view the data, video output displayed sensor readings and derived parameters in real time.

image4

An Ultra 5 shore station.

At the end of the 1990s a gradual shift was begun to a new generation of shore stations. Based on Sun Microsystems’ Ultra 5 and Blade 100 desktop computers and the Tadpole laptop computer running the Solaris operating system, these new stations greatly expanded upon the flexibility and control of smartstations. These shore stations’ additional features included:

  • full remote shore station access via modem or network links;

  • a multi-screen, graphical display of all measured parameters;

  • continuous data archiving with a multi-year capacity;

  • a secondary, independent zip-based backup data archive; and

  • a smart UPS that ensures graceful shutdown and startup.

image7

Sierra Wireless Raven XT

Around 2005, CDIP started to use cellular communications technology to transfer data via the internet, using wireless modems such as Sierra Wireless’ Raven XT. These have been extremely useful at locations where a direct connection to the LAN is not possible due to security considerations and where no phone land-lines are available.

image6

image10

RXC-4 receiver

Typical antenna mount for the RXC-4

In early 2010, CDIP started to use Datawell’s RXC-4 receiver with Ethernet capability. A Python script was developed to contact the receiver and ingest streaming data into our processing system. This is a simple, low-maintenance system that is well suited to remote locations with internet access.

image5

image8

A Beaglebone in action

The entire system fits in a small box

By 2013, a new generation of shore station had been developed, based on the Beaglebone computer board and the smaller RXC receiver. This new system has all of the functionality of the previous generation of shore station but with a much smaller footprint, lower energy consumption and much lower cost. In addition, each Beaglebone can acquire data simultaneously for up to 10 stations, storing all of their data perpetually on a 30 GB microSD card.

Although shore stations are becoming rarer as our fleet of Iridium-capable buoys expands, they are still needed in a number of locations and provide a valuable backup to the Iridium system during outages.

The Lab

In the past, the hardware in the Lab was responsible for contacting all of the shore stations, collecting available data, and archiving it permanently. When CDIP first started operations, this was all handled by a Nova minicomputer. The Nova contacted shore stations with a specially designed modem and communications boards, and then archived the data on nine-track reel-to-reel tapes. In the 1980s the switch was made to PCs. While the modem and communications remained largely the same, the x86 PCs in use archived data to floppy disks in addition to tapes, making the data much easier to handle. The use of a PC as the Lab’s main data acquisition machine continued until the end of the 1990s, when the switch was made to a Sun Ultra 10 workstation. As of 2017, this machine is still being used to acquire data over phone land-lines. Our current suite of servers here at SIO handles data acquisition as well as many other tasks such as wave modeling. They are listed below for interest.

Evolution of Lab hardware

image1

Nova 1200: 1975 - 1980

image2

x86 PC: 1980 - 1999

image3

Sun Ultra: 1999 - 2015

image9

Oracle X4-2Ls: 2015 - present

As mentioned above, in early 2013 CDIP changed the basic process of data acquisition in order to provide completely reliable access to our buoy GPS location data and wave parameters in case of a severe disaster. Even though cloud ‘hardware’ is constantly changing, CDIP no longer needs to be concerned about those details. We do care about the OS on the cloud server, but as cloud technology progresses, even that will be abstracted into services, making updates and version control a thing of the past and enabling CDIP to focus on its core expertise.

Acquisition Software

Four main data acquisition scripts, written in Python, acquire data from the following sources: Iridium TCP connections, the Datawell RXC-4 Ethernet receiver TCP connections, the Datawell RXC receiver RS232 output, and RS232 serial output from other sensors such as pressure sensors and anemometers (currently only at Scripps Pier).

Iridium TCP connections

For Iridium communications, the server’s inet service listens on port 9210 for Waverider Mark III buoys and port 9211 for Waverider 4 buoys. When a Mark III buoy connects, an in-house FORTRAN program is run that implements Datawell’s iBuoy communication protocol. When a Waverider 4 buoy connects, an in-house Python script is run that implements the Datawell Transmission Protocol (DWTP).
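As a rough illustration of the port-based dispatch described above, the sketch below listens on several ports and reports which protocol applies to an incoming connection. It is a single-process stand-in built on Python’s standard selectors module, not the inet service that CDIP actually uses, and the function names are hypothetical.

```python
import selectors
import socket

# Port-to-protocol pairs taken from the text above.
CDIP_PORTS = [(9210, "iBuoy"), (9211, "DWTP")]


def open_listeners(port_protocols, host="127.0.0.1"):
    """Bind one listening socket per (port, protocol) pair."""
    sel = selectors.DefaultSelector()
    for port, protocol in port_protocols:
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen()
        # Remember which protocol belongs to this listening socket.
        sel.register(srv, selectors.EVENT_READ, data=protocol)
    return sel


def accept_one(sel, timeout=5.0):
    """Wait for one connection; return (connection, protocol) or (None, None)."""
    for key, _events in sel.select(timeout):
        conn, _addr = key.fileobj.accept()
        return conn, key.data
    return None, None
```

Calling open_listeners(CDIP_PORTS) and then accept_one in a loop would hand each connection to the handler for the matching protocol. Passing port 0 instead lets the operating system pick free ports, which is convenient for local experiments.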

Along with the basic functionality of querying the buoy for specific data and saving the data to disk, the Python scripts can also be set to automatically download xyz data when the significant wave height is above a threshold, or to fill in missing data of any type.
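The decision logic in the previous paragraph can be sketched in a few lines of Python. The request tuples, the item names and the threshold value used in the example are all hypothetical; the real scripts express these requests through the buoy protocols.

```python
def plan_requests(hs_m, received, expected, hs_threshold_m):
    """Decide what to ask the buoy for during one connection.

    hs_m           -- latest significant wave height in metres
    received       -- set of item names already on disk
    expected       -- ordered list of item names that should exist
    hs_threshold_m -- wave height above which xyz data is downloaded
    """
    requests = []
    # Download the xyz displacement data only for energetic sea states.
    if hs_m > hs_threshold_m:
        requests.append(("xyz", None))
    # Ask again for anything that should have arrived but did not.
    for item in expected:
        if item not in received:
            requests.append(("fill", item))
    return requests
```

For example, with a threshold of 3.0 m, a wave height of 4.2 m and one missing item, the function returns one xyz request followed by one fill request.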

Datawell RXC-4 ethernet receiver TCP connections

For shore stations, a second Python script was developed to acquire data from the Datawell RXC-4 receiver. Instead of listening on a port, this script initiates a TCP socket connection with the receiver and receives a stream of data. Those data are compiled into 3 minute 45 second ‘fd’ files (to conform with our FORTRAN processing programs). Every 30 minutes (after the latest data has been received) the data are gzipped and sent to the cloud node using rsync via ssh.
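The segmentation and packaging described above might look roughly like the following sketch. It assumes samples arrive as (time in seconds, bytes) pairs; the file naming, the helper names and the rsync destination are hypothetical, and the real ‘fd’ file format is not reproduced here. Note that 30 minutes holds exactly eight 3 minute 45 second segments (1800 / 225 = 8).

```python
import gzip
from pathlib import Path

SEGMENT_SECONDS = 3 * 60 + 45  # 225 s per 'fd' file
BATCH_SECONDS = 30 * 60        # 1800 s between transfers


def split_into_segments(samples, segment_seconds=SEGMENT_SECONDS):
    """Group (time_in_seconds, payload_bytes) pairs by segment start time."""
    segments = {}
    for t, payload in samples:
        start = int(t // segment_seconds) * segment_seconds
        segments.setdefault(start, []).append(payload)
    return segments


def write_gzipped(segments, out_dir):
    """Write each segment to its own gzip file; return the paths in time order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in sorted(segments):
        path = out_dir / f"fd_{start}.gz"  # hypothetical naming scheme
        with gzip.open(path, "wb") as fh:
            fh.write(b"".join(segments[start]))
        paths.append(path)
    return paths


def rsync_command(paths, destination):
    """Build (but do not run) an rsync-over-ssh command for the given files."""
    return ["rsync", "-a", "-e", "ssh", *[str(p) for p in paths], destination]
```

The command list returned by rsync_command could be passed to subprocess.run once a real destination, such as a user@host:/path string for the cloud node, is known.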

Datawell RXC receiver RS232 output

The data received from the RS232 port of the receiver is actually the same as that received via the internet. This script works similarly except that it opens a serial port rather than a TCP port.
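Because the byte stream is the same on both transports, the reading loop can be written once against any object that offers a read method. The sketch below is an assumption about how such sharing could be arranged, not a description of the CDIP scripts; the function name is hypothetical, and an in-memory stream stands in for the receiver.

```python
import io


def read_chunks(stream, chunk_size=64):
    """Yield chunks from any object with read(n) until it returns no data.

    The stream could be a socket wrapped with sock.makefile("rb"), or a
    serial port object from the third-party pySerial package (not imported
    here). With a serial read timeout, an empty read also ends the loop.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for the receiver: an in-memory stream of 150 bytes.
demo_stream = io.BytesIO(b"\x00" * 150)
demo_sizes = [len(chunk) for chunk in read_chunks(demo_stream)]
```

Only the code that opens the stream needs to know whether it is talking to a TCP socket or a serial port.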

Other RS232 sensors

CDIP has developed a Python script for acquiring data from an arbitrary number of sensors and combining those data into a single output file - as long as those sensors have the same sample rate and can be polled for their data. Each sensor is assigned a port either from within the script or on the command line. The sensors are first polled, a process that takes microseconds, then the data is read from the sensor’s buffer and processed. Finally the processed data is written to a file. Currently data is being received from 2 pressure sensors and an anemometer on Scripps Pier using this script.
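The poll, read and write sequence described above can be sketched as follows. The FakeSensor class is a hypothetical stand-in for a polled serial sensor, and the CSV output is an assumption made for illustration; the actual output format is not specified here.

```python
import csv
import io


class FakeSensor:
    """Stand-in for a polled sensor (hypothetical interface)."""

    def __init__(self, name, values):
        self.name = name
        self._values = iter(values)
        self._buffer = None

    def poll(self):
        # Ask the sensor to take a measurement and hold it in its buffer.
        self._buffer = next(self._values)

    def read(self):
        # Hand over the buffered measurement and clear the buffer.
        value, self._buffer = self._buffer, None
        return value


def sample_once(sensors):
    """Poll every sensor first, then read each buffer, so readings line up in time."""
    for sensor in sensors:
        sensor.poll()
    return [sensor.read() for sensor in sensors]


def acquire(sensors, n_samples, out):
    """Write n_samples combined rows, one column per sensor, to a text stream."""
    writer = csv.writer(out)
    writer.writerow([sensor.name for sensor in sensors])
    for _ in range(n_samples):
        writer.writerow(sample_once(sensors))
```

Polling all sensors before reading any of them keeps the readings in each row as close together in time as possible, which is why the sensors need a common sample rate.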

For more details on the data acquisition software and shore station configuration, please direct questions to the CDIP programmers.