
Dataset

Reference as:
NOAA Pacific Islands Fisheries Science Center. 2022. Hawaiian Islands Cetacean and Ecosystem Assessment Survey (HICEAS) towed array data. Edited and annotated for the 9th International Workshop on Detection, Classification, Localization, and Density Estimation of Marine Mammals Using Passive Acoustics (DCLDE 2022). NOAA National Centers for Environmental Information. https://doi.org/10.25921/e12p-gj65 [access date]

The DCLDE Oahu dataset consists of a subset of passive acoustic data collected using a multi-channel towed hydrophone array during the Hawaiian Islands Cetacean and Ecosystem Assessment Survey (HICEAS) in 2017. HICEAS was a visual and passive acoustic survey using line-transect methods. The survey took place from July through November of 2017 using two research vessels that systematically surveyed the entire Hawaiian Exclusive Economic Zone (EEZ). The full details of the HICEAS survey, including the data collection methods and a summary of the species encountered acoustically and visually, are available in a NMFS Technical Memo (Yano et al. 2018). Passive acoustic data collection methods are summarized below. The raw passive acoustic data files are available to download from Google Cloud Public Datasets. Associated species detection data, as well as corresponding visual sighting data, are available in the Google Cloud bucket and are linked for download below.

Forty-seven days of towed array data are provided for DCLDE Oahu. The data subset was chosen to provide reasonable representation of data collected on the two research vessels and across all months of survey effort, and to maximize the number of detections for the greatest number of species. Days with array failures, or with few or no identified acoustic detections, were generally excluded. The DCLDE dataset was curated to provide 10 or more annotated encounters with as many species as possible, including large whales, delphinids, blackfish, and beaked whales. Beyond the species for which detections were optimized, several others are known to occur within the provided data, including several delphinids, Longman’s beaked whales, and the unidentified beaked whale ‘BWC’ (McDonald et al. 2009, Baumann-Pickering et al. 2013). Some detection events occur as mixed species groups.

Data Collection Details

A towed hydrophone array was deployed behind each ship for the acoustic line-transect survey. The towed hydrophone array components and data acquisition system on each ship were designed to be as similar as possible to ensure the acoustic recordings would be comparable between the two ships. This system comprised a modular towed array (Rankin et al. 2013), SA Instrumentation DAQ soundcard, laptop computers, and PAMGuard software v. 2.00.10fa (Gillespie et al. 2009). Hydrophones were arranged in two array clusters of 3 hydrophones each, identified as the inline and end arrays, with the array segments separated by either 20 or 30 m. Hydrophones were spaced approximately 1 m apart within each array section. The distance from the ship’s GPS antenna, including the length of the tow cable, to each hydrophone in the array is detailed in the metadata file provided. The majority of the provided dataset includes data from 6 channels, though there is a small subset with data from only 3 channels. The subset with fewer array channels can be identified by an ‘NA’ in the hydrophone 4-6 details within the metadata.


Arrays used HTI-96-min hydrophones and custom-built pre-amplifiers with a combined average measured sensitivity of -144 dB +/- 5 dB re 1 V/µPa from 2-100 kHz and an approximately linear roll-off to -156 dB +/- 2 dB re 1 V/µPa at 150 kHz. The hydrophones have strong high-pass filters at 1600 Hz to reduce low-frequency flow noise and ship noise, reducing sensitivity by 10 dB at 1000 Hz. The inline and end arrays also contained a Keller (PA7FLE) or Honeywell (PX2EN1XX200PSCHX) depth sensor, with depth recorded every second with a voltage MicroDAQ (max voltage +/- 2 V). Depth data collected aboard the R/V Sette were recorded at 12-bit resolution (model USB-1208LS), and data from the R/V Lasker at 16-bit resolution (model USB-1608G). The acoustic DAQ sampled all six channels simultaneously at a 500 kHz sample rate and applied 0-12 dB of gain to the incoming signal from each hydrophone. The preamplifier gain specific to each hydrophone and any additional gain applied to each channel through the DAQ during real-time monitoring are detailed in the metadata file provided.
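For readers who want to turn the nominal figures above into received levels, the following Python lines are a minimal sketch rather than a calibration procedure. It assumes the roll-off between 100 and 150 kHz is linear in frequency (the text only says "approximately linear"), ignores the +/- tolerances and the high-pass filter below 2 kHz, and takes the input as an RMS voltage, since the ADC full-scale voltage needed to convert wav samples to volts is not stated here. The function names are illustrative; per-hydrophone values from the metadata file should be preferred where available.

```python
import math

FLAT_SENS_DB = -144.0      # nominal sensitivity, dB re 1 V/uPa, 2-100 kHz
ROLLOFF_SENS_DB = -156.0   # nominal sensitivity at 150 kHz
FLAT_END_HZ = 100_000.0
ROLLOFF_END_HZ = 150_000.0


def nominal_sensitivity_db(freq_hz):
    """Nominal combined hydrophone + preamp sensitivity (dB re 1 V/uPa).

    Assumption: roll-off is linear in Hz between 100 and 150 kHz.
    """
    if not 2_000.0 <= freq_hz <= ROLLOFF_END_HZ:
        raise ValueError("nominal figures only cover 2-150 kHz")
    if freq_hz <= FLAT_END_HZ:
        return FLAT_SENS_DB
    frac = (freq_hz - FLAT_END_HZ) / (ROLLOFF_END_HZ - FLAT_END_HZ)
    return FLAT_SENS_DB + frac * (ROLLOFF_SENS_DB - FLAT_SENS_DB)


def received_level_db(v_rms, freq_hz, daq_gain_db=0.0):
    """Received level (dB re 1 uPa) from an RMS voltage at the DAQ output.

    daq_gain_db is the additional 0-12 dB DAQ gain listed in the metadata.
    """
    return 20.0 * math.log10(v_rms) - nominal_sensitivity_db(freq_hz) - daq_gain_db
```

With these assumptions, a 1 V RMS signal at 50 kHz with no added DAQ gain corresponds to 144 dB re 1 µPa, and the same signal recorded with 12 dB of DAQ gain corresponds to 132 dB re 1 µPa.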

The visual and passive acoustic teams worked independently in the field. The visual team was considered “on-effort” when the ship was moving at 10 kts (+/- 1 kt) along the pre-determined trackline in Beaufort sea states of 6 or less. The visual team consisted of 3 observers, one port side observer watching through “big eye” 25x deck-mounted binoculars, one center observer and data recorder searching naked eye, and one starboard side observer watching through big-eyes. During “on-effort” search, the visual survey team recorded all cetacean sightings ahead of the beam of the ship (90° to port or starboard). Cetacean groups observed behind the beam or during “off-effort” chase or other periods are reported as “off-effort” sightings.  Although the passive acoustic team would receive information about visual sightings in near real-time, passive acoustic detection information was not relayed to the visual team until the group was sighted or had passed beyond the beam of the array.  Upon visually sighting a cetacean group, the observer team generally instructed the ship to turn toward the group to facilitate species ID and group size estimation, independent of whether the acoustic team had detected or localized the group (though see below for false killer whales and sperm whales). The details of the visual survey are provided in the Tech Memo (Yano et al. 2018).

Special data collection protocols were implemented for visual and passive acoustic encounters with false killer whales and sperm whales, and the detailed protocols are provided in the Tech Memo (Yano et al. 2018). In short, during false killer whale encounters both teams remained on-effort in passing mode (not deviating from the trackline to approach the group) until all detected sub-groups had passed the beam of the ship.  This passing mode period could extend for over an hour and both teams remained on-effort, recording all sightings, during that period. Once all detected groups had passed the beam, the ship was generally directed to turn back toward the group to allow for other data collection. There are passive acoustic encounters with false killer whales that went undetected by the visual team until the passing mode phase was complete, such that those encounters are noted as off-effort in the visual sightings data. For sperm whales, the visual team was instructed to remain on-effort along the trackline until the sperm whale group had passed the beam or until the acoustic team could verify that they had localized the group.   

A team of acousticians monitored incoming data in real-time, assisted by echolocation click and whistle and moan detectors implemented in PAMGuard. Echolocation click detectors were based on custom specifications (Keating and Barlow 2013) that coded echolocation clicks by frequency content to correspond with clicks produced by sperm whales, beaked whales, and delphinids. Acoustic species identification was assigned in the field for a limited number of species (beaked whales, sperm whales, and minke whale boings), and beaked whale species ID was verified during post-processing using the known spectral and temporal characteristics of the detected clicks. When acoustically-detected groups were also seen by the visual observation team, the visual-based species ID is provided for that encounter. The start and end times of acoustic encounters were verified during post-processing. The start time of an encounter is the time of the first call detected by the PAMGuard detector or through aural monitoring, and the end time is the time of the last call detected through the same methods. Although data collection and detection details were archived in PAMGuard during real-time monitoring, the provided metadata file is reduced from that original database and may not be directly uploaded into current PAMGuard versions. Each acoustic event represents an encounter (inclusive of clicks and whistles) for all species other than minke whales. Minke whale boings are noted by the start and end of individual calls, without attempt to associate calls with individual whales. Bearing and location data provided in the metadata file were based on target motion analysis using 2 hydrophones in the end array, implemented in PAMGuard in real-time. Other than general agreement with visual sighting information, the localization data provided have not been post-processed or verified. There are no location data provided for minke whale boings.
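Target motion analysis builds a location from a sequence of bearings, and each bearing from a hydrophone pair comes from the time difference of arrival between the two sensors. The Python sketch below shows only that single-bearing step under a far-field (plane wave) assumption. It is not PAMGuard's implementation. The 1500 m/s sound speed and the function names are assumptions, and the default 1 m spacing is the approximate value quoted above; the actual spacing for the pair in use should be taken from the metadata.

```python
import math


def bearing_from_delay(delay_s, spacing_m=1.0, sound_speed_m_s=1500.0):
    """Angle (degrees) between the array axis and the source direction.

    delay_s is the arrival-time difference between the two hydrophones.
    A two-element pair cannot tell left from right, so the result is a
    cone angle in the range 0-180 degrees, not a compass bearing.
    """
    cos_theta = sound_speed_m_s * delay_s / spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def max_delay_samples(spacing_m=1.0, sound_speed_m_s=1500.0,
                      sample_rate_hz=500_000):
    """Largest physically possible delay between the pair, in samples."""
    return spacing_m / sound_speed_m_s * sample_rate_hz
```

Under these assumptions a zero delay corresponds to a source broadside to the pair (90 degrees), and at the 500 kHz sample rate the delay between hydrophones 1 m apart can never exceed roughly 333 samples, which is a useful sanity bound when cross-correlating channels.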

The Data

UPDATE: A FLAC-compressed version of the dataset is now available. The FLAC version lives in the same bucket as the uncompressed version. Corresponding folders include the label “FLAC”. The uncompressed version will be removed on 15 Nov 2021.

The full DCLDE dataset is approximately 8 TB uncompressed and 4 TB FLAC-compressed. This large dataset is hosted for download by Google Cloud Public Datasets and can be found here. Individual files can be downloaded using your browser, or the entire dataset can be downloaded using the Google Cloud SDK Shell. Additional instructions for accessing the data are provided here. Each data file contains the full-bandwidth data for all sampled channels. The data files are provided in 16-bit wav format with the following naming convention:

File example: 1705_20170713_002947_605.wav

  • The first four digits represent the numeric code assigned to the effort carried out by the individual research vessel. 1705 represents files collected aboard the NOAA R/V Reuben Lasker. 1706 represents data collected aboard the NOAA R/V Oscar Elton Sette.
  • The remaining values are the start time of the file with format ‘yyyymmdd_hhmmss_fff’. For example the above file started recording at 2017 July 13 at 00:29:47.605. All times are in UTC.

Files are one minute duration unless recording was interrupted, resulting in a shorter file.
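The naming convention above can be decoded programmatically. The following Python lines are a minimal sketch that splits a file name into the vessel and the UTC start time; the function name is illustrative, and it assumes every file name follows the four-field pattern shown in the example, whatever the file extension.

```python
from datetime import datetime, timezone

# Cruise codes as given in the naming convention.
VESSELS = {
    "1705": "R/V Reuben Lasker",
    "1706": "R/V Oscar Elton Sette",
}


def parse_recording_name(filename):
    """Return (vessel, UTC start time) for a name like
    '1705_20170713_002947_605.wav'."""
    stem = filename.rsplit(".", 1)[0]          # drop the extension
    cruise, timestamp = stem.split("_", 1)     # '1705', '20170713_002947_605'
    if cruise not in VESSELS:
        raise ValueError(f"unknown cruise code: {cruise}")
    # %f pads the three millisecond digits on the right to microseconds.
    start = datetime.strptime(timestamp, "%Y%m%d_%H%M%S_%f")
    return VESSELS[cruise], start.replace(tzinfo=timezone.utc)


vessel, start = parse_recording_name("1705_20170713_002947_605.wav")
print(vessel, start.isoformat())
```

Keeping the start time timezone-aware makes it harder to mix it up with the local times that appear in the visual sighting data.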

We are exploring possible cloud computing opportunities with Google.  If you are interested in running your algorithms on the cloud please contact the workshop organizers to discuss your requirements.

The Metadata

Real-time data collection details, detection annotations, visual sighting data, and reference tables are included in the provided Excel file. The file includes each of the following sheets, with one sheet of each description for R/V Lasker (1705) and one for R/V Sette (1706):

  • Readme: a description of each column heading for each sheet within the file.
  • x_Array: Details of the arrays used throughout the survey.  Occasionally end or inline arrays were swapped out due to damage or data issues.  Whenever an array is changed or the channel gains are adjusted, a new entry is provided with the start and end date and time for those array settings. The array names, individual hydrophone spacing, and any additional gain applied through the SailDAQ to that channel, are provided here. The name of the arrays in use links to the provided PAMGuard array files (for those using PAMGuard to carry out any data processing or analysis steps).  Sequential entries of the same array information indicate that a different array combination was used during periods of data not provided for DCLDE.  We retain this information for consistency in the event that additional data are requested or future efforts use other periods of data.
  • x_OdontoceteDetections: The time, location, and unique visual and acoustic ID number for each annotated encounter are provided, as well as real-time localization info, and the species code corresponding to the visual observer species ID, or the acoustically identified species ID (for sperm whale and beaked whales only). Additional columns indicate the presence of other species within the same group.  Some encounters may overlap in time, but were not considered to be spatially associated with each other in the field, such that those would be listed as separate detections. In a few cases separate visual sightings were not distinguishable by the acoustic team, such that more than one sighting ID is noted for an individual detection.
  • x_MinkeDetections:  The start and end time for each minke whale boing are annotated as well as the latitude and longitude of the ship at the detection start. Minke whale “boing” vocalizations were annotated using the Whistle & Moan Detection Module within PAMGuard, using a 7dB signal-to-noise threshold for signal between 1100 and 1800 Hz.  False positives were removed from the annotation dataset. End-time for each boing is derived from the longest duration frequency component above the 7dB SNR threshold. Localization information is not provided for minke whale boings.  Minke whale boings were only detected during data collection aboard the R/V Lasker.
  • x_VisualSightings: Sighting information for each visually-detected group. Includes local time, location, bearing, radial distance to each sighted group, and observer estimated group size. Mixed species groups are represented by more than one entry for a given sighting number. There are 7 mixed species sightings within the data file, six from R/V Lasker and one from R/V Sette.  Association between visual sightings and acoustic encounter should use the sighting number to link observations, as the start time of the visual and acoustic detections may not align.  Group size estimates for false killer whales represent the best estimate of all sub-groups observed; no high or low estimates are provided. Additional sub-group location data are available for false killer whales and sperm whales. Resighting locations for some sightings of other species are also available. Contact the workshop chairs to inquire about additional location data for sightings of interest.
  • Species_Lookup: Numeric species codes for each cetacean species included in the acoustic detection or visual sightings sheets.
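As noted for the visual sightings sheet, acoustic encounters and sightings should be linked by sighting number rather than by time. The Python sketch below shows one way to do that join once the sheets have been read into lists of dictionaries. The key names (`acoustic_id`, `sighting_ids`, `sighting_id`) are hypothetical stand-ins, not the actual column headings; consult the Readme sheet for the real ones.

```python
from collections import defaultdict


def link_detections_to_sightings(detections, sightings):
    """Pair each acoustic detection with its visual sighting rows.

    detections: dicts with 'acoustic_id' and 'sighting_ids' (a list, since
                one detection can carry more than one sighting ID).
    sightings:  dicts with 'sighting_id'; mixed-species groups appear as
                several rows sharing the same sighting number.
    Returns a list of (acoustic_id, matching sighting rows).
    """
    sightings_by_id = defaultdict(list)
    for row in sightings:
        sightings_by_id[row["sighting_id"]].append(row)

    linked = []
    for det in detections:
        matched = []
        for sighting_id in det["sighting_ids"]:
            # Detections with no visual match simply get an empty list.
            matched.extend(sightings_by_id.get(sighting_id, []))
        linked.append((det["acoustic_id"], matched))
    return linked
```

Returning every matching row, rather than the first, keeps mixed-species sightings intact and leaves acoustic-only encounters visible as empty matches.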

Separate from the metadata spreadsheet, the following additional files are provided:

  • DCLDE_LaskerGPS & DCLDE_SetteGPS: GPS time and location stamps at 10 s intervals providing location and speed information throughout the survey. The first four digits at start of the file name correspond to the ship from which the track was recorded (1705 = R/V Lasker, 1706 = R/V Sette).  There are six fields in the track data (UTCtime, LocalTime, Latitude, Longitude, Speed, Effort). Latitude and longitude data are provided in decimal degree with N and E positive values. Ship speed is in kts.  Effort provides the effort status of the visual survey team, with “on-effort” periods (Effort = 1) corresponding to survey along the pre-determined trackline at 10 kts with the full visual survey team.  “Off-effort” (Effort = 0) refers to periods when the ship is directed to approach an individual group to collect additional information or when weather or other factors prevented the visual team from working. The acoustic team is considered “on-effort” during all periods that data are provided.
  • DCLDE_LaskerArrayDepth & DCLDE_SetteArrayDepth: Inline and end array depth (m) recorded once per second.  The first four digits at the start of the file name correspond to the ship from which the track was recorded (1705 = R/V Lasker, 1706 = R/V Sette). There are four fields in the depth data (UTCtime, LocalTime, InlineDepth, EndDepth). Depth data are converted from raw voltage values based on the calibration values for the depth sensor within the array.  For periods with no inline array, depth values are listed as NA.
  • ArrayFiles (7 files)- Array specification files for each of the named hydrophone arrays.  These files are provided for those intending to conduct exploratory or other analyses in PAMGuard.
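Because the GPS track is sampled every 10 s while recordings and detections carry finer timestamps, it is often useful to estimate the ship position at an arbitrary UTC time. The following Python lines are a minimal sketch using linear interpolation between the two surrounding fixes. It assumes the UTCtime, Latitude, and Longitude fields have already been parsed into `(datetime, float, float)` tuples sorted by time, that the two fixes do not straddle the 180° meridian, and that any gap in the track is short enough for a straight line to be reasonable. The function name is illustrative.

```python
from bisect import bisect_right


def ship_position_at(track, when):
    """Linearly interpolate (lat, lon) at time `when`.

    track: list of (utc_time, latitude, longitude) tuples sorted by time,
           in decimal degrees with N and E positive.
    """
    if len(track) < 2:
        raise ValueError("need at least two GPS fixes")
    times = [row[0] for row in track]
    if when < times[0] or when > times[-1]:
        raise ValueError("time is outside the GPS track")

    # Index of the first fix after `when`, clamped so the last fix works too.
    i = min(bisect_right(times, when), len(track) - 1)
    t0, lat0, lon0 = track[i - 1]
    t1, lat1, lon1 = track[i]
    frac = (when - t0) / (t1 - t0)   # timedelta / timedelta gives a float
    return lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0)
```

The same pattern applies to the once-per-second array depth files if a depth is needed at a sub-second timestamp.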

All provided metadata has been truncated to match the period of raw acoustic data files being provided for DCLDE 2020.

Conductivity-Temperature-Depth (CTD) casts were conducted one hour before sunrise and one hour after sunset during each day of the survey.  These data are available for computing sound speed profiles and examining other oceanographic variables.  This page will be updated with a weblink for download from the National Centers for Environmental Information (NCEI) once available.  In the meantime, please contact the workshop organizers for data files ( dclde2020 [at] gmail [dot] com ).

References

Baumann-Pickering, S., M.A. McDonald, A.E. Simonis, A. Solsona Berga, K.P.B. Merkens, E.M. Oleson, M. Roch, S.M. Wiggins, S. Rankin, T.M. Yack, J.A. Hildebrand. 2013. Species-specific beaked whale echolocation signals. JASA 134(3): 2293-2301. http://doi.org/10.1121/1.4817832

Gillespie, D., D.K. Mellinger, J. Gordon, D. McLaren, P. Redmond, R. McHugh, P. Trinder, X.-Y. Deng, A. Thode. 2009. PAMGUARD: Semiautomated, open source software for real-time acoustic detection and localization of cetaceans. JASA 125: 2547.

Keating, J.L. and J. Barlow. 2013. Summary of PAMGuard beaked whale click detectors and classifiers used during the 2012 Southern California behavioral response study. U.S. Dept. of Commerce, NOAA Technical Memorandum NOAA-TM-NMFS-SWFSC-517, 17 p. 

McDonald, M.A., J.A. Hildebrand, S.M. Wiggins, D.J. Johnston, J.J. Polovina. 2009. An acoustic survey of beaked whales at Cross Seamount near Hawaii. JASA 125(2): 624-627.

Rankin, S., J. Barlow, Y. Barkley, R. Valtierra. 2013. A guide to constructing hydrophone arrays for passive acoustic data collection during NMFS shipboard cetacean surveys. U.S. Dept. of Commerce, NOAA Technical Memorandum NOAA-TM-NMFS-SWFSC-511, 36 p.

Yano, K.M., E.M. Oleson, J.L. Keating, L.T. Ballance, M.C. Hill, A.L. Bradford, A.N. Allen, T.W. Joyce, J.E. Moore, A. Henry. 2018. Cetacean and seabird data collected during the Hawaiian Islands Cetacean and Ecosystem Assessment Survey (HICEAS), July-December 2017. U.S. Dept. of Commerce, NOAA Technical Memorandum NOAA-TM-NMFS-PIFSC-72, 110 p.

Questions?

Email us at dclde2020 [at] gmail [dot] com