Using Google Cloud Tools to download the DCLDE dataset
Google has a general quickstart guide on the methods for accessing public data. Because of the size of the DCLDE dataset (~8 TB for the uncompressed version and ~4 TB for the FLAC-compressed version; note that the uncompressed version will be deleted on 15 Nov 2021), downloading it through a web browser will take an extremely long time. A faster method is to use gsutil, a command-line tool included in the Google Cloud SDK. Install instructions.
Notes on install
– You do not need to activate cloud tools.
– You do NOT need to add payment information to access the public dataset.
– You do not need to log in to download a public dataset. When you first open the Google Cloud SDK Shell it will say: “You must log in to continue. Would you like to log in (Y/n)?” You can select n, and downloading the public dataset will still work.
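Before starting a multi-terabyte transfer, it can help to confirm that gsutil is actually installed. The following is a minimal sketch for a POSIX shell (Linux, macOS, or Git Bash). `GSUTIL_STATUS` is a hypothetical variable name used only for illustration. On the Windows Google Cloud SDK Shell, typing `gsutil version` directly serves the same purpose.

```shell
# Check whether gsutil is on the PATH.
# GSUTIL_STATUS is an illustrative name, not something gsutil defines.
if command -v gsutil >/dev/null 2>&1; then
  GSUTIL_STATUS="found"
  gsutil version            # prints the installed gsutil version
else
  GSUTIL_STATUS="missing"
fi
echo "gsutil: $GSUTIL_STATUS"

# Optional check that anonymous access works (needs a network connection,
# and assumes the bucket allows public listing):
# gsutil ls gs://noaa-pifsc-bioacoustic
```

If the status is "missing", re-run the installer or open a new shell so the updated PATH is picked up.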
Basic gsutil command for downloading the data:
gsutil -m cp -r gs://noaa-pifsc-bioacoustic LocalFolder
-m: runs the copy in parallel (multi-threaded/multi-processing), which speeds up transfers of many files
cp: the copy command
-r: copies the entire directory tree
gs://noaa-pifsc-bioacoustic: the Google Cloud bucket name. If you use the bucket root, gsutil will download every file in the bucket and create the matching folders in your destination path. To download only a subset of the files, specify a longer path (Example: gs://noaa-pifsc-bioacoustic/metadata)
LocalFolder: the local folder the files will be copied to (Example: C:\Users\Ann.Allen\Documents\test)
Avoid spaces in your folder names: an unquoted path that contains spaces will prevent the command from working.
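The subset download described above can be sketched as a small POSIX shell script. This is an illustration under stated assumptions, not an official recipe: `SRC`, `DEST`, `DRY_RUN`, and `CMD_PREVIEW` are hypothetical names, and the destination folder `./dclde_metadata` is made up for the example. By default the script only prints the command it would run, so nothing is downloaded until `DRY_RUN` is set to 0.

```shell
# Download only the metadata folder rather than the whole bucket.
# SRC, DEST, DRY_RUN and CMD_PREVIEW are illustrative names.
SRC="gs://noaa-pifsc-bioacoustic/metadata"
DEST="./dclde_metadata"          # hypothetical local folder, no spaces
DRY_RUN="${DRY_RUN:-1}"          # 1 = print the command only, 0 = run it

# Create the destination folder if it does not exist yet.
mkdir -p "$DEST"

# Build a preview of the command; the paths are quoted as standard shell practice.
CMD_PREVIEW="gsutil -m cp -r \"$SRC\" \"$DEST\""
echo "$CMD_PREVIEW"

if [ "$DRY_RUN" != "1" ]; then
  # -m: parallel transfer, cp: copy, -r: recurse into the directory tree
  gsutil -m cp -r "$SRC" "$DEST"
fi
```

Running it as-is prints the command for review. Setting `DRY_RUN=0` before running performs the actual copy. Starting with a small subfolder like this is a cheap way to confirm the setup before committing to the full dataset.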
How to throttle the download bandwidth
Top level command line options
cp command options
Accessing public data without credentials