File Naming and Data Deposition Example

In the example used above, seawater was collected at field station ST28 from several depths over 14 days during the KT-235 cruise. Seawater samples were processed for untargeted proteomics, targeted and untargeted metabolomics (positive and negative ionization modes), and shotgun metagenomic sequencing. Water for nutrient (organic and inorganic) measurements was also collected, and CTD casts were conducted during each collection. The internal C-CoMP dataset_no is CMP002.

For all data streams except proteomics, the numbers in the file names below are file numbers, not unique Sample IDs. This choice was intentional: it avoids misnamed files, accounts for quality control samples, speeds up file naming, and accommodates replicates or samples that have to be re-analyzed. Sample IDs (or other identifiers) should still be assigned consistently so that samples can be linked across data streams, but they are recorded in the metadata tables rather than in the file names.

Proteomics is the exception: its file names will contain Sample IDs because of existing lab procedures for this data stream.

Proteomics files → ProteomeXchange, linked to BCO-DMO and the Ocean Protein Portal (OPP)

Targeted metabolomics (LC-MS) → MetaboLights, linked to BCO-DMO

Untargeted metabolomics (LC-MS) → MetaboLights, linked to BCO-DMO

Whole metagenomic sequencing → NCBI SRA, linked to BCO-DMO

Metadata (CTD and nutrient concentrations) → BCO-DMO

CMP002_KT235_ST28_CTD_nutrients.txt

Spreadsheet with one row per Sample_ID (a unique combination of depth × day) and columns for time, temperature, salinity, TOC, NO3, NO2+NO3, etc. If possible, include columns that link the individual BioSample accession numbers (shotgun metagenomics) and the data file names back to the Sample_ID and other metadata.
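As a rough illustration of the metadata file described above, the Python sketch below assembles the file name from its components and writes an empty table with one row per depth and day. Several details are assumptions rather than C-CoMP conventions: the name pattern (dataset_no, cruise, station, descriptor, joined by underscores with hyphens dropped) is inferred from the single example file name, the tab delimiter is a guess based on the .txt extension, and the function names, the Sample_ID format, the two link column names, and the depths and days are all hypothetical placeholders. Measurement cells are left blank on purpose.

```python
import csv
import io
from itertools import product

# Measurement columns follow the description above; the last two are
# hypothetical names for the optional link columns.
COLUMNS = ["Sample_ID", "time", "temperature", "salinity", "TOC", "NO3",
           "NO2+NO3", "biosample_accession", "data_file_name"]


def build_file_name(dataset_no, cruise, station, descriptor, extension="txt"):
    """Join the name components with underscores (pattern inferred from the example)."""
    # Assumption: hyphens are dropped from identifiers, as in KT-235 -> KT235.
    parts = [part.replace("-", "") for part in (dataset_no, cruise, station)]
    parts.append(descriptor)
    return "_".join(parts) + "." + extension


def empty_metadata_rows(station, depths_m, days):
    """Return one blank row per depth x day combination."""
    rows = []
    for depth, day in product(depths_m, days):
        row = dict.fromkeys(COLUMNS, "")
        # Hypothetical Sample_ID format; substitute the project's real identifiers.
        row["Sample_ID"] = f"{station}_D{depth}m_day{day:02d}"
        rows.append(row)
    return rows


file_name = build_file_name("CMP002", "KT-235", "ST28", "CTD_nutrients")

# Placeholder depths and days for illustration only, not the actual sampling plan.
rows = empty_metadata_rows("ST28", depths_m=[5, 50], days=[1, 2])

# Write to an in-memory buffer; replace with open(file_name, "w", newline="")
# to create the file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t")
writer.writeheader()
writer.writerows(rows)

print(file_name)
print(buffer.getvalue())
```

Keeping Sample_ID in the table rather than in the data file names matches the approach described earlier: files can be renumbered or re-analyzed without changing the identifier that links data streams.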
