File Naming and Data Deposition Example
Continuing the example used above, seawater was collected at field station ST28 from several depths over 14 days during the KT-235 cruise. Seawater samples were processed for untargeted proteomics, targeted and untargeted metabolomics (positive and negative ionization modes), and shotgun metagenomic sequencing. Water for nutrient (organic and inorganic) measurements was also collected. CTD casts were conducted during each collection. The internal C-CoMP dataset_no is CMP002.
For all data streams except proteomics, the numbers in the file names below are file numbers, not unique sample IDs. This choice is intentional: it avoids misnamed files, accommodates quality control samples, speeds up file naming, and leaves room for replicates and for samples that have to be re-analyzed. Sample IDs (or equivalent identifiers) should still be assigned across datasets so that samples can be linked between data streams, but they are recorded in the metadata tables rather than in the file names.
Proteomics is the exception: its file names contain Sample IDs because of existing lab procedures for this data stream.
Proteomics files → ProteomeXchange, linked to BCO-DMO and the Ocean Protein Portal (OPP)
CMP002_072922_QE_2DDDA_SA20.mzML
CMP002_072922_QE_2DDDA_SA21.mzML
CMP002_072922_QE_2DDDA_SA22.mzML
Targeted metabolomics (LC-MS) → MetaboLights, linked to BCO-DMO
CMP002_072122_Altis_RP_45.mzML
positive ion mode
CMP002_072122_Altis_RP_46.mzML
positive ion mode
CMP002_072122_Altis_RP_47.mzML
positive ion mode
CMP002_072122_Altis_RP_90.mzML
negative ion mode
CMP002_072122_Altis_RP_91.mzML
negative ion mode
CMP002_072122_Altis_RP_92.mzML
negative ion mode
Untargeted metabolomics (LC-MS) → MetaboLights, linked to BCO-DMO
CMP002_072122_Lumos_RP_45.mzML
positive ion mode
CMP002_072122_Lumos_RP_46.mzML
positive ion mode
CMP002_072122_Lumos_RP_47.mzML
positive ion mode
CMP002_072122_Lumos_RP_90.mzML
negative ion mode
CMP002_072122_Lumos_RP_91.mzML
negative ion mode
CMP002_072122_Lumos_RP_92.mzML
negative ion mode
Whole metagenomic sequencing → NCBI SRA, linked to BCO-DMO
CMP002_WMX_SA_20_R1_001.fastq.gz
Forward
CMP002_WMX_SA_20_R2_001.fastq.gz
Reverse
CMP002_WMX_SA_21_R1_001.fastq.gz
Forward
CMP002_WMX_SA_21_R2_001.fastq.gz
Reverse
CMP002_WMX_SA_22_R1_001.fastq.gz
Forward
CMP002_WMX_SA_22_R2_001.fastq.gz
Reverse
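The file names listed above follow two visible patterns, which can be checked mechanically before deposition. The following is a minimal Python sketch, not an official C-CoMP tool. The field labels (date, instrument, method, file_no, assay, file_id, read, chunk) are assumptions inferred from the example names in this section, and the helper names parse_file_name and unpaired_file_ids are hypothetical.

```python
import re

# Assumed layout for mass spectrometry files, inferred from the examples:
# dataset_no _ six-digit date _ instrument _ method _ file number .mzML
# (for proteomics, the last field holds the Sample ID, e.g. SA20).
MS_PATTERN = re.compile(
    r"^(?P<dataset_no>CMP\d{3})_(?P<date>\d{6})_(?P<instrument>[A-Za-z0-9]+)"
    r"_(?P<method>[A-Za-z0-9]+)_(?P<file_no>[A-Za-z]*\d+)\.mzML$"
)

# Assumed layout for sequencing files, inferred from the examples:
# dataset_no _ assay code _ file identifier _ read direction _ chunk .fastq.gz
SEQ_PATTERN = re.compile(
    r"^(?P<dataset_no>CMP\d{3})_(?P<assay>[A-Z]+)_(?P<file_id>SA_\d+)"
    r"_(?P<read>R[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)


def parse_file_name(name):
    """Return the fields of a file name as a dict, or None if it matches neither pattern."""
    for kind, pattern in (("mass_spec", MS_PATTERN), ("sequencing", SEQ_PATTERN)):
        match = pattern.match(name)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


def unpaired_file_ids(names):
    """Return sequencing file identifiers that lack either the R1 or the R2 file."""
    reads = {}
    for name in names:
        parsed = parse_file_name(name)
        if parsed and parsed["kind"] == "sequencing":
            reads.setdefault(parsed["file_id"], set()).add(parsed["read"])
    return sorted(fid for fid, seen in reads.items() if seen != {"R1", "R2"})


if __name__ == "__main__":
    print(parse_file_name("CMP002_072122_Altis_RP_45.mzML"))
    print(unpaired_file_ids([
        "CMP002_WMX_SA_20_R1_001.fastq.gz",
        "CMP002_WMX_SA_20_R2_001.fastq.gz",
        "CMP002_WMX_SA_21_R1_001.fastq.gz",
    ]))
```

A check like unpaired_file_ids can catch a forward read that was uploaded without its reverse partner. Ion mode is not encoded in the metabolomics file names, so it cannot be recovered by parsing and has to come from the metadata.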
Metadata (CTD and nutrient concentrations) → BCO-DMO
CMP002_KT235_ST28_CTD_nutrients.txt
Spreadsheet with Sample_ID (a unique combination of depth × day) as rows and time, temperature, salinity, TOC, NO3, NO2+NO3, etc. as columns. If possible, include columns that link each Sample_ID and its metadata to the corresponding file names and to the biosample accession numbers of the shotgun metagenomics data.
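As an illustration of such linking columns, the sketch below reads a tab-delimited metadata table and looks up the Sample_ID for a given file name. It assumes one linking column per data stream. The column names (depth_m, day, metabolomics_file), the Sample_ID values, and the helper index_by_file are all invented for this example and are not part of the C-CoMP template.

```python
import csv
import io

# Hypothetical excerpt of a tab-delimited metadata table. The Sample_ID values,
# depths, days, and the metabolomics_file linking column are illustrative only.
METADATA_TSV = (
    "Sample_ID\tdepth_m\tday\tmetabolomics_file\n"
    "S001\t5\t1\tCMP002_072122_Altis_RP_45.mzML\n"
    "S002\t50\t1\tCMP002_072122_Altis_RP_46.mzML\n"
)


def index_by_file(tsv_text, file_column):
    """Map each file name in file_column to its full metadata row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[file_column]: row for row in reader if row[file_column]}


lookup = index_by_file(METADATA_TSV, "metabolomics_file")
row = lookup["CMP002_072122_Altis_RP_46.mzML"]
print(row["Sample_ID"], row["depth_m"], row["day"])
```

Keeping the link in the metadata table, rather than in the file name, means a re-analyzed sample only needs a new row entry and the existing files never have to be renamed.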