C-CoMP Data Management Handbook

Internal C-CoMP Dataset Numbers


Last updated 2 years ago

Each C-CoMP dataset is assigned an internal dataset number, which is used to organize C-CoMP metadata and data internally. Dataset numbers follow the format CMP### (for example, CMP004).
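The CMP### format can be checked mechanically before a number is recorded or used in a file name. The following is a minimal sketch in Python, assuming that "###" means exactly three decimal digits (as in CMP004). The function name is hypothetical and is not part of any C-CoMP tooling.

```python
import re

# Assumption: "###" means exactly three ASCII digits, e.g. CMP004.
DATASET_NUMBER = re.compile(r"CMP[0-9]{3}")


def is_valid_dataset_number(value: str) -> bool:
    """Return True if value has the form CMP followed by three digits."""
    return DATASET_NUMBER.fullmatch(value) is not None
```

For instance, `is_valid_dataset_number("CMP004")` is true, while `"CMP4"` and `"cmp004"` are rejected under this assumption.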

If you are a member of C-CoMP, add the #dataset_number_requests Slack channel to your C-CoMP Slack workspace. To obtain a dataset number, request one in that channel and tag your primary C-CoMP collaborators and Laura Gray, the C-CoMP Digital Coordinator, in the message. For example, you could type “@Laura Gray - @name and I need a C-CoMP Dataset Number”. Laura Gray will reply to your message with the next available dataset number (CMP###). Once your number has been assigned, record as much information as possible about your dataset in the C-CoMP Data Catalog; placeholders are fine.

A blank template of the C-CoMP Data Catalog (C-CoMP_DataCatalog_Template.xlsx, 13KB) is available for download.

What is a dataset at C-CoMP?

At C-CoMP, a dataset is defined as a collection of data that all relate back to the same original samples. Data generated using different methods or measurements are part of the same dataset if the measurements were conducted on the same samples. A subset of data within a larger dataset can still be accessed and analyzed individually.

For example, suppose seawater was collected from different depths during a CTD cast. These seawater samples were processed for untargeted proteomics and metabolomics as well as shotgun metagenomic sequencing, and nutrient concentrations were also measured. According to our definition, all of these datastreams can be combined into one comprehensive dataset that describes the chemical and biological properties of seawater across depths. Parts of the dataset, such as the untargeted proteomics data, can also be accessed and analyzed individually. In this example, all of the data will be assigned the same internal C-CoMP dataset number (e.g., CMP004; located at the beginning of each file name) to facilitate collaboration, data integration, indexing, and tracking. The file names for each sample and datastream combination will differ according to the instructions outlined in File Naming Conventions.
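Because the dataset number sits at the beginning of each file name, files from different datastreams can be gathered back into one dataset by their prefix. The sketch below illustrates this in Python, assuming the CMP### prefix is exactly three digits. The function name and the file names in the usage note are made up for illustration; the actual naming rules are those given in File Naming Conventions.

```python
import re
from collections import defaultdict

# Assumption: the dataset number is "CMP" plus exactly three digits at the
# start of the file name, and is not followed by a further digit.
PREFIX = re.compile(r"^(CMP[0-9]{3})(?![0-9])")


def group_by_dataset(file_names):
    """Group file names by their leading CMP### dataset number.

    Names without a recognizable prefix are collected under the key None.
    """
    groups = defaultdict(list)
    for name in file_names:
        match = PREFIX.match(name)
        key = match.group(1) if match else None
        groups[key].append(name)
    return dict(groups)
```

Called on hypothetical names such as `["CMP004_proteomics.raw", "CMP004_metagenome.fastq", "CMP005_nmr.zip"]`, this places the first two files under `"CMP004"` and the third under `"CMP005"`, mirroring how a single dataset number ties together several datastreams from the same samples.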

What is a dataset at BCO-DMO?

When metadata and tabular data are submitted to BCO-DMO, a C-CoMP dataset that includes multiple datastreams is divided and submitted as separate datasets, one per datastream. In the example above, individual dataset landing pages would be created under the same BCO-DMO project for the untargeted proteomics, untargeted metabolomics, metagenomic sequencing, and nutrient measurements. Laura Gray will work with you to organize these submissions. Internal C-CoMP dataset numbers will not be referenced on BCO-DMO dataset landing pages.
