Internal C-CoMP Dataset Numbers
An internal dataset number is assigned to each C-CoMP dataset. These numbers are used to organize C-CoMP metadata and data internally and follow the format CMP###, where ### is a three-digit number (e.g., CMP004).
If you are a member of C-CoMP, join the #dataset_number_requests channel in the C-CoMP Slack workspace. To obtain a dataset number, post a request in the channel and tag your primary C-CoMP collaborators as well as Laura Gray, the C-CoMP Digital Coordinator. For example, you could type “@Laura Gray - @name and I need a C-CoMP Dataset Number”. Laura Gray will reply to your message with the next available dataset number (CMP###). Once your number has been assigned, please record as much information about your dataset as possible in the C-CoMP Data Catalog (placeholders are fine).
A blank template of the C-CoMP Data Catalog can be downloaded here.
At C-CoMP, a dataset is defined as a collection of data that relates back to the same original samples. Data generated using different methods or measurements belong to the same dataset if the measurements were conducted on the same samples. A subset of data within a larger dataset can also be accessed and analyzed individually.
For example, suppose seawater was collected from different depths during a CTD cast. These seawater samples were processed for untargeted proteomics and metabolomics as well as shotgun metagenomic sequencing, and nutrient concentrations were also measured. Under our definition, all of these data streams can be combined into a comprehensive dataset that describes the chemical and biological properties of seawater across depths, while individual parts of the dataset, such as the untargeted proteomics data, can still be accessed and analyzed on their own. In this example, all of the data are assigned the same internal C-CoMP dataset number (e.g., CMP004, placed at the beginning of each file name) to facilitate collaboration, data integration, indexing, and tracking. The rest of each file name will differ by sample and datastream according to the instructions outlined in File Naming Conventions.
When metadata and tabular data are submitted to BCO-DMO, datasets that include multiple datastreams are split by datastream and submitted as separate datasets. In the example above, separate dataset landing pages would be created under the same BCO-DMO project for the untargeted proteomics, untargeted metabolomics, metagenomic sequencing, and nutrient data. Laura Gray will work with you to organize these submissions. Internal C-CoMP dataset numbers are not referenced on BCO-DMO dataset landing pages.