C-CoMP Data Management Handbook

Data Group Definitions


Last updated 2 years ago

At a high level, data generated by C-CoMP are classified into one or more of the following groups: metadata, raw data, derived data, and data products. Definitions and examples of these groups are provided here:

  1. Metadata are descriptive qualitative and quantitative measurements that provide contextual information for other data (adapted from this definition). In short, metadata are data about data. Examples of metadata include:

    1. Date, time, and depth of sample collection (applicable for oceanographic field data) that describe environmental conditions during the collection of a seawater sample destined for targeted metabolomics.

    2. Total organic carbon or nitrate concentrations that describe the chemical conditions of an environmental sequencing sample.

  2. Raw data files are generated when samples are run on an instrument. Raw data files are the initial files that have not been modified, corrected, compressed, or filtered. Examples of raw data include:

    1. .fastq files from a sequencing run

    2. Vendor-specific files created during a run on a mass spectrometer (e.g., in metabolomics and proteomics workflows).

  3. Derived data files are files that have been converted from the original version into a different format. For example, files with the extension .mzML are derived files that have been converted from vendor-specific .RAW files using the tool MSConvert. When possible, raw data files should be converted into open, derived file formats, and these derived files should be submitted to domain repositories.

  4. Data products are the results of data analysis: any files generated by analyzing raw or derived files fall into this category. Examples of data products include:

    • Metagenome assembled genomes (MAGs)

    • Feature relative intensity matrix generated after derived LC-MS .mzML files are processed in XCMS

    • Protein spectral counts and identifications

    • Modeling results
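
As a rough illustration of how these definitions might be applied when sorting files, the sketch below maps file extensions to data groups. It is a minimal example, not a C-CoMP tool: the names `EXTENSION_GROUPS` and `classify_data_group` are hypothetical, and the mapping covers only the extensions mentioned above. Metadata and data products cannot be recognized from an extension alone, so anything else is returned as "unclassified" and left for a person to assign.

```python
from pathlib import PurePath

# Hypothetical mapping, built only from the examples given in the definitions.
EXTENSION_GROUPS = {
    ".fastq": "raw data",      # output of a sequencing run
    ".raw": "raw data",        # vendor-specific mass spectrometer file
    ".mzml": "derived data",   # open format converted from .RAW with MSConvert
}


def classify_data_group(filename: str) -> str:
    """Return the data group suggested by a file's extension.

    Files whose extension is not in EXTENSION_GROUPS (for example
    metadata tables or data products) are reported as "unclassified".
    """
    suffix = PurePath(filename).suffix.lower()  # case-insensitive match
    return EXTENSION_GROUPS.get(suffix, "unclassified")


if __name__ == "__main__":
    for name in ["sample01.fastq", "run_A.RAW", "run_A.mzML", "feature_matrix.csv"]:
        print(name, "->", classify_data_group(name))
```

A lookup table is used here because it keeps the assumed extension-to-group pairs in one place, which makes them easy to review or extend.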

Data groups determine where data is stored and how it is shared and submitted to repositories. In the next section, data-dependent deposition instructions are provided.
