Data Group Definitions

At a high level, data generated by C-CoMP is classified into one or more of the following groups: metadata, raw data, derived data, and data products. Definitions and examples of these categories are provided here:

  1. Metadata include descriptive qualitative and quantitative measurements that provide contextual information for other data (adapted from this definition). In short, metadata are data about data. Examples of metadata include:

    1. Date, time, and depth of sample collection (applicable to oceanographic field data), which describe environmental conditions during the collection of a seawater sample destined for targeted metabolomics.

    2. Total organic carbon or nitrate concentrations that describe the chemical conditions of an environmental sequencing sample.

  2. Raw data files are generated when samples are run on an instrument. Raw data files are the initial files that have not been modified, corrected, compressed, or filtered. Examples of raw data include:

    1. .fastq files from a sequencing run

    2. Vendor-specific files created during a run on a mass spectrometer (e.g., in metabolomics and proteomics workflows)

  3. Derived data files are files that have been converted from the original version into a different format. For example, files with the extension .mzML are derived files that have been converted from vendor-specific .RAW files using the tool MSConvert. When possible, raw data files should be converted into open, derived file formats, and these files should be submitted to domain repositories.

  4. Data products are the results of data analysis. Any files generated by analyzing raw or derived files fall into this category. Examples of data products include:

    • Metagenome assembled genomes (MAGs)

    • Feature relative intensity matrix generated after derived LC-MS .mzML files are processed in XCMS

    • Protein spectral counts and identifications

    • Modeling results
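For file types that the definitions above tie to a specific extension, the grouping can be sketched as a simple lookup. This is an illustrative sketch only, not an official C-CoMP tool: the names `GROUP_BY_EXTENSION` and `classify` are hypothetical, and only the .fastq, .RAW, and .mzML mappings come from the text above.

```python
from pathlib import Path

# Hypothetical lookup table. Only these three extensions are named in the
# definitions above; anything else is left unclassified on purpose.
GROUP_BY_EXTENSION = {
    ".fastq": "raw data",      # output of a sequencing run
    ".raw": "raw data",        # vendor-specific mass spectrometer file
    ".mzml": "derived data",   # open format converted from a vendor file
}


def classify(filename: str) -> str:
    """Return the data group implied by the file extension.

    Returns "unclassified" when the extension alone is not enough to decide.
    """
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    return GROUP_BY_EXTENSION.get(suffix, "unclassified")


if __name__ == "__main__":
    for name in ["sample01.fastq", "run7.RAW", "run7.mzML", "mags.fasta"]:
        print(f"{name}: {classify(name)}")
```

Metadata and data products generally cannot be recognized from an extension alone (a .csv file could be either), so a lookup like this would only be a first pass before someone assigns the group by hand.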

Data groups determine where data is stored and how it is shared and submitted to repositories. In the next section, data-dependent deposition instructions are provided.
