
# Data Group Definitions

At a high level, data generated by C-CoMP is classified into one or more of the following groups: metadata, raw data, derived data, and data products. Definitions and examples of these groups are provided here:

1. **Metadata** include descriptive qualitative and quantitative measurements that provide contextual information for other data (adapted from [this definition](https://old.nnlm.gov/data/thesaurus/metadata)). In short, metadata are data about data. Examples of metadata include:
   1. Date, time, and depth of sample collection (applicable to oceanographic field data), which describe the environmental conditions under which a seawater sample destined for targeted metabolomics was collected.
   2. Total organic carbon or nitrate concentrations that describe the chemical conditions of an environmental sequencing sample.
2. **Raw data** files are generated when samples are run on an instrument. Raw data files are the initial files that have not been modified, corrected, compressed, or filtered. Examples of raw data include:
   1. .fastq files from a sequencing run
   2. Vendor-specific formatted files created during a run on a mass spectrometer (e.g. in metabolomics and proteomics workflows).
3. **Derived data** files are files that have been converted from the original version into a different format. For example, files with the extension .mzML are derived files that have been converted from vendor-specific .RAW files using the tool MSConvert. When possible, raw data files should be converted into open, derived file formats, and these derived files should be submitted to domain repositories.
4. **Data products** include results of data analysis. Any files generated from raw or derived files are included in this category. Examples of data products include:
   * Metagenome assembled genomes (MAGs)
   * Feature relative intensity matrix generated after derived LC-MS .mzML files are processed in XCMS
   * Protein spectral counts and identifications
   * Modeling results
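
As a rough illustration of how the raw and derived groups above could be told apart in practice, the following Python sketch assigns a file to a group based on its extension. The mapping and the function name `classify_file` are hypothetical and are not part of any C-CoMP tooling; the mapping covers only the file types named in the examples above. Metadata and data products generally cannot be identified from an extension alone, so anything unrecognized is returned as `"unclassified"` and needs a manual decision.

```python
from pathlib import Path

# Hypothetical extension-to-group mapping, limited to the file types
# mentioned in the definitions above. Extend it for your own formats.
GROUP_BY_EXTENSION = {
    ".fastq": "raw data",      # files from a sequencing run
    ".raw": "raw data",        # vendor-specific mass spectrometer files
    ".mzml": "derived data",   # open format converted from .RAW files
}


def classify_file(filename: str) -> str:
    """Return the data group suggested by a file's extension.

    The comparison is case-insensitive, so "run.RAW" and "run.raw"
    are treated the same way. Unknown extensions are "unclassified".
    """
    suffix = Path(filename).suffix.lower()
    return GROUP_BY_EXTENSION.get(suffix, "unclassified")


if __name__ == "__main__":
    for name in ["sample01.fastq", "run07.RAW", "run07.mzML", "mag_bins.tsv"]:
        print(f"{name}: {classify_file(name)}")
```

A lookup like this is only a first pass: a compressed file such as `sample01.fastq.gz` falls through to `"unclassified"` because only the final extension is inspected, and data products such as MAGs or spectral count tables need to be assigned by the person who generated them.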

Data groups determine where data is stored and how it is shared and submitted to repositories. The next section provides deposition instructions specific to each type of data.

