# C-CoMP Data Roadmap

The C-CoMP Data Roadmap provides an overview of the data life cycle from initial study ideation through data deposition. The roadmap is divided into two phases. Phase 1 (blue circles in Figure 1) includes study ideation, planning, data generation, and curation (applicable to certain data streams). Phase 2 (green boxes) includes repository submissions and data analyses. A detailed description of each phase is provided below Figure 1.

<figure><img src="/files/Q5rGSMuMhN9UfaukoVFR" alt=""><figcaption><p><strong>Figure 1. The C-CoMP Data Roadmap gives an overview of the data life cycle from initial study ideation through data deposition.</strong></p></figcaption></figure>

### Phase 1

After study ideation (1. in the above figure), C-CoMP members should request internal C-CoMP dataset numbers (CMP###) by following [these instructions](/data-management-handbook/internal-c-comp-dataset-numbers.md) before any data files are generated (2.). These numbers are incorporated into file names to organize, group, and track files. Please follow the file naming conventions outlined [here](/data-management-handbook/file-naming-conventions/lc-ms-metabolomics.md). All data should be assigned internal C-CoMP dataset numbers, even if they are used for method development and/or are never shared or published.
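Because the dataset number is what ties files together, it can be useful to check that every file name actually carries one. The following is a minimal sketch, not part of the official conventions: it assumes a dataset number is the literal prefix `CMP` followed by exactly three digits (taken from the `CMP###` placeholder above), and the function names and example file names are hypothetical. The linked file naming conventions remain the authoritative rule.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumption: "CMP" followed by exactly three digits, not embedded in a
# longer alphanumeric token. Adjust if the real numbering scheme differs.
DATASET_NUMBER = re.compile(r"(?<![A-Za-z0-9])CMP\d{3}(?!\d)")


def find_dataset_number(file_name: str) -> Optional[str]:
    """Return the first CMP### token in a file name, or None if absent."""
    match = DATASET_NUMBER.search(file_name)
    return match.group(0) if match else None


def group_by_dataset(file_names: Iterable[str]) -> Dict[Optional[str], List[str]]:
    """Group file names by dataset number; files without one land under None."""
    groups: Dict[Optional[str], List[str]] = {}
    for name in file_names:
        groups.setdefault(find_dataset_number(name), []).append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = ["CMP012_sample01.mzML", "CMP012_sample02.mzML", "notes.txt"]
    for dataset, members in group_by_dataset(files).items():
        print(dataset, members)
```

Files grouped under `None` are the ones to rename or to assign a dataset number before they are shared.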

Once data are generated, curation should occur as necessary (3.). During this time, C-CoMP members should also decide whether the data will contribute to C-CoMP publications and/or be shared across the Center (4.). If the answer is yes, raw and derived data as well as metadata should be submitted to repositories (move to Phase 2).

### Phase 2

There are two main branches in Phase 2 that occur simultaneously: data analysis and writing (5A.) and data deposition (5B.). To streamline Phase 2, please follow the order proposed below:

#### Data Analysis and Writing (5A)

* **5A-1**. Intermediate, versioned research products should be shared during data analyses or, at the latest, prior to the onset of writing. There are several options for sharing initial research products. All research products intended for publication in manuscripts should be publicly available when writing begins. Products, code, and methods that are published online (e.g. GitHub, Zenodo, protocols.io) should be linked to the appropriate BCO-DMO dataset landing pages.
* **5A-2**. At the writing stage, most of the data and research products should be publicly available. Preprints and published manuscripts should also be linked to the appropriate BCO-DMO dataset landing pages. DOIs are minted for BCO-DMO datasets upon study completion (i.e. submission of manuscripts to journals).

#### Data Deposition (5B)

* **5B-1.** Raw and derived data and associated metadata are submitted to data repositories simultaneously with data analysis. If domain data exist, proceed to 5B-2.
* **5B-2**. If any domain data are generated, first submit raw and derived files (if applicable), along with the relevant required metadata, to domain repositories (e.g. Sequence Read Archive, MetaboLights, ProteomeXchange). Once accession numbers have been generated, they will be linked to the appropriate BCO-DMO dataset landing pages. This process can take from a few hours to several months, depending on the repository.
* **5B-3.** Submit metadata and tabular data to BCO-DMO (detailed information here) and link associated datasets on domain repositories using the provided accession numbers. Dataset landing pages are created for all datasets and numerical models generated by C-CoMP.
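The ordering of the deposition steps above can be sketched as a small helper that returns a checklist. This is an illustrative sketch only: the function name and step wording are hypothetical, and it assumes that a dataset without domain data skips 5B-2 and goes straight to 5B-3, which the steps imply but do not state outright.

```python
from typing import List


def deposition_steps(has_domain_data: bool) -> List[str]:
    """Return the ordered 5B checklist for one dataset.

    Assumption: without domain data, 5B-2 is skipped entirely.
    """
    steps: List[str] = []
    if has_domain_data:
        # Domain repositories come first so accession numbers exist
        # before the BCO-DMO submission references them.
        steps.append("5B-2: submit raw/derived files and metadata to the domain repository")
        steps.append("5B-2: wait for accession numbers")
    steps.append("5B-3: submit metadata and tabular data to BCO-DMO")
    if has_domain_data:
        steps.append("5B-3: link domain-repository datasets using the accession numbers")
    return steps


if __name__ == "__main__":
    for step in deposition_steps(has_domain_data=True):
        print(step)
```

Putting the domain repository first reflects the dependency described in 5B-2 and 5B-3: the BCO-DMO landing page can only link to accession numbers that have already been issued.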