C-CoMP Data Management Handbook

Sequencing Files


Last updated 2 years ago

Please use the raw sequence file naming template as a guide when creating file names.

Field Definitions

Each numbered item below corresponds to a field in the file name, in order. Fields are separated by ‘_’ to make file names easier for computers to read. The shortened column names used in the spreadsheet template are given in parentheses after each field name.

  1. Dataset Number (Dataset_No):

    • All C-CoMP datasets will be assigned an internal dataset number. Request this number in the #dataset_number_requests Slack channel, following the instructions in the Internal C-CoMP Dataset Numbers section.

    • Metadata about the dataset (including Dataset number, method type, and data storage location) will be recorded in the C-CoMP Data Catalog.

  2. Approach (Approach):

    • The method used for this specific project and sample combination (see the approach abbreviations below).

  3. Sample type (Sample_Type):

    • Use this field to distinguish sample types. Sample type should fall into one of these categories: quality control (QC) or biological sample (SA). QC includes samples run as DNA extraction or sequencing controls to check for contamination during sample preparation.

  4. File number (File_No)

  5. Forward or Reverse Reads (Forward_Reverse):

    • Either the forward reads (R1) or the reverse reads (R2), if applicable to the file type. Use ‘noR’ if this is not applicable.

  6. Sequencing number (Seq_no):

    • Used if there is additional sequencing data for the same sample and data type. This field changes only for technical replicates; a biological replicate, or a sample from a separate extraction, is assigned a different sample_ID instead. The default number is 001.

  7. File-type extension
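The field order above can be sketched as a small Python helper that joins the fields with underscores. This is a minimal sketch, not an official C-CoMP tool: the function name `build_sequence_filename` is hypothetical, the file number is used exactly as given because its padding is not specified here, and the default extension `fastq.gz` is only an assumed example. The field order, the `_` separator, the R1/R2/noR values, and the 001 default for Seq_no come from this page.

```python
# Hypothetical helper (not an official C-CoMP tool) that assembles a raw
# sequence file name in the field order described in this section:
# Dataset_No_Approach_Sample_Type_File_No_Forward_Reverse_Seq_no.extension
APPROACHES = {"WMX", "WQX", "AMP", "TXX", "MTX"}
SAMPLE_TYPES = {"QC", "SA"}
READ_DIRECTIONS = {"R1", "R2", "noR"}


def build_sequence_filename(dataset_no: str, approach: str, sample_type: str,
                            file_no: str, forward_reverse: str = "noR",
                            seq_no: int = 1, extension: str = "fastq.gz") -> str:
    """Join the naming fields with '_' and append the file-type extension.

    Assumptions: file_no is used exactly as given (its padding is not
    specified), and 'fastq.gz' is only an example extension.
    """
    if approach not in APPROACHES:
        raise ValueError(f"unknown approach code: {approach!r}")
    if sample_type not in SAMPLE_TYPES:
        raise ValueError(f"unknown sample type: {sample_type!r}")
    if forward_reverse not in READ_DIRECTIONS:
        raise ValueError(f"forward/reverse must be R1, R2 or noR, got {forward_reverse!r}")
    if seq_no < 1:
        raise ValueError("seq_no starts at 1 (written as 001)")

    # Seq_no is written with three digits, so the default 1 becomes '001'.
    fields = [dataset_no, approach, sample_type, file_no,
              forward_reverse, f"{seq_no:03d}"]

    # '_' is the field separator, so no field may be empty or contain it.
    for value in fields:
        if not value or "_" in value:
            raise ValueError(f"invalid field value: {value!r}")

    return "_".join(fields) + "." + extension.lstrip(".")
```

For example, `build_sequence_filename("CMP002", "WMX", "SA", "8", "R1")` returns `CMP002_WMX_SA_8_R1_001.fastq.gz`. Validating the codes while building the name catches typos before files are uploaded, rather than after they are catalogued.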

Approach Abbreviations:

  • WMX - Whole Metagenomic Sequencing (environmental metagenomics)

  • WQX - Whole Genome Sequencing

  • AMP - Amplicon Sequencing (e.g. 16S rRNA)

  • TXX - Transcriptomics

  • MTX - Environmental Metatranscriptomics

Sample Type Abbreviations:

  • QC - Quality Control

  • SA - Biological Sample
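The two abbreviation lists can be kept as lookup tables so that a script can check codes before files are named or uploaded. This is a minimal sketch: the dictionary and function names are hypothetical, and the codes and descriptions are copied from the lists above.

```python
# Hypothetical lookup tables built from the abbreviation lists in this section.
APPROACH_CODES = {
    "WMX": "Whole Metagenomic Sequencing (environmental metagenomics)",
    "WQX": "Whole Genome Sequencing",
    "AMP": "Amplicon Sequencing (e.g. 16S rRNA)",
    "TXX": "Transcriptomics",
    "MTX": "Environmental Metatranscriptomics",
}

SAMPLE_TYPE_CODES = {
    "QC": "Quality Control",
    "SA": "Biological Sample",
}


def describe_codes(approach: str, sample_type: str) -> str:
    """Return a readable description of an approach/sample type pair.

    Raises KeyError if either code is not in the lists above.
    """
    if approach not in APPROACH_CODES:
        raise KeyError(f"unknown approach code: {approach!r}")
    if sample_type not in SAMPLE_TYPE_CODES:
        raise KeyError(f"unknown sample type code: {sample_type!r}")
    return f"{SAMPLE_TYPE_CODES[sample_type]}, {APPROACH_CODES[approach]}"
```

For example, `describe_codes("AMP", "QC")` returns `"Quality Control, Amplicon Sequencing (e.g. 16S rRNA)"`.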

CCoMP_Sequence_File_Names.xlsx
The example file name refers to the forward reads (R1) from a biological sample (SA) with file number 8, generated by whole metagenomic sequencing (WMX) in dataset CMP002.
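Going the other way, a file name can be split back into its fields, which is a quick way to check a name against a description like the one above. This is a hypothetical sketch: `parse_sequence_filename` is not part of any C-CoMP tool, and the name `CMP002_WMX_SA_8_R1_001.fastq.gz` is assembled here for illustration from the fields described above. The unpadded file number and the `.fastq.gz` extension are assumptions.

```python
# Hypothetical parser for names of the form
# Dataset_No_Approach_Sample_Type_File_No_Forward_Reverse_Seq_no.extension
FIELD_NAMES = ("Dataset_No", "Approach", "Sample_Type", "File_No",
               "Forward_Reverse", "Seq_no")


def parse_sequence_filename(filename: str) -> dict:
    """Split a file name into its naming fields plus the file-type extension."""
    # Everything after the first '.' is treated as the extension, so
    # multi-part extensions such as 'fastq.gz' stay together.
    stem, dot, extension = filename.partition(".")
    if not dot or not extension:
        raise ValueError(f"missing file-type extension: {filename!r}")
    parts = stem.split("_")
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} '_'-separated fields, got {len(parts)}")
    fields = dict(zip(FIELD_NAMES, parts))
    fields["Extension"] = extension
    return fields


fields = parse_sequence_filename("CMP002_WMX_SA_8_R1_001.fastq.gz")
print(fields)
# {'Dataset_No': 'CMP002', 'Approach': 'WMX', 'Sample_Type': 'SA', 'File_No': '8',
#  'Forward_Reverse': 'R1', 'Seq_no': '001', 'Extension': 'fastq.gz'}
```

Checking the field count is a cheap guard: an extra or missing `_` anywhere in the name shows up immediately as a wrong number of fields.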