Workstream B: Standards

Standards and Ontologies for Integration, Analysis, and Exchange of Global Coherent Datasets
Lead: Jessie Tenenbaum, Duke Translational Medicine Institute
Participants: James Brenton, Kimberly Hartwell, Cameron Neylon, Jonathan Rees, Susanna-Assunta Sansone, Philippe Rocca-Serra, Jim Davies, Steve Harris

Introduction
A prerequisite for the success of the Sage Commons is a standardized approach to data organization and annotation. The Standards and Ontology Project aims to identify:

  1. What metadata elements should be captured regarding the experimental setting, the resulting data, and the analyses
  2. What standards to use to capture those data elements
  3. Which data elements should draw from which ontologies
  4. What scope of data types should be included in each release

Activities before the Congress
For two specific datasets (BxH ApoE -/- and TCGA glioblastoma):

  • Enumerate all data elements to be captured in a structured format (content)
  • Identify data types for each data element
  • Ideally leverage existing standardized common data elements (CDEs), e.g. from caDSR
  • Identify ontologies to use where appropriate (semantics)
  • Use an existing tool (likely ISA-Creator) to annotate these two datasets as a proof of concept
  • Clarify scope, including experiment types (e.g. RNAi, drug screening, case-control) and modalities (e.g. gene expression, genotyping, genome sequencing, CNV)
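
As a rough illustration of what the first two activities might produce, the sketch below models a small catalog of data elements, each with a data type and an optional ontology, and checks an annotation record against it. Everything in it is an assumption made for illustration: the element names, the ontology labels (placeholders, not ontologies chosen by this workstream), and the `DataElement`, `CATALOG`, and `validate` names are hypothetical and do not describe ISA-Creator or caDSR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """One metadata element: its name, expected type, and optional ontology."""
    name: str
    data_type: type
    ontology: Optional[str] = None  # placeholder label, not a project decision
    required: bool = True


# Hypothetical catalog; the real element list is what this workstream will enumerate.
CATALOG = [
    DataElement("organism", str, ontology="EXAMPLE_TAXONOMY"),
    DataElement("tissue", str, ontology="EXAMPLE_ANATOMY"),
    DataElement("modality", str),
    DataElement("sample_count", int),
    DataElement("notes", str, required=False),
]


def validate(record: dict, catalog=CATALOG) -> list:
    """Return a list of readable problems; an empty list means the record passes."""
    problems = []
    known = {element.name for element in catalog}
    for element in catalog:
        if element.name not in record:
            # Only required elements are reported when absent.
            if element.required:
                problems.append(f"missing required element: {element.name}")
            continue
        value = record[element.name]
        if not isinstance(value, element.data_type):
            problems.append(
                f"{element.name}: expected {element.data_type.__name__}, "
                f"got {type(value).__name__}"
            )
    # Flag anything the catalog does not define.
    for name in record:
        if name not in known:
            problems.append(f"unknown element: {name}")
    return problems
```

Calling `validate` on a record that supplies every required element with the expected type returns an empty list; a record with missing, mistyped, or unrecognized elements returns one message per problem. A check of this kind only covers structure and type; confirming that a value is a valid term in the named ontology would need a lookup against that ontology, which this sketch does not attempt.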

Activities at the Congress

  • Give a presentation communicating the issues this group is addressing and an overview of progress to date (without going into so much detail that even diehard geeks’ eyes glaze over)
  • Distribute information on how interested attendees can give feedback and get involved

Activities after the Congress

  • Evaluate whether the annotation tool used for the proof-of-concept annotation is suitable as the tool of choice for contributors
  • Extend proof-of-concept annotations to the other ~4 datasets (?) available on Sage. Metadata elements for ovarian cancer are a key priority given the imminent arrival of the TCGA set and the proposed prospective study profiling ovarian tissues before platinum-based chemotherapy in first relapse.
  • Integrate the annotation tool with a metadata repository from which to draw common data elements