Workstream A: Pilot
End-to-End Pilot Combining Data, Building and Querying Models
Leads: Andrew Kasarskis, Sage Bionetworks; Ilya Kupershmidt, NextBio
Introduction
Basic and clinical research communities around the world are generating increasing quantities of valuable experimental data using modern high-throughput molecular profiling technologies. Combining this data to perform comparative analysis and generate better disease models requires a common framework for annotation and formats. The primary goal of the end-to-end (E2E) pilot is to provide a set of coherently annotated datasets, clinical attributes, and derived network models for distribution and use within the Sage community, illustrating the benefits and identifying the challenges of this approach. The experience of building and applying the pilot will also yield a set of recommendations for formats and annotation standards for data and network models. The project will promote the reuse of data, combined analysis, and the sharing of derived network models. The E2E project will make the datasets and results listed below accessible to any registered Sage User.
Before the Congress we will provide:
- Links to primary gene expression, clinical trait, and genotype data, or the actual primary data if links are not feasible
- Analysis-ready expression, DNA variation, and clinical data
- Gene-trait causality results
- At least 1 coexpression network per tissue
- At least 1 Bayesian network per tissue
- Narrative description of the in-life study
- Narrative description of the analysis steps
- Data dictionary for all files
- If privacy concerns prevent release of the full genotype data, we will release a redacted set of genotype data together with the downstream results computed from the full dataset. Also, some downstream results cannot be computed from the input data in a given dataset.
- Datasets: BxH ApoE -/-; Human Liver Cohort (Deliver); TCGA Glioblastoma; Hong Kong University Hepatocellular Carcinoma and Adjacent Normal Tissue; British Columbia Cancer Agency Breast Cancer
- An end-to-end test of the pilot system
  - The test will allow any registered Sage User to complete a Key Driver analysis on a user-selected gene list and a network model selected from those available at the Sage Website.
- Summary input for the requirements and design of a future Sage Commons system
  - Observed user behaviors and user requests
  - Proposed extensions and enhancements from the team
  - Open issues: areas in which the team needs guidance from Congress delegates
At the Congress we will provide:
- Demonstration of the end-to-end test of the pilot system
- Presentation of requirements and design of a future Sage Commons system
- Solicitation of additional feedback from users and other stakeholders (publishers, funders)
- Critique of draft requirements and use cases for a future Sage Commons system
- Communication of the contributions that Congress delegates and their colleagues can make to the aggregation of data, network models, and analytical approaches, and to the construction of the Sage Commons system