Bioinformatics and Functional Genomics Research Group
Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL)
Salamanca (SPAIN)

Campos-Laborie FJ, Risueño A, Ortiz-Estevez M, Roson-Burgo B, Droste C, Fontanillo C, Loos R, Sanchez-Santos JM, Trotter MW and De Las Rivas J

DECO: DEcomposing heterogeneous Cohorts by Omic data profiling

ABSTRACT: Patient diversity is one of the main challenges when dealing with large cohorts in clinical studies. Here we propose a method to analyse and understand heterogeneous data that avoids the classical normalization approach of reducing or removing variation. Our method, called DECO (DEcomposing heterogeneous Cohorts by Omic data profiling), finds and describes existing dependent relationships between biological features (genes) and samples (individuals) by analysing large-scale omic-wide data. DECO identifies the best biomarkers related to specific phenotypic conditions and to possible hidden factors. The method is based on a recursive heuristic algorithm that assigns marker features (i.e. genes, miRNAs, proteins or other biomarkers identified by the omic technique) to subsets of samples according to their patterns. In this way, it identifies closely related states or subclasses within the studied cohort. By exploring differential signals, the method finds: (i) variation among individuals, whether or not it is related to a priori known phenotypic classes; (ii) possible errors in class label assignment; and (iii) possible outliers. We demonstrate that DECO performs better than classical and current methods when applied to complex gene expression datasets from several cancer clinical cohorts. DECO identifies the specific omic signature of individuals, making it especially suited for deep and accurate patient stratification.

Additional File 1 - DECO R package

Additional File 2 - DECO R vignette and tutorial to use the method

[ARTICLE submitted for publication - September 2017]