Bioinformatics and Functional Genomics Research Group
Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL)
Salamanca (SPAIN)

Campos-Laborie FJ, Risueño A, Ortiz-Estevez M, Roson-Burgo B, Droste C, Fontanillo C, Loos R, Sanchez-Santos JM, Trotter MW and De Las Rivas J

DECO: DEcomposing heterogeneous Cohorts using Omic data profiling

ABSTRACT: Patient diversity is one of the main challenges when dealing with large cohorts in clinical studies. Here we propose a method to analyse and understand heterogeneous data without resorting to classical normalization approaches that reduce or remove variation. Our method, called DECO (DEcomposing heterogeneous Cohorts using Omic data profiling), finds and describes dependent relationships among biological features (genes) and samples (individuals) by analysing large-scale omic data. DECO identifies the best biomarkers related to specific phenotypic conditions and to possible hidden factors. The method is based on a recursive heuristic algorithm that assigns marker features (i.e. genes, miRNAs, proteins or other biomarkers identified by the omic technique) to subsets of samples according to their signal patterns. In this way, it identifies closely related states or subclasses within the studied cohorts.

The method performs a recursive exploration of differential signal changes between samples, finding variables assigned to:
(i) the main classes or groups of samples that are in the studied cohorts;
(ii) significant variation or alteration among certain individuals (whether or not related to a class known a priori);
(iii) possible errors in the class or the label given to certain samples;
(iv) sample outliers (i.e. individuals that behave in a different way to the main groups and have specific markers).
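
The recursive exploration described above can be pictured with a toy sketch. This is not the DECO implementation: the function name, the parameters, the mean-difference threshold and the rule of crediting the higher-signal subset are all assumptions made for illustration only. The sketch repeatedly compares two random subsets of samples and records, for each feature, which samples sat in the higher-signal subset whenever the difference was large.

```python
import random


def recurrent_differential_counts(data, n_iter=500, subset_size=3,
                                  threshold=2.0, seed=0):
    """Toy recurrent subsampling (hypothetical helper, not DECO itself).

    data: dict mapping feature name -> list of values, one per sample.
    Returns a dict mapping feature name -> list of counts, one per sample,
    giving how often that sample was in the higher-signal subset of a
    flagged comparison.
    """
    rng = random.Random(seed)
    n_samples = len(next(iter(data.values())))
    counts = {feature: [0] * n_samples for feature in data}

    for _ in range(n_iter):
        # Draw two disjoint random subsets of samples.
        drawn = rng.sample(range(n_samples), 2 * subset_size)
        group_a, group_b = drawn[:subset_size], drawn[subset_size:]

        for feature, values in data.items():
            mean_a = sum(values[i] for i in group_a) / subset_size
            mean_b = sum(values[i] for i in group_b) / subset_size
            # Assumption: a plain mean difference stands in for a proper
            # differential statistic.
            if abs(mean_a - mean_b) <= threshold:
                continue
            # Credit the samples of the subset with the higher signal.
            higher = group_a if mean_a > mean_b else group_b
            for i in higher:
                counts[feature][i] += 1
    return counts
```

In this sketch, a feature whose signal is confined to a hidden subgroup accumulates counts mainly on the samples of that subgroup, while a feature with no variation is never flagged. Grouping samples by such count profiles is one simple way to picture how marker features could be assigned to subsets of samples.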

We demonstrate that DECO performs better than classical and current methods when applied to complex gene expression datasets from several cancer clinical cohorts. DECO identifies the specific omic signature of each individual, which makes it especially suited to deep and accurate patient stratification in large-scale clinical studies.

DECO algorithm: available as R package in Bioconductor:

[ARTICLE submitted for publication in Bioinformatics, January 2019]