Bioinformatics and Functional Genomics Research Group
Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL)
Salamanca (SPAIN)

Katia P. LOPES, Francisco J. CAMPOS-LABORIE, Ricardo A. VIALLE, José Miguel ORTEGA, Javier DE LAS RIVAS

Evolutionary hallmarks of the human proteome: chasing the age and co-regulation of protein-coding genes

Background: The development of large-scale technologies for quantitative transcriptomics has enabled comprehensive analysis of gene expression profiles in complete genomes. RNA-Seq allows the measurement of gene expression levels in a manner far more precise and global than previous methods. Studies using this technology are altering our view of the extent and complexity of eukaryotic transcriptomes. In this respect, multiple efforts have been made to determine and analyse the gene expression patterns of human cell types in different conditions, in either normal or pathological states. However, until recently, little has been reported about the evolutionary marks present in human protein-coding genes, particularly from the combined perspective of gene expression and protein evolution.

Results: We present a combined analysis of human protein-coding gene expression profiling and time-scale ancestry mapping that places the genes in taxonomy clades and reveals eight major evolutionary steps (“hallmarks”) that include clusters of functionally coherent proteins. The human expressed genes are analysed using an RNA-Seq dataset of 116 samples from 32 tissues. The evolutionary analysis of the human proteins is performed by combining information from: (i) a database of orthologous proteins (OMA), (ii) the taxonomy mapping of genes to lineage clades (from NCBI Taxonomy) and (iii) the evolutionary time-scale mapping provided by TimeTree (Timescale of Life). The human protein-coding genes are also placed in a relational context based on the construction of a robust gene coexpression network, which reveals tighter links between age-related protein-coding genes and finds functionally coherent gene modules.

Conclusions: Understanding the relational landscape of the human protein-coding genes is essential for interpreting the functional elements and modules of our active genome. Moreover, decoding the evolutionary history of human genes can provide valuable information about their origin and function.
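
As an illustration of the coexpression step mentioned in the abstract, the following minimal sketch computes pair-wise Spearman correlations between gene expression profiles and keeps the pairs above a cut-off as network edges. It is not the authors' pipeline: the gene names, the toy expression values, the function names and the 0.8 threshold are all assumptions made for the example, and the cross-validation applied in the article is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def coexpression_edges(expression, threshold):
    """Return (gene1, gene2, rho) for every gene pair with rho >= threshold.

    `expression` maps a gene ID to its expression values across samples;
    `threshold` is a hypothetical cut-off, not a value from the article.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        rho = spearman(expression[g1], expression[g2])
        if rho >= threshold:
            edges.append((g1, g2, rho))
    return edges


# Toy data (invented): three genes measured in five samples.
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 8.0, 16.0, 32.0],  # rises monotonically with GENE_A
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],    # falls as GENE_A rises
}
edges = coexpression_edges(toy_expression, threshold=0.8)
```

Because Spearman correlation works on ranks, it captures any monotonic relationship between two expression profiles, not only linear ones, which makes it a common choice for expression data spanning several orders of magnitude.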

Additional File 1: PDF file including 6 supplementary figures of the article.

Additional File 2: TABLE (.XLS) including the values of the gene pair-wise Spearman correlation and the cross-validation from the coexpression analysis, as well as the gene IDs to allow the reconstruction of a human coexpression network with 2,298 proteins and 20,005 interactions.

Additional File 3: Cytoscape file (format .cys, produced with Cytoscape version 3.1.0) containing the complete coexpression network produced in this work and presented in Figure 5 of the main article.

Additional File 4: TABLE (.XLS) including the data and IDs of the 17,437 human protein-coding genes included in each of the 8 evolutionary stages.

Additional File 5: TABLE (.XLS) including the results of the functional enrichment analysis of the 17,437 human protein-coding genes included in each of the 8 evolutionary stages. The results for each of the 8 stages are presented in different spreadsheets.

[ARTICLE published in BMC Genomics 2016]