A novel approach to determining null models and controls for co-expression networks

Abstract

Co-expression networks are a common and meaningful way of representing the associations between genes inferred from experimental data. In a graph-theoretic setting, nodes represent genes and edges represent the correlations between their expression levels across a set of samples. Genes with correlated expression levels are more likely to share functions, making co-expression networks a popular means of finding novel commonalities among sets of genes, such as disease candidate genes. However, networks built from experimental data can be highly biased due to small sample sizes and the presence of noise. A major challenge of bioinformatics and computational data analysis lies in identifying artifacts in biased data and separating them from biologically meaningful information. As in any statistical assessment, the choice of null model can be critical to ensuring that results are meaningful. Our work introduces two novel approaches to null models for analyzing and estimating the extent of biological and technical confounds in co-expression data. In the first, data-driven approach, we develop a series of highly efficient tools to calculate functional properties in networks. These allow rapid, controlled comparisons and analyses. Two of the most useful methods are a fully vectorized function prediction algorithm, which allows network characterization in cross-validation across even thousands of functional groups to be completed in minutes, and an analytic determination of the optimal prior for guessing candidate genes across multiple functional sets. We demonstrate the methods by tracing the effects of selection biases arising in the transfer of function predictions for orthologous genes from humans to model organisms, focusing on autism candidate genes. The second approach describes a novel null model obtained by evaluating the transitivity of correlations in co-expression networks from a mathematical perspective. By analyzing the network topology of modules around so-called hub genes, we determine constraints on connectivity that are not typically accounted for when nulls are constructed through link permutation.
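
To illustrate why transitivity constrains connectivity around a hub, the sketch below uses a standard property of Pearson correlations rather than the authors' own derivation, which the abstract does not give. Because a correlation matrix must be positive semidefinite, fixing corr(a, b) and corr(b, c) restricts corr(a, c) to the interval r_ab * r_bc ± sqrt((1 - r_ab^2)(1 - r_bc^2)). The function name `correlation_bounds` and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def correlation_bounds(r_ab, r_bc):
    """Interval allowed for corr(a, c) once corr(a, b) and corr(b, c) are fixed.

    Hypothetical helper: follows from positive semidefiniteness of the
    3 x 3 correlation matrix, not from the method in the abstract.
    """
    slack = np.sqrt((1.0 - r_ab**2) * (1.0 - r_bc**2))
    return r_ab * r_bc - slack, r_ab * r_bc + slack


# Two genes that each correlate at 0.9 with the same hub gene
# cannot be weakly correlated with each other.
low, high = correlation_bounds(0.9, 0.9)

# Empirical check on simulated expression profiles (3 genes, 50 samples).
rng = np.random.default_rng(0)
expr = rng.normal(size=(3, 50))
expr[0] += expr[1]          # make gene a track gene b (the "hub")
expr[2] += expr[1]          # make gene c track gene b as well
corr = np.corrcoef(expr)    # rows are genes, columns are samples
emp_low, emp_high = correlation_bounds(corr[0, 1], corr[1, 2])
within_bounds = emp_low - 1e-12 <= corr[0, 2] <= emp_high + 1e-12
```

In this example, two genes that each correlate at 0.9 with a shared hub must correlate with each other at roughly 0.62 or higher. A null built by permuting links freely can produce edge configurations that violate such bounds, which is the kind of constraint the abstract refers to.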

Date
Event
Genome Informatics
Location
Cold Spring Harbor, USA