Before running CoGA, you must set the execution parameters available on
the left sidebar. Below, we detail each differential
network analysis parameter.
Classes (conditions) being compared
CoGA analyses compare only two phenotypes. If your dataset has more than two
phenotypes, then you must select one pair of phenotypes from a list of
all possible pairs.
The selected pair will be used by all CoGA analyses.
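As an illustration of how the list of candidate pairs grows with the number of phenotypes, the following Python sketch enumerates all unordered pairs. The function name `phenotype_pairs` and the sample labels are hypothetical and are not part of CoGA.

```python
from itertools import combinations


def phenotype_pairs(labels):
    """Return every unordered pair of distinct phenotypes, in sorted order."""
    classes = sorted(set(labels))
    return list(combinations(classes, 2))


# Hypothetical sample labels for a dataset with three phenotypes.
sample_labels = ["control", "tumor", "control", "metastasis", "tumor"]
pairs = phenotype_pairs(sample_labels)
for first, second in pairs:
    print(first, "vs", second)
```

With k phenotypes there are k(k - 1)/2 such pairs, so a dataset with three phenotypes offers three choices.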
Gene sets size range
CoGA performs tests for each gene set of a collection of gene sets. To
test only a subcollection of sets, you can filter the gene groups according
to their sizes by setting the "Minimum gene set size" and
"Maximum gene set size" parameters.
The minimum gene set size allowed is 2. However, we recommend
testing gene sets with at least 20 genes.
Testing large gene sets can take a long time. In general,
a maximum gene set size of a few hundred up to 1000 genes is feasible.
However, this number may vary according to the specifications of the user's machine.
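The size filter can be pictured with a short Python sketch. It assumes that a gene set is a named list of gene identifiers and that the size is the number of distinct genes in the list. Whether CoGA counts only the genes present in the expression data is not stated here, so treat that as an open assumption. The name `filter_gene_sets` and the example sets are hypothetical.

```python
def filter_gene_sets(gene_sets, min_size=20, max_size=1000):
    """Keep the gene sets whose number of distinct genes lies in [min_size, max_size]."""
    if min_size < 2:
        raise ValueError("the minimum gene set size allowed is 2")
    if max_size < min_size:
        raise ValueError("max_size must not be smaller than min_size")
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_size <= len(set(genes)) <= max_size
    }


# Hypothetical collection of gene sets.
collection = {
    "tiny": ["G1"],
    "small": ["G1", "G2", "G3"],
    "big": ["G%d" % i for i in range(30)],
}
kept_default = filter_gene_sets(collection)  # recommended minimum of 20 genes
kept_relaxed = filter_gene_sets(collection, min_size=2, max_size=10)
```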
Method for network inference
The network links are inferred according to a measure of association between the gene
expression levels. CoGA provides three classical association measures:
- Pearson: Pearson's correlation coefficient. It measures the linear
dependence between two variables. For the statistical test, we use the
Hmisc package.
- Spearman: Spearman's correlation coefficient. It measures the monotonic
dependence between two variables. For the statistical test, we use the
Hmisc package.
- Kendall: Kendall's Tau coefficient. It measures the monotonic
dependence between two variables. For the statistical test, we use the
psych package.
The correlation coefficient or p-value obtained by one of the methods mentioned
above is used to set an association degree for each link of the network.
The following options are available to measure the association degrees:
- Absolute correlation: the absolute value of the correlation coefficient
- 1 - p-value: One minus the p-value of the test for dependence between
two gene products. A small p-value indicates strong evidence that the
expression levels are associated, so the association degree is close to 1.
- 1 - q-value: One minus the adjusted p-value of the test for dependence
between two gene products. The p-value is adjusted by
the False Discovery Rate (Benjamini and
Hochberg, 1995) method for multiple testing.
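The three association degrees above can be sketched in Python as follows. The Benjamini-Hochberg adjustment is written out by hand so that the example is self-contained. CoGA itself relies on R packages, so this is an illustration of the idea rather than its implementation, and the names `bh_adjust` and `association_degrees` are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def association_degrees(correlations, pvalues, option):
    """Map per-link test results to association degrees in [0, 1]."""
    if option == "absolute correlation":
        return [abs(r) for r in correlations]
    if option == "1 - p-value":
        return [1.0 - p for p in pvalues]
    if option == "1 - q-value":
        return [1.0 - q for q in bh_adjust(pvalues)]
    raise ValueError("unknown option: %r" % option)


# Hypothetical results for four links.
r = [0.9, -0.6, 0.7, 0.2]
p = [0.01, 0.04, 0.03, 0.20]
q = bh_adjust(p)
degrees_q = association_degrees(r, p, "1 - q-value")
```

Because the adjustment never makes a p-value smaller, the "1 - q-value" degree of a link is never larger than its "1 - p-value" degree.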
Network type
You can choose between unweighted and weighted networks:
- Unweighted: Graphs in which every edge has weight one.
You must choose a threshold for edge selection. Only
the edges that connect genes with an association degree greater than
the threshold will remain in the graph.
- Weighted: Weighted networks are complete graphs in which each edge has
a weight. The weight of an edge is defined as the association degree
between the two gene products that it connects.
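The difference between the two network types can be sketched in Python, starting from a symmetric matrix of association degrees. Setting the diagonal to zero (no self-loops) is an assumption of this sketch, and the function names are hypothetical.

```python
def unweighted_adjacency(assoc, threshold):
    """Keep an edge (value 1) only where the association degree exceeds the threshold."""
    n = len(assoc)
    return [
        [1 if i != j and assoc[i][j] > threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def weighted_adjacency(assoc):
    """Keep every edge and use the association degree as its weight."""
    n = len(assoc)
    return [[assoc[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical association degrees among three genes.
assoc = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.6],
    [0.3, 0.6, 1.0],
]
A = unweighted_adjacency(assoc, threshold=0.5)
W = weighted_adjacency(assoc)
```

Here the unweighted graph drops the link with association degree 0.3, while the weighted graph keeps all three links.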
Method for gene networks comparison
CoGA compares the gene co-expression networks between two
phenotypes for each gene set.
Below, we describe the methods available for comparing unweighted networks:
- Spectral distribution test: The spectrum of an undirected
graph is the set of eigenvalues of its adjacency matrix. The spectral
distribution describes many topological properties of a graph, such as
the number of walks, diameter, and cliques. The spectral distribution
test is based on the Jensen-Shannon divergence between spectral
distributions (Takahashi et al., 2012). It can be used to test whether
two graphs were generated by the same model.
- Spectral entropy test: It uses the absolute difference between spectral
entropies (Takahashi et al., 2012) to measure the difference in
the complexity of the graphs' topological organization.
- Degree distribution test: The degree of a node is the number of edges
that connect to it. The degree distribution test is based on the
Jensen-Shannon divergence between the degree distributions. CoGA uses the
igraph package implementation of the node degree.
- Degree centrality test: The degree centrality test
is based on the Euclidean distance between the degree centralities
of the two networks adjusted by the number of vertices.
- Betweenness centrality test: The betweenness centrality of a node is the
number of shortest paths going through it (Freeman, 1979).
The betweenness centrality test is based on the Euclidean distance between the
betweenness centralities of the two networks adjusted by the number of
vertices. CoGA uses the igraph package implementation.
- Closeness centrality test: The closeness centrality of a node is the
inverse of the average length of the shortest paths between it and all
the other vertices in the graph (Freeman, 1979). The closeness
centrality test is based on the Euclidean distance between the
closeness centralities of the two networks adjusted by the number of
vertices. CoGA uses the igraph package implementation.
- Eigenvector centrality test: The eigenvector centrality of a node
$v_i$ is the $i$-th entry of the first eigenvector of the graph adjacency
matrix, that is, the eigenvector associated with the largest eigenvalue
(Bonacich, 1987). The eigenvector
centrality test is based on the Euclidean distance between the
eigenvector centralities of the two networks adjusted by the number of
vertices. CoGA uses the igraph package implementation.
- Clustering coefficient test:
The local clustering coefficient of a node is the number of edges between the
vertices within its neighborhood divided by the number of edges that could
exist among them (Watts and Strogatz, 1998). The clustering coefficient test
is based on the Euclidean distance between the
local clustering coefficients of the two networks adjusted by the number
of vertices. CoGA uses the igraph package implementation.
- Shortest path length test: The shortest path length test
is based on the absolute difference between the averages of the shortest path
lengths over all pairs of nodes $v_i$ and $v_j$ with $i \neq j$.
CoGA uses the igraph package implementation.
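To make the Jensen-Shannon divergence used by the distribution-based tests concrete, the following Python sketch compares the degree distributions of two small unweighted graphs. It uses plain empirical frequencies instead of the kernel density estimate that CoGA computes, so the numbers only illustrate the statistic. All function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(adj):
    """Empirical probability of each degree 0..n-1 in an unweighted graph."""
    n = len(adj)
    counts = Counter(sum(row) for row in adj)
    return [counts.get(k, 0) / n for k in range(n)]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Two hypothetical 3-gene networks: a path and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
p_path = degree_distribution(path)
p_triangle = degree_distribution(triangle)
statistic = js_divergence(p_path, p_triangle)
```

The divergence is symmetric, equals zero for identical distributions, and with base-2 logarithms never exceeds 1, which makes it a convenient measure of how different two networks are.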
CoGA includes generalizations of some of the statistics described above to
weighted undirected graphs. Let $G$ be a weighted undirected graph.
We define the weighted adjacency matrix of $G$ to be the
matrix $W = (w_{ij})$, such that $w_{ij}$ is the
weight of the edge that connects the vertices $v_i$ and $v_j$.
In this context, $0 \le w_{ij} \le 1$ and $G$ is a complete graph.
Below, we describe the methods available for comparing weighted networks:
- Spectral distribution test: Replaces the usual adjacency matrix by the
weighted adjacency matrix, and then performs the spectral distribution
test for unweighted networks.
- Spectral entropy test: Replaces the usual adjacency matrix by the
weighted adjacency matrix, and then performs the spectral entropy
test for unweighted networks.
- Degree distribution test: CoGA generalizes the degree of a node
to the sum of the weights of the edges that connect to it (Barrat, 2004),
also known as the node strength, using the igraph implementation.
It replaces the usual node degree by the weighted degree, and
then computes the degree distribution test for unweighted networks.
- Degree centrality test: Replaces the usual node degree by the weighted
degree, and then computes the degree centrality test for
unweighted networks.
- Eigenvector centrality test: Replaces the usual adjacency matrix by the
weighted adjacency matrix, and then performs the eigenvector centrality
test for unweighted networks (Newton, 2004).
- Clustering coefficient test: Replaces the local clustering coefficient
of a node by the sum of the weights of the edges between the vertices
within its neighborhood divided by the number of edges that could exist among
them (Lopez-Fernandez et al., 2004). Then it performs the
clustering coefficient test for unweighted networks.
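The weighted degree and the weighted local clustering coefficient described above can be sketched in Python as follows. The sketch follows the definitions given in this section literally. Treating every vertex with a positive edge weight as a neighbor, and returning 0 for a node with fewer than two neighbors, are assumptions. The function names are hypothetical and do not correspond to igraph calls.

```python
def weighted_degrees(W):
    """Weighted degree (strength) of each node: the sum of its edge weights."""
    return [sum(w for j, w in enumerate(row) if j != i) for i, row in enumerate(W)]


def weighted_local_clustering(W, i):
    """Sum of edge weights among the neighbors of node i, divided by the
    number of edges that could exist among those neighbors."""
    neighbors = [j for j in range(len(W)) if j != i and W[i][j] > 0]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: undefined cases are reported as 0
    total = sum(
        W[a][b] for idx, a in enumerate(neighbors) for b in neighbors[idx + 1:]
    )
    return total / (k * (k - 1) / 2)


# Hypothetical weighted adjacency matrix for three genes.
W = [
    [0.0, 0.5, 0.2],
    [0.5, 0.0, 0.8],
    [0.2, 0.8, 0.0],
]
strengths = weighted_degrees(W)
clustering = [weighted_local_clustering(W, i) for i in range(len(W))]
```

When all weights are 0 or 1, both functions reduce to the usual node degree and local clustering coefficient of the unweighted case.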
For the "Spectral distribution test", the "Spectral entropy test", and
the "Degree distribution test" methods, you must select a criterion to
define the bandwidth for the probability density function estimation. The
available methods for computing the bandwidth are:
- Sturges: the bandwidth is defined as $(\max(x) - \min(x)) / n_{bins}$
(Sturges, 1926), where
$x$ is the graph spectrum (for the tests based on the spectral density)
or the node degrees (for the degree distribution test), and
$n_{bins} = \lceil \log_2(n_V) + 1 \rceil$,
with $n_V$ denoting the number of genes.
- Silverman: the bandwidth is defined as
$0.9 \min\{sd(x), IQR(x)/1.34\}\, n_V^{-0.2}$
(Silverman, 1986), unless the quartiles coincide,
where $n_V$ is the number of genes, $sd(x)$ is the standard deviation of $x$,
and $IQR(x)$ is the interquartile range of $x$, with $x$ denoting the graph
spectrum (for the tests based on the spectral density)
or the node degrees (for the degree distribution test). If the
graph is empty, the bandwidth is defined as $0.9\, n_V^{-0.2}$.
CoGA uses the R 'density' function from the stats package for estimating the
probability density function.
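The two bandwidth rules can be written out as a Python sketch. When the minimum of the standard deviation and the scaled interquartile range is zero, the sketch falls back to the standard deviation, then to the absolute value of the first observation, and finally to 1, mirroring the fallback chain of R's `bw.nrd0`. That CoGA follows exactly this chain is an assumption, although the last step reproduces the empty-graph value given above. The function names are hypothetical.

```python
import math
import statistics


def sturges_bandwidth(x, n_genes):
    """(max(x) - min(x)) / nbins, with nbins = ceil(log2(n_genes) + 1)."""
    nbins = math.ceil(math.log2(n_genes) + 1)
    return (max(x) - min(x)) / nbins


def silverman_bandwidth(x, n_genes):
    """0.9 * min(sd, IQR / 1.34) * n_genes ** -0.2, with a positive fallback."""
    sd = statistics.stdev(x)
    # The 'inclusive' method interpolates between order statistics, which
    # matches the default quantile definition in R.
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:
        # Assumed fallback chain (as in R's bw.nrd0): sd, then |x[0]|, then 1.
        spread = sd or abs(x[0]) or 1.0
    return 0.9 * spread * n_genes ** -0.2


# Hypothetical node degrees for a network of 8 genes.
degrees = [0, 1, 2, 3, 4, 5, 6, 7]
bw_sturges = sturges_bandwidth(degrees, n_genes=8)
bw_silverman = silverman_bandwidth(degrees, n_genes=8)
# Empty graph: every node has degree 0.
bw_empty = silverman_bandwidth([0, 0, 0, 0, 0], n_genes=5)
```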
Permutation test settings
To compute a p-value for the differential network analysis, CoGA performs
a permutation-based test, which generates $N$ random permutations of
the sample labels.
The minimum possible p-value is $1/(N + 1)$.
Therefore, the choice of $N$ depends on the required significance level
of the test. You can set the $N$ parameter in the
"Enter the number of label permutations" option.
To perform the same label permutations for all gene sets, you can set a seed
to generate the random permutations in the "Enter a seed to generate random
permutations" option.
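The logic of a label-permutation test can be sketched in Python. The absolute difference between group means stands in for the network statistic, and the p-value is computed as (number of permuted statistics at least as large as the observed one + 1) / (N + 1), which is consistent with the minimum of 1/(N + 1) stated above. The function names and data are hypothetical, and the sketch is not CoGA's implementation.

```python
import random


def mean_difference(values, labels):
    """Toy statistic: absolute difference between the two group means."""
    groups = sorted(set(labels))
    first = [v for v, lab in zip(values, labels) if lab == groups[0]]
    second = [v for v, lab in zip(values, labels) if lab == groups[1]]
    return abs(sum(first) / len(first) - sum(second) / len(second))


def permutation_pvalue(values, labels, statistic, n_permutations, seed=None):
    """Permutation p-value obtained by shuffling the sample labels."""
    rng = random.Random(seed)  # a fixed seed reproduces the same permutations
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical measurements for two phenotypes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["A", "A", "A", "B", "B", "B"]
p_first = permutation_pvalue(values, labels, mean_difference, 999, seed=42)
p_second = permutation_pvalue(values, labels, mean_difference, 999, seed=42)
```

With N = 999 the smallest attainable p-value is 1/1000, so a significance level of 0.001 or lower requires more permutations. Reusing the same seed yields the same sequence of label permutations and therefore the same p-value.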
Running the analysis
After loading the dataset and setting the execution parameters, click on the "Start
analysis" button. A progress bar will be shown in the top right corner of
the page.
The results and other execution messages are shown in the
"Analysis results" section.