We consider a network in which time-series are represented as vertices (nodes) connected by edges (links) that are estimated through statistical indices. To construct the edges, which represent the interactions among the time-series, four different statistical methods were applied: cross correlations, partial cross correlations, Granger causality indices and partial Granger causality indices. Statistical significance tests and false discovery rate (FDR) corrections were then applied to the outputs of the four techniques to trim the networks. The four methods differ in the following ways: cross correlation is a simple measure of the similarity of two series (say, X and Y) as a function of the lag of one relative to the other. Partial cross correlation corrects for the possible delayed effects of an additional variable (say, Z) on the correlation between X and Y. Granger causality provides a much more stringent criterion for causation (or information flow) than simply observing high correlation with some lag-lead relationship. Finally, partial Granger causality addresses the problem of eliminating the confounding influence of exogenous inputs and latent variables. In summary, the correlation measures applied here do not consider information from previous time steps (i.e., they are non-lagged techniques), in contrast to Granger causality, which does. Starting with the simplest and least conservative method, the different approaches are applied to examine how network configurations are affected and which best represents the final ecological relations. It is currently understood, for instance, that many ecological systems exhibit feedback, and cross correlation, as a symmetric measure, is therefore expected to be unsuitable for identifying nontrivial lag-lead relationships.
Graph theoretic representation of cause-effect ecological network
To continue with the structure of graphs based on multivariate time series analysis, it is convenient to introduce some graph-theoretic matrix notation that mathematically formalizes the networks proposed here. By definition, a graph G consists of a set of vertices (nodes), \(V(G) = \{v_1, v_2, \ldots, v_n\}\), and a set of edges (links), \(E(G) = \{e_1, e_2, \ldots, e_n\}\), each edge joining a pair of vertices, so that G = (V, E). Thus, the graph-network is an ordered pair [V(G), E(G)] [33, 34]. A graph is directed if the edge set is composed of ordered vertex (node) pairs and is undirected if the edge pairs are unordered. Any graph can be represented by its adjacency matrix, whose elements indicate whether pairs of vertices are adjacent in the graph.
In the next sections, I develop networks based on four different techniques (i.e., similarity measures) applied to compute the edges \(e_{ij}\) between any two nodes \(v_i\) and \(v_j\). Let E be an n × n matrix (i and j are indices running from 1 to n, and \(e_{ij}\) is a single entry of E), called the similarity-values matrix, whose elements are the similarity measure values (correlations, partial correlations, Granger causality, partial Granger causality). Moreover, let E′, the p-value matrix, be the matrix whose elements are the probability values of the significance tests of the corresponding similarity measure, obtained using either parametric or non-parametric comparisons. Low probability values are considered to carry more influence and constitute the weighted versions of the networks, which take into account the varying contribution of each causally significant interaction.
Lastly, an adjacency matrix, E″, is considered for each similarity measure, whose elements \(e_{ij}\) are the outcomes of the false discovery rate procedure:
$$e_{ij} = \left\{ {\begin{array}{*{20}c} 1 &\quad {if \,v_{i} v_{j} \in E} \\ 0 & \quad{ otherwise} \\ \end{array} } \right.$$
(1)
E″ is a binary adjacency matrix used to record the most probable, non-trivial edges joining pairs of vertices. In particular, when the element \(e_{ij}\) of the matrix is one, there is an edge from vertex i to vertex j, and when it is zero, there is no edge.
Each matrix (similarity-values, p-values and binary-adjacency) contains information about the connectivity structure of the graph and is further used to extract the characteristics of the investigated ecological networks. Four types of networks were constructed according to the four different methods (cross correlations, partial correlations, Granger causality and conditional Granger causality) used to calculate the elements of the matrix. The number of nodes equals the number of time-series variables (including both biotic and abiotic variables). The X, Y and Z represent populations of A. orana, A. lineatella and G. molesta, respectively, while Temp. and RH represent the two abiotic variables: mean temperature and relative humidity.
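To make the relation between the three matrices concrete, the following Python sketch thresholds a hypothetical p-value matrix E′ into the binary adjacency matrix E″. The node labels and p-values are invented for illustration only and are not the study's data:

```python
import numpy as np

# Hypothetical 5-node example: three moth populations and two abiotic
# variables, as in the text.  E_prime is an assumed p-value matrix
# produced by one of the four similarity measures.
labels = ["A.orana", "A.lineatella", "G.molesta", "Temp", "RH"]
E_prime = np.array([
    [1.00, 0.01, 0.20, 0.03, 0.40],
    [0.01, 1.00, 0.04, 0.60, 0.02],
    [0.20, 0.04, 1.00, 0.70, 0.80],
    [0.03, 0.60, 0.70, 1.00, 0.05],
    [0.40, 0.02, 0.80, 0.05, 1.00],
])

alpha = 0.05
# Binary adjacency matrix E'': an edge is recorded where the test is
# significant at the chosen threshold.
E_adj = (E_prime < alpha).astype(int)
np.fill_diagonal(E_adj, 0)   # no self-loops
print(E_adj)
```

For a symmetric measure (correlation) the resulting matrix is symmetric; for Granger causality the entry (i, j) can differ from (j, i), giving a directed graph.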
Standard measures of correlation for undirected graphs
Cross correlations
I treat each ecological time series, observed through successive seasons, as a continuous univariate process: the weekly counts form a vector that represents the population realization available through observation. The Pearson correlation coefficient, r, is used as a standard similarity measure [35]:
$$cor\left( {X_{i} ;X_{j} } \right) = r_{{X_{i,} X_{j} }} = \frac{{s_{{X_{i,} X_{j} }} }}{{s_{{X_{i} }} s_{{X_{j} }} }}$$
(2)
where \(s_{X_i X_j}\) stands for the sample covariance of the variables \(X_i\) and \(X_j\), and \(s_{X_i}\), \(s_{X_j}\) are the standard deviations of samples \(X_i\) and \(X_j\), respectively. Substituting the sample estimates of the covariance and variance gives the Pearson linear correlation coefficient, with the following estimate:
$$e_{ij(cor)} = r_{{X_{i} X_{j} }} = \frac{{\mathop \sum \nolimits_{t = 1}^{n} ({\rm X}_{i,t} - \bar{\rm X}_{i} )({\rm X}_{j,t} - \bar{\rm X}_{j} )}}{{(n - 1)\,s_{{X_{i} }} s_{{X_{j} }} }}$$
(3)
Here, \(\bar{\rm X}_{i}\) and \(\bar{\rm X}_{j}\) are the sample means for the first and second variables, respectively,\(s_{{X_{i} }}\) and \(s_{{X_{j} }}\) are the standard deviations for each variable, and n is the series length (here, all successive years in which populations are active are considered) starting from t = 1.
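The estimate of Eq. (3) can be verified in a few lines of numpy; the weekly counts below are invented solely for illustration:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation per Eqs. (2)-(3): sample covariance divided
    by the product of the sample standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
    return cov / (x.std(ddof=1) * y.std(ddof=1))

# Toy weekly counts for two populations (illustrative numbers only).
xi = [12, 18, 25, 31, 28, 20, 15]
xj = [10, 15, 22, 30, 27, 18, 12]
r = pearson_r(xi, xj)
print(round(r, 4))
```

The result agrees with `np.corrcoef(xi, xj)[0, 1]`, confirming that Eq. (3) is just the sample version of Eq. (2).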
Partial correlations
The partial correlation network is defined as the collection of links between those nodes whose partial correlation (as defined below) is not zero. The linkage of these elements may be described in terms of an adjacency matrix that defines an undirected network [31, 32]. To assess whether non-zero correlations are direct or indirect, partial (conditional) measures are very useful because they quantify the linear correlation between two variables after removing the effect of other variables, thereby exposing spurious correlations and revealing hidden ones. Cross correlations are well suited to detecting dependence between pairs of variables (e.g., population \(X_i\) on population \(X_j\), vice versa, or both). However, because abiotic variables are also included (e.g., \(W_1\): temperature and/or \(W_2\): relative humidity), it is quite possible that both \(X_i\) and \(X_j\) depend on another variable \(W_K\), or even on m other variables (nodes): \(W_K = \{W_{k_1}, \ldots, W_{k_m}\}\), where \(K = \{k_1, \ldots, k_m\}\). Such cases are treated as trivial correlations that most likely do not justify a link (i, j). To retain ties only among the ecological variables with direct dependence, the following partial correlation measure was introduced:
$$e_{ij(paco)} = \left. {\rho_{ij} } \right|W_{k} = \frac{{\left. {\sigma_{ij} } \right|W_{k} }}{{\sqrt {\left. {\sigma_{ii} } \right|W_{k} \left. {\sigma_{jj} } \right|W_{k} } }}$$
(4)
where \(\sigma_{{ij\left| {W_{k} } \right.}}\), \(\sigma_{{ii\left| {W_{k} } \right.}}\) and \(\sigma_{{jj\left| {W_{k} } \right.}}\) are components of the partial covariance matrix. The estimate of \(\rho_{ij}|W_k\) is the sample partial correlation \(r_{ij}|W_k\), computationally derived as follows. First, the residuals \(e_i\) and \(e_j\) of the multiple linear regressions of \(X_i\) on \(W_K\) and of \(X_j\) on \(W_K\), respectively, are computed. Next, the correlation coefficient of \(e_i\) and \(e_j\), \(r_{ij}|W_k = r_{e_i,e_j}\), is computed. Thus, if \(X_i\) (i.e., the population of species i) and \(X_j\) (i.e., the population of species j) are independent conditionally on \(W_k\) (i.e., the weather variable), then \(\rho_{ij}|W_k\) should ideally be zero.
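The residual-based computation described above can be sketched as follows. The simulated series are assumptions for illustration: two "populations" driven entirely by a common temperature variable, so their marginal correlation is strong but their partial correlation given temperature is near zero:

```python
import numpy as np

def partial_corr(xi, xj, w):
    """Sample partial correlation of xi and xj given w (Eq. 4):
    the correlation of the residuals after regressing each series on w."""
    xi, xj, w = (np.asarray(a, float) for a in (xi, xj, w))
    W = np.column_stack([np.ones_like(w), w])   # intercept + conditioning variable
    def residuals(y):
        beta, *_ = np.linalg.lstsq(W, y, rcond=None)
        return y - W @ beta
    ei, ej = residuals(xi), residuals(xj)
    return np.corrcoef(ei, ej)[0, 1]

rng = np.random.default_rng(0)
temp = rng.normal(20, 5, 200)               # common driver (e.g., temperature)
xi = 2.0 * temp + rng.normal(0, 1, 200)     # both series track temperature
xj = -1.5 * temp + rng.normal(0, 1, 200)
print(round(np.corrcoef(xi, xj)[0, 1], 3))  # strong marginal correlation
print(round(partial_corr(xi, xj, temp), 3)) # near zero once temp is removed
```

The marginal correlation here is spurious in exactly the sense of the text: it vanishes once the weather variable is conditioned out.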
Dynamic measures of correlation for directed graphs
In addition to the known correlation and partial correlation measures [32], the Granger causality approach proposed by Granger [23, 24] was also applied. One important asset, compared to non-lagged cross correlations, is that it provides a direction of interaction (i.e., directed cause-effect associations).
Granger causality rules
The Granger causality measure is based on the general concept of Norbert Wiener [36] that a causal influence should manifest itself in improved predictability of the driven process when the driving process is taken into account. A measurable reduction in the unexplained variance of the response process (say, population Y) upon inclusion of the causal (driving) process (say, population X) in a linear autoregressive model marks the existence of a causal influence from X to Y in the time domain n = 1, 2, … [37]. This method requires the estimation of multivariate vector autoregressions (MVAR) as follows:
$$Y_{t} = \mathop \sum \limits_{n = 1}^{p} a_{n} Y_{t - n} + \varepsilon_{r,t}$$
(5)
$$Y_{t} = \mathop \sum \limits_{n = 1}^{p} a_{n} Y_{t - n} + \mathop \sum \limits_{n = 1}^{p} \beta_{n} {\rm X}_{t - n} + \varepsilon_{u,t}$$
(6)
where \(\varepsilon_{r,t}\) and \(\varepsilon_{u,t}\) are the contemporaneously uncorrelated residuals (disturbances) and p is the maximum number of lagged observations included in the autoregressive model. In addition, \(a_n\) and \(\beta_n\) are the model coefficients (i.e., the contributions of each lagged variable to the predicted values of X and Y). The GCI is the pairwise linear Granger causality in the time domain and is defined as follows [18]:
$$e_{ij(GC)} = GCI_{X \to Y} = \ln \frac{{\sigma^{2} (\varepsilon_{r,t} )}}{{\sigma^{2} (\varepsilon_{u,t} )}}$$
(7)
where \(\sigma^{2} (\varepsilon_{r,t} )\) is the unexplained variance (prediction error variance) of Y in its own autoregressive model (Eq. 5), whereas \(\sigma^{2} (\varepsilon_{u,t} )\) is its unexplained variance when a joint model for both Y and X is constructed (Eq. 6).
To provide useful heuristics for understanding the empirical ecological time-series, it is necessary to go beyond simple two-variable vector autoregressive models and to study more variables that capture aspects of a more complex system. Thus, if \(X_K = [X_1, X_2, \ldots, X_k]^T\) denotes the realizations of k ecological variables and T denotes matrix transposition, then the technique can be extended using the following multivariate vector linear autoregressive (MVAR) process:
$$X_{t} = \mathop \sum \limits_{n = 1}^{p} A_{n} X_{t - n} + E_{t}$$
(8)
Here \(E_t\) is a multivariate zero-mean uncorrelated white noise process, the \(A_n\) are k × k matrices of model coefficients and p is the model order, chosen in this case using the Akaike information criterion (AIC) for MVAR processes. Significant interactions are based on the standard Granger causality index (GCI). It is expected that GCI: X → Y > 0 when X influences Y, and that GCI: X → Y = 0 when it does not. In practice, GCI: X → Y is compared to a threshold value, which was identified using parametric and non-parametric methods (i.e., using surrogate data).
Conditional Granger causality rules
One disadvantage of the GCI is that the indirect, partial effects of the other variables are not accounted for (e.g., examining whether \(X_1\) Granger causes \(X_2\) while excluding the activity of all other variables \(X_3, \ldots, X_k\)). The conditional Granger causality index (CGCI) is the corresponding multivariate extension and is extremely helpful because repeated pairwise analyses among multiple variables can give misleading results when differentiating between direct and mediated causal influences [38].
As noted previously, the method requires the estimation of multivariate vector autoregressions (MVAR). If we consider X and Y as the driving and response systems, respectively, and conditioning on system Z, we have the following:
$$Y_{t} = \mathop \sum \limits_{n = 1}^{p} a_{n} Y_{t - n} + \mathop \sum \limits_{n = 1}^{p} c_{n} {\rm Z}_{t - n} + \varepsilon_{r,t}$$
(9)
$$Y_{t} = \mathop \sum \limits_{n = 1}^{p} a_{n} Y_{t - n} + \mathop \sum \limits_{n = 1}^{p} \beta_{n} {\rm X}_{t - n} + \mathop \sum \limits_{n = 1}^{p} c_{n} {\rm Z}_{t - n} + \varepsilon_{u,t}$$
(10)
The CGCI, which quantifies the Granger causality between the two variables X and Y while conditioning on the third variable Z, is then obtained by comparing the restricted and unrestricted models:
$$e_{ij(CGCI)} = CGCI_{X \to Y\left| Z \right.} = \ln \frac{{\sigma^{2} (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\varepsilon }_{r,t} )}}{{\sigma^{2} (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\varepsilon }_{u,t} )}}$$
(11)
where \(\sigma^{2} (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\varepsilon }_{r,t} )\) and \(\sigma^{2} (\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\varepsilon }_{u,t} )\) are the variances of the error estimators of the above restricted and unrestricted vector autoregressive models, respectively [39]. The CGCI was also examined because it is highly useful in bringing out the causal interactions among sets of nodes by eliminating common input artifacts [26].
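The value of conditioning can be seen in a small simulated chain, assumed here for illustration: X drives Z, Z drives Y, and there is no direct X → Y link. The pairwise GCI would report X → Y, but the CGCI conditioned on Z correctly suppresses the mediated link:

```python
import numpy as np

def lags(s, p):
    """Matrix of the p lagged copies of series s."""
    n = len(s)
    return np.column_stack([s[p - k:n - k] for k in range(1, p + 1)])

def resid_var(target, designs):
    """Residual variance of an OLS fit of target on stacked lag blocks."""
    D = np.column_stack(designs)
    beta, *_ = np.linalg.lstsq(D, target, rcond=None)
    return np.var(target - D @ beta)

def cgci(x, y, z, p=2):
    """Conditional Granger causality index X -> Y | Z (Eq. 11)."""
    Yt = y[p:]
    restricted = resid_var(Yt, [lags(y, p), lags(z, p)])
    unrestricted = resid_var(Yt, [lags(y, p), lags(x, p), lags(z, p)])
    return np.log(restricted / unrestricted)

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
z = np.zeros(n); y = np.zeros(n)
for t in range(1, n):                 # causal chain X -> Z -> Y, no direct X -> Y
    z[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()
    y[t] = 0.8 * z[t - 1] + 0.1 * rng.normal()
print(round(cgci(x, y, z), 3))        # near zero: X -> Y is fully mediated by Z
print(round(cgci(z, y, x), 3))        # clearly positive: Z -> Y is direct
```

This is exactly the "direct versus mediated" distinction of [38]: the conditioned index keeps the direct Z → Y edge and drops the spurious X → Y one.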
To remove noise from the data and to produce robust estimates of the temporal dependence between successive dynamics, the data were prewhitened prior to the causality analysis so as to remove as much of the temporal autocorrelation as possible [40]. This must be done if, as is usually the case, an input series is autocorrelated, because otherwise the direct cross-correlation function between the input and response series can give a misleading indication of their relation in an autoregressive integrated moving average (ARIMA) modeling process. This task was performed using the standard MATLAB prewhitening filter.
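The text uses MATLAB's prewhitening filter; the numpy sketch below is an assumed minimal analogue of the same idea: fit an AR(p) model to the input series and apply that filter to both series before cross-correlating:

```python
import numpy as np

def prewhiten(x, y, p=1):
    """Minimal prewhitening sketch (assumed analogue of the MATLAB
    filter used in the text): fit an AR(p) model to the input x, then
    apply the same filter to both x and y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    X = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, x[p:], rcond=None)   # AR coefficients of x
    def filt(s):
        S = np.column_stack([s[p - k:n - k] for k in range(1, p + 1)])
        return s[p:] - S @ phi
    return filt(x), filt(y)

rng = np.random.default_rng(3)
e = rng.normal(size=400)
x = np.zeros(400)
for t in range(1, 400):
    x[t] = 0.9 * x[t - 1] + e[t]      # strongly autocorrelated input
xw, _ = prewhiten(x, x.copy())
# lag-1 autocorrelation before vs after filtering
print(round(np.corrcoef(x[:-1], x[1:])[0, 1], 2))
print(round(np.corrcoef(xw[:-1], xw[1:])[0, 1], 2))
```

After filtering, the input is close to white noise, so any remaining cross-correlation with the response series reflects genuine lagged dependence rather than shared autocorrelation.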
Significance of similarity measures and network trimming
The correlation coefficients between the variables form a correlation matrix that represents a weighted network. However, to detect only significant interactions and to trim the networks, it is important to apply a statistical hypothesis test according to a predefined probability that serves as the threshold level. This can be accomplished either by parametric or non-parametric methods, as indicated below.
Parametric test of correlation
To reduce the possibility that the observed correlations occurred by chance, a significance test should be performed. Here the null hypothesis \(H_0\) of no correlation between any two ecological time series was examined against the alternative hypothesis of the existence of correlation (two-tailed test). Thus, if \(\rho_{ij}\) is the true Pearson correlation coefficient, the hypothesis test for significance is \(H_0: \rho_{ij} = 0\) versus the alternative \(H_1: \rho_{ij} \ne 0\), with the estimate \(r_{ij} = s_{ij}/(s_i s_j)\) for \(\rho_{ij}\). If we assume that each paired data set is normally distributed and stationary, then the paired variables \(X_i\) and \(X_j\) satisfy
$$(X_{i} ,X_{j} ) \cong N([\mu_{i} ,\mu_{j} ],[\sigma_{i}^{2} ,\sigma_{j}^{2} ],\rho_{ij} )$$
(12)
where μ and \(\sigma^2\) are the standard notations for the mean and variance, respectively. The t statistic for a sample of size n is
$$t = \frac{{r_{ij} \sqrt {n - 2} }}{{\sqrt {1 - r_{ij}^{2} } }},$$
(13)
where \(r_{ij}\) is the correlation coefficient, and \(H_0\) was rejected if \(|t| > t_{n-2;\,\alpha/2}\) at the α = 0.05 significance level.
For each pairwise test, the p value represents the probability of error involved in accepting the observed correlation of the ecological time series as valid. If the p value of the test was smaller than the threshold level α = 0.05, then the correlation was considered significantly different from zero and a connection was traced in order to construct the ecological time-series network.
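The test of Eq. (13) is a one-liner; the sample correlation and size below are assumed numbers for illustration, and the critical value is the tabulated two-tailed \(t_{n-2;\,\alpha/2}\):

```python
import numpy as np

def corr_t_stat(r, n):
    """t statistic for H0: rho = 0 (Eq. 13)."""
    return r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

n = 30
t = corr_t_stat(0.45, n)          # hypothetical sample correlation r = 0.45
t_crit = 2.048                    # tabulated t_{28; 0.025}, two-tailed, alpha = 0.05
print(round(t, 3), abs(t) > t_crit)
```

Here an observed r = 0.45 with 30 points is significant, whereas a weaker correlation (say r = 0.1) with the same sample size would not be.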
Non-parametric testing of correlation
Although the ecological time series in this work are treated as stationary, bootstrapped randomizations with 100 surrogates were carried out, and non-parametric comparisons were also performed for confirmatory reasons. This method is particularly appropriate in cases of small sample size and a lack of information concerning deviations from normality and stationarity. Here, the null distribution of \(\rho_{ij}\) is derived from resampled pairs consistent with \(H_0: \rho_{ij} = 0\). Considering the original pair of ecological time-series \((x_i, x_j)\), B randomized pairs \((x_i^{*b}, x_j^{*b})\), b = 1, …, B, were generated. Although this random permutation destroys the time order, it preserves the distribution of the original time-series. In the next step, the correlation \(r_{ij}^{*b}\) of each pair \((x_i^{*b}, x_j^{*b})\) was computed, and the ensemble \(\left\{ {r_{ij}^{*b} } \right\}_{b = 1}^{B}\) forms the empirical null distribution of \(r_{ij}\). \(H_0\) was rejected if the sample \(r_{ij}\) fell outside the bulk of the distribution of \(\left\{ {r_{ij}^{*b} } \right\}_{b = 1}^{B}\). For the partial correlations, the null hypothesis was \(H_0: \rho_{ij}|K = 0\) and the alternative \(H_1: \rho_{ij}|K \ne 0\). The analysis was carried out in MATLAB; in all cases, randomizations with 100 surrogates were performed.
Significance test for GCI and CGCI
If the variable X does not Granger cause Y, then the contribution of the X-lags in the unrestricted model (Eqs. 6 and 10) should be insignificant, and the corresponding model parameters should therefore be insignificant. The null hypothesis is \(H_0: \beta_i = 0\) for all i = 1, …, p, and the alternative is \(H_1: \beta_i \ne 0\) for at least one i = 1, …, p. A rejection of the null hypothesis implies the presence of Granger causality, which can be evaluated using the (Snedecor-Fisher) F-test [41]:
$$F = \frac{{(SSE^{r} - SSE^{u} )/p}}{{SSE^{u} /ndf}}.$$
(14)
Here, SSE is the sum of squared errors, ndf = (n−p)−2p is the number of degrees of freedom, n−p is the number of equations, and 2p is the number of coefficients in the unrestricted model (Eqs. 6 and 10).
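A sketch of the F statistic of Eq. (14), using OLS fits of the restricted and unrestricted models; the simulated causal pair is an assumption for illustration:

```python
import numpy as np

def granger_f(x, y, p=2):
    """F statistic of Eq. (14) for H0: all beta_i = 0 (no Granger causality)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    Yt = y[p:]
    Ly = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    Lx = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    def sse(D):
        beta, *_ = np.linalg.lstsq(D, Yt, rcond=None)
        return np.sum((Yt - D @ beta) ** 2)
    sse_r = sse(Ly)                             # restricted model
    sse_u = sse(np.column_stack([Ly, Lx]))      # unrestricted model
    ndf = (n - p) - 2 * p                       # equations minus coefficients
    return ((sse_r - sse_u) / p) / (sse_u / ndf)

rng = np.random.default_rng(5)
x = rng.normal(size=300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.4 * y[t - 1] + 0.7 * x[t - 1] + 0.2 * rng.normal()
print(round(granger_f(x, y), 1))   # far above the F critical value
print(round(granger_f(y, x), 1))   # small when causality is absent
```

The statistic is compared with the F(p, ndf) critical value at the chosen α; in the causal direction it is typically orders of magnitude larger than in the non-causal one.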
False discovery rates and true correlations
The false discovery rate (FDR) multiple-testing procedure was applied to correct for falsely significant correlations in the multiple comparisons, which can arise from the incorrect rejection of true null hypotheses (false positives). The FDR was applied to conceptualize the rate of type I errors in null hypothesis testing when conducting the multiple comparisons of the series and to further trim the networks. The FDR was proposed by Benjamini and Hochberg [42] to address the problems of performing multiple simultaneous hypothesis tests. It is a powerful concept by which one can retain the statistical power that would be lost in simultaneous comparisons made with Bonferroni-type procedures.
In particular, as the number of hypotheses increases, so does the probability of wrongly rejecting a null hypothesis by random chance [42]. Therefore, to correct for multiple testing, as the same test is applied for every i and j in {1, …, n}, one can use the false discovery rate (FDR) correction [43]. According to the procedure, the p-values of the m = n(n−1)/2 tests are first placed in ascending order, p(1) ≤ p(2) ≤ … ≤ p(m). Next, the null hypothesis of zero correlation is rejected, at significance level α, for all variable pairs whose p-value is less than or equal to p(k), where p(k) is the largest p-value for which p(k) ≤ k·α/m holds [44].
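The step-up rule just described can be implemented in a few lines; the p-values below are invented for illustration:

```python
import numpy as np

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: reject every hypothesis whose
    p-value is <= p(k), where p(k) is the largest ordered p-value
    satisfying p(k) <= k * alpha / m."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m    # k * alpha / m, k = 1..m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # index of p(k)
        reject[order[:k + 1]] = True            # reject everything up to p(k)
    return reject

# Hypothetical p-values from m = 10 pairwise tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(fdr_bh(pvals, alpha=0.05))
```

Note that an uncorrected α = 0.05 threshold would accept five of these tests, whereas the FDR correction keeps only the two strongest, which is precisely the network-trimming effect described above.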