- Methodology article
- Open access

# Measuring specialization in species interaction networks

*BMC Ecology*
**volume 6**, Article number: 9 (2006)

## Abstract

### Background

Network analyses of plant-animal interactions hold valuable biological information. They are often used to quantify the degree of specialization between partners, but usually based on qualitative indices such as 'connectance' or number of links. These measures ignore interaction frequencies or sampling intensity, and strongly depend on network size.

### Results

Here we introduce two quantitative indices using interaction frequencies to describe the degree of specialization, based on information theory. The first measure (*d*') describes the degree of interaction specialization at the species level, while the second measure (*H*_{2}') characterizes the degree of specialization or partitioning among two parties in the entire network. Both indices are mathematically related and derived from Shannon entropy. The species-level index *d*' can be used to analyze variation within networks, while *H*_{2}' as a network-level index is useful for comparisons across different interaction webs. Analyses of two published pollinator networks identified differences and features that have not been detected with previous approaches. For instance, plants and pollinators within a network differed in their average degree of specialization (weighted mean *d*'), and the correlation between specialization of pollinators and their relative abundance also differed between the webs. Rarefied sampling effort in both networks and null model simulations suggest that *H*_{2}' is not affected by network size or sampling intensity.

### Conclusion

Quantitative analyses reflect properties of interaction networks more appropriately than previous qualitative attempts, and are robust against variation in sampling intensity, network size and symmetry. These measures will improve our understanding of patterns of specialization within and across networks from a broad spectrum of biological interactions.

## Background

The degree of specialization of plants or animals has been studied and debated extensively, and a continuum from complete specialization to full generalization can be found in various systems [1–6]. In general, two levels of specialization measures may be distinguished: first, the characterization of focal species and, second, the degree of specialization of an entire interaction network, representing an assemblage of species and their interaction partners (e.g. food webs, mutualistic networks, predator-prey relationships). When interactions are considered as ecological niche, the first level describes the niche breadth of a species and the second level the degree of niche partitioning across species. While the species level is more straightforward in its biological interpretation, analyses at the network level can be useful for comparisons across different types of networks. Such analyses have been performed to compare plant-pollinator webs versus plant-seed disperser webs [4, 5], different plant-pollinator networks along geographic gradients [1, 7, 8], or food webs of variable size [9, 10]. Entire network analyses are also used to study patterns on a community level such as coevolutionary adaptations [3], ecosystem stability or resilience [11–14].

### Quantifying specialization at the species level

Specialization or generalization of interactions is most commonly characterized as the number of partners (or 'links'), e.g. the number of pollinator species visiting a flowering plant species or the number of food plant families a herbivore feeds upon. In this qualitative approach, interactions between a consumer and a resource species are only scored in a binary way as 'present' or 'absent', ignoring any distinction between strong interactions and weak or occasional ones. For example, binary representations of interactions do not distinguish a scenario where 99% of the individuals of a herbivore species feed on a single plant species only, but occasionally an individual is found on another plant, from a different scenario where a herbivore regularly feeds on both food plants. The problem is analogous to the measurement of biodiversity either as crude species richness or as a more elaborate diversity index including relative abundances [15]. Several approaches have thus been used to directly include variation in interaction frequencies (i.e., their evenness) in characterizing the diversity of partners, e.g. Simpson's diversity index for pollinators [16, 17] or Lloyd's index for host specificity [18]. Alternatively, other studies indirectly controlled for abundance or sampling intensity using rarefaction methods [13, 19]. Correspondingly, Bersier and coworkers [20] have suggested quantifying the diversity of biomass flows in food webs using a Shannon diversity measure. Niche breadth theory provides several additional indices that include some measure of resource frequency or resource use intensity [21], which can be viewed in analogy to 'partner diversity' in the context of association networks. However, Hurlbert [22] emphasized that not only proportional utilization, but also the proportional availability of each niche should be taken into account. A species that uses all niches in the same proportion as their availability in the environment should be considered more opportunistic than a species that uses rare resources disproportionately more. If variation in resource availability is large, diversity-based measures that ignore this availability may be highly misleading [22, 23]. Several niche breadth measures thus combine proportional resource utilization with proportional resource availability [22–24]. These concepts have rarely been applied in the context of species interaction networks, e.g. plant-pollinator webs, where binary data are more common than quantitative webs.

### Quantifying specialization at the community level

The measurement used most commonly to characterize community-wide specialization is the 'connectance' index (*C*) [1, 4, 8–10, 25–27]. *C* is defined as the proportion of the actually observed interactions to all possible interactions. Consider a contingency table showing the association between two parties, with *r* rows (e.g., plant species) and *c* columns (e.g., pollinators). Connectance is defined as *C* = *I*/(*r*·*c*), with *I* being the total number of non-zero elements in the matrix. Therefore, like the number of partners or links (*L*) described above, *C* uses only binary information and ignores interaction strength. *C* is directly related to the mean number of links (\overline{L}) of plant species or pollinator species as *C* = \overline{L}_{plants}/*c* = \overline{L}_{poll}/*r*.
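The definitions above can be illustrated with a minimal Python sketch. The function names and the small example web are hypothetical and only serve to show how *C* and the mean number of links follow from a binary reading of the matrix.

```python
def count_links(matrix):
    """Number of non-zero cells (I) in an interaction matrix."""
    return sum(1 for row in matrix for a in row if a > 0)


def connectance(matrix):
    """C = I / (r * c): realized links divided by possible links."""
    r, c = len(matrix), len(matrix[0])
    return count_links(matrix) / (r * c)


def mean_links(matrix):
    """Mean number of links per row species and per column species."""
    r, c = len(matrix), len(matrix[0])
    links = count_links(matrix)
    return links / r, links / c


# Hypothetical web: 2 plant species (rows) x 3 pollinator species (columns).
web = [
    [9, 1, 0],
    [0, 4, 6],
]

C = connectance(web)                 # 4 links out of 6 possible cells
L_plants, L_poll = mean_links(web)   # 4/2 and 4/3

# The relation C = mean_L_plants / c = mean_L_poll / r holds by construction.
```

Note that the interaction frequencies (9 versus 1, for instance) play no role in any of these values, which is the limitation discussed in the text.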

This measure, \overline{L}, has also been used to compare networks [1, 3, 7, 8, 28]. Recently, it has been suggested to use \overline{L} instead of *C* to characterize networks [29]. However, note that comparisons across networks of different size (number of species) are problematic, since \overline{L}, unlike *C*, is not scaled according to the number of available partners (see also [2, 10]). \overline{L} in a small network may represent a larger proportion of available partners compared to the same value of \overline{L} in a large network.

Analyses based on binary data – both at the species and the community level – have obvious shortcomings, since they are highly dependent on sampling effort, decisions about which species to include, and the size of investigated networks. Several authors thus emphasized the need to move beyond binary representations of interactions to quantitative measures involving some measure of interaction strength [4, 20, 27, 29–32]. A way to at least partly overcome these deficiencies is to cut off all rare species or weak interactions below a frequency threshold [3, 9, 33, 34] or to control for sampling effort in null models [7, 8, 13, 19, 25, 35]. However, for interaction webs where more detailed information is available, simplification to binary data as in *C* or \overline{L} remains unsatisfactory. Conveniently, the observed interaction frequency may represent a meaningful surrogate for interaction strength, at least in pollination and seed-dispersal systems as shown by Vázquez et al. [30] (see also [16]). Incorporating interaction frequency or even a direct measure of interaction strength in a network measure of specialization would thus be an important advance that has frequently been called for.

A severe additional problem of connectance is that its lower and upper constraints are not scale-invariant [25], which limits its use for comparisons across networks. The minimum possible value (*C*_{min}) to maintain at least one link per species declines in a hyperbolic function with the number of interacting species, since *C*_{min} = max(*r*, *c*)/(*r*·*c*), and an upper limit (*C*_{max}) may be constrained by, or a function of, total sampling effort. Across networks, *C* decays strongly with network size, which has been debated in detail in the context of food web analysis [9, 10, 26, 27, 36, 37]. The strong relationship between *C* and network size generates a problem for disentangling any biologically meaningful effect from this mathematically inherent scale dependence. For instance, network comparisons may focus on residual variation in *C* after an average effect of network size has been controlled for [1, 4], or *C* could be rescaled to account for this size effect (see [25, 36]). For natural networks of similar size, the range of actual *C* values is typically very narrow [4], thus other structural forces may be poorly detectable.
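As a small illustration of the lower bound stated above, the following Python sketch evaluates *C*_{min} = max(*r*, *c*)/(*r*·*c*) for a few matrix sizes. The function name and the chosen sizes are hypothetical.

```python
def min_connectance(r, c):
    """Smallest C that still leaves every species with at least one link."""
    return max(r, c) / (r * c)


# Square webs of increasing size: C_min reduces to 1/n, a hyperbolic decline.
sizes = [5, 10, 20, 40]
c_min_square = [min_connectance(n, n) for n in sizes]

# An asymmetric web is bounded by its larger dimension.
c_min_asym = min_connectance(10, 40)   # 40 / 400
```

For square webs the bound falls from 0.2 at five species per party to 0.025 at forty, so part of any decline of *C* with network size is a purely arithmetic consequence of this constraint.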

The objective of this paper is to develop and discuss specialization measures that are based on frequency data and thus account for sampling intensity, and that overcome the problem of scale dependence. We then test these approaches by evaluating the effect of sampling effort and scale dependence on a published natural pollination network, and on randomly generated associations as a null model. We differentiate between species-level measures of specialization, useful to investigate variability among species within a web, and a single network-wide measure that can be used for comparisons across networks.

## Results

### Patterns in two pollinator networks

Two selected plant-pollinator networks (British meadows studied by Memmott [32], Argentinean forests studied by Vázquez and Simberloff [33]) differ markedly in their degree of specialization when quantitative analyses are applied. The qualitative network index, connectance, is similar in both interaction webs (British web: *C* = 0.15, Argentinean web: *C* = 0.13). However, frequencies of pollinator visits are much more evenly distributed in the British community than in the Argentinean example. In the British web, the interaction between a dipteran species and *Leontodon hispidus* was the most frequent one, representing 6% of the total 2183 interactions observed. In the Argentinean network, visits of *Aristotelia chilensis* by a colletid bee species represented 20% of the 5285 interactions alone. Interactions between the top five plant and top five pollinator species made up 44% of the interactions in the British web, but 74% in the Argentinean web. This difference in the heterogeneity of interaction frequencies is not evident in measures based on binary information such as number of links (*L*) or connectance (*C*). In contrast, the degree of specialization shown by the frequency-based index *H*_{2}' (standardized two-dimensional Shannon entropy, see *Methods: Network-level index*) is much lower in the British community (*H*_{2}' = 0.24) compared to the Argentinean community (*H*_{2}' = 0.63).

The variation of species-level specialization measures (standardized Kullback-Leibler distance, *d*') holds valuable information for the structural properties of a network (see *Methods: Species-level index*). The British pollination web is dominated by highly generalized pollinators (low *d*', both in terms of individuals as well as species), while putative specialists are represented by very few individuals and species (Fig. 1A). In contrast, most pollinators in the Argentinean web are moderately generalized to specialized, with the second highest level of specialization found in the most common species (Fig. 1B). Consequently, the weighted mean degree of specialization is much lower in the former web (<*d*'_{poll}> = 0.16) than in the latter (<*d*'_{poll}> = 0.54). The relationship between specialization of species *i* (*d*'_{i}) and its interaction frequency (*A*_{i}) across the pollinator species differs between the two webs. In the British web, *d*'_{i} and *A*_{i} were not correlated significantly (Spearman's *r*_{s} = -0.08, *p* = 0.46), while a highly positive correlation was found in the Argentinean web (*r*_{s} = 0.65, *p* < 0.0001). Note that designation of any specialization index to a species *i* that is only represented by a single individual may be critical. However, significances in the above correlations remain unaffected when pollinators with one single interaction are excluded. From the plants' point of view, the species in Memmott's web are also more generalized in terms of their pollinator spectrum (Fig. 1C) than the plants studied by Vázquez and Simberloff (Fig. 1D). The respective weighted means are <*d*'_{plants}> = 0.27 and <*d*'_{plants}> = 0.53. No significant correlation was found between the plants' frequency and specialization in either web (both *p* ≥ 0.16). Interestingly, plants were on average more specialized than pollinators in the British web (<*d*'_{plants}> > <*d*'_{poll}>), but not in the Argentinean web. This distinction is not found when only the weighted mean number of links (*L*) is examined, since <*L*_{plants}> is much greater than <*L*_{poll}> in both networks. The difference in <*L*> may be driven by the highly asymmetrical matrix architecture in both webs, where the number of pollinator species greatly exceeds the number of plant species. The unweighted mean \overline{L} is even directly linked to the matrix architecture (i.e., number of rows and columns, *r* and *c*) by a constant (connectance *C*), since \overline{L}_{r} = *c*·*C* and \overline{L}_{c} = *r*·*C*. In contrast, the matrix asymmetry does not affect *d*' (see also below, *Null model patterns*).

### Simulation of sampling effort

In order to test whether specialization estimates are dependent on sampling and scale effects, we simulated a decreased sampling intensity in both networks using rarefaction (see *Methods: Simulation of sampling effort and matrix architecture*). In both networks, *H*_{2}' is robust and already very well estimated by a small fraction of the interactions sampled (Fig. 2). The coefficient of variation of *H*_{2}' remains below 5% from about half of the total number of visits onwards in the British web and even at one-tenth of the total sampling effort of the Argentinean web. The estimation of connectance (*C*) is also relatively stable at least in the Argentinean web, although it shows a positive trend across sampling effort in the British web (Fig. 2). These findings suggest that network-wide measures of specialization, particularly *H*_{2}', do not necessarily require a very large or even complete association matrix, but can also be very well estimated from a smaller representative subset as long as there is no systematic sampling bias.
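A reduced sampling effort of this kind can be sketched in Python as follows. The sketch assumes that rarefaction means drawing a random subset of the individual interactions without replacement; the function name, the example web and the seed are hypothetical, and the authors' own procedure may differ in detail.

```python
import random


def rarefy(matrix, n, rng):
    """Draw n interactions without replacement and re-tabulate them."""
    # One (row, column) pair per individual observed interaction.
    events = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, a in enumerate(row)
        for _ in range(a)
    ]
    sub = [[0] * len(matrix[0]) for _ in matrix]
    for i, j in rng.sample(events, n):
        sub[i][j] += 1
    return sub


web = [
    [9, 1, 0],
    [0, 4, 6],
]
rng = random.Random(1)        # fixed seed so the sketch is reproducible
half = rarefy(web, 10, rng)   # keep 10 of the 20 interactions
```

Repeating the draw many times for each sample size and recomputing an index on every reduced matrix gives the mean and variation of that index across sampling effort. Rare species may drop out of small subsamples entirely, so empty rows or columns need to be handled before an index is computed.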

### Null model patterns

The degree of specialization can be further characterized by comparison with a null model. The null model used here is that each species has a fixed total number of interactions (given by the observed association matrix), but interactions are assigned randomly. In the above pollinator networks, random associations yield a specialization index *H*_{2}' that remains close to zero for almost the entire range of sampling intensity, while connectance (*C*) shows a positive trend over the total number of interactions (*m*) (Fig. 2). Therefore, *H*_{2}' derived from real networks may typically be clearly distinguished from this null model, while the comparison of *C* is complicated by scale dependence and the relatively large values yielded by the null model.
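One simple way to generate such random associations with fixed totals is to shuffle partner identities among the individual interactions. The Python sketch below illustrates this idea; it is a permutation scheme chosen for brevity and is not necessarily the algorithm implemented in the authors' program. All names and the example web are hypothetical.

```python
import random


def randomize_fixed_totals(matrix, rng):
    """Reassign partners at random while keeping all row and column totals."""
    rows, cols = [], []
    for i, row in enumerate(matrix):
        for j, a in enumerate(row):
            rows.extend([i] * a)   # row species of each single interaction
            cols.extend([j] * a)   # column species of each single interaction
    rng.shuffle(cols)              # break the observed pairing
    out = [[0] * len(matrix[0]) for _ in matrix]
    for i, j in zip(rows, cols):
        out[i][j] += 1
    return out


web = [
    [9, 1, 0],
    [0, 4, 6],
]
random_web = randomize_fixed_totals(web, random.Random(1))
```

Because only the pairing is shuffled, each species keeps its observed total number of interactions, while the cell values vary from one randomization to the next.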

Simulations of artificially generated random associations (see *Methods: Simulation of sampling effort and matrix architecture*) confirm that the network-level specialization index *H*_{2}' is largely unaffected by network size (Fig. 3A), network architecture (Fig. 3B) or total number of interactions (*m*) for a fixed matrix size (Fig. 3C). For random associations as shown here, *H*_{2}' is usually close to zero. Connectance values (*C*) of random matrices show the known hyperbolic function over the number of associated species (Fig. 3A), change with matrix asymmetry (Fig. 3B) and increase strongly with increasing *m* (Fig. 3C). For specialization measures at the species level, the average number of links per species (\overline{L}) increases strongly with network size, number of available partners, and *m* (Fig. 3). While other niche breadth measures may also show some variation across different network scales (not shown), the weighted mean Kullback-Leibler distance <*d*'> is barely affected by network size, network asymmetry, and number of interactions (Fig. 3). Both *H*_{2}' and *d*' may thus be appropriate for comparisons across matrices of different scale.

## Discussion

### Properties of specialization measures

The suggested indices, *d*' and *H*_{2}', quantify the degree of specialisation of elements within an interaction network and of the entire network, respectively. While the number of links (*L*) and connectance (*C*) represent species-level and community-level measures of interactions based on binary data, respectively, *d*' and *H*_{2}' represent corresponding measures for frequency-based data. The need to include information on interaction strength or interaction frequency in network analyses has been emphasized by various authors [4, 20, 27, 30, 31, 38]. Parallel to earlier advances in diversity measures compared to species richness, quantitative network measures account for the heterogeneity in link strength rather than assigning equal weights to every link. Moreover, we have shown that *d*' and *H*_{2}' are largely robust against variation in matrix size, shape, and sampling effort. In several cases, *C* may be strongly affected by sampling effort [25, 27], while *H*_{2}' remained largely unchanged in simulations of random associations over a range of network sizes, variable network asymmetries, and number of interactions. This scale invariance suggests that both *d*' and *H*_{2}' can be used directly for comparisons across different networks, while comparisons of *L* and *C* are more problematic [1, 35].

Quantitative methods like the indices suggested here also allow a more detailed analysis of interaction patterns within and across networks. Fruitful areas include comparisons of networks across different interaction types [4], biogeographical gradients [1], biodiversity and land use gradients [13], robustness of networks against extinction risks [39], asymmetries between plants and animals [38], and relationships between specialisation and abundance [35]. While a comparison of the average number of partners between plants versus animals is solely dependent on the matrix architecture (*i.e.*, the number of rows *r* versus columns *c*, since \overline{L}_{plants} = *c*·*C* and \overline{L}_{poll} = *r*·*C*), this limitation does not apply to *d*'. In the two selected pollinator webs, plants are either similarly or more specialised than pollinators in regard to weighted mean *d*'. This allows a scale-independent evaluation of asymmetries in the degree of specialization between partners (see also [38]). Moreover, Vázquez and Aizen [35] noted that the number of links of a species (*L*_{i}) is strongly positively correlated with its overall frequency (*A*_{i}) in five pollination networks including the datasets analyzed above. They argued that this apparent higher generalization of common plants and common pollinators may be largely explained by null models, calling for an improved measurement of specialization. Our results for the correlation between *d*'_{i} and *A*_{i} in two pollinator webs suggest that the relationship between specialization and abundance may be more variable, and even positive as in the Argentinean network.

### Caveats

Some problems apply to any measure of network analyses including the proposed indices. Measures of specialization mostly ignore phylogenetic relationships or ecological similarity within an association matrix. For example, a plant species that is pollinated by multiple moth species may be unsuitably regarded as more generalized than a plant pollinated by few insect species comprising several different orders [40]. In addition, the fact that herbivores are commonly specialized on host plant families rather than species may skew network patterns if not carefully accounted for. A first approach to investigate such effects may be to compare the level of specialization after a stepwise reduction of the matrix by pooling species to higher taxonomic units, such as genera, families, and orders. For known phylogenies, more advanced techniques for analyses with a particular evolutionary focus are available [41–43]. Another deficiency may be that species or their partners are all given the same individual 'weight' in the analyses, whether they may be small bees or large bats visiting a small herb with little nectar or a mass flowering tree. Null models as in the calculation for both *C* and *H*_{2}' imply that all individuals can be shifted around between resources in the same way, irrespective of their size or non-fitting parameters. The role of 'forbidden links' as constraints to network analyses has been discussed elsewhere [44, 45]. Similarly, calculations of *d*' or other niche breadth measures are based on the implicit assumption that each species adjusts its interactions according to the availability of partners (niches), irrespective of morphological or behavioral constraints. Moreover, if data are collected from a large heterogeneous habitat or over a prolonged time period, calculations of the degree of specialization may be severely constrained by the spatiotemporal overlap or non-overlap between partners for other reasons than resource preferences, e.g. when not all species are able to reach all sites in the same way, or when some resources and consumers have asynchronous phenologies. Consequently, network analyses as suggested here will be most useful to study resource-consumer partitioning within a short time frame and limited spatial scale.

For both indices *d*' and *H*_{2}', we proposed above to use the total number of interactions for each species as a measure of partner availability (*q*_{j}) and as constraint for standardization (fixed row and column totals). It may be debated whether independent measures of plant and animal abundances could be more appropriate than using interaction frequency data as such. However, despite the fact that such abundance data barely exist for most networks, note that the actual number of interactions often more suitably reflects resource availability and consumer activity than an independent measure of species abundance. For instance, a flower of one species may have a much higher nectar production than another and consequently receive a higher number of visitors, while the local abundance of the plant species does not reflect such differences in resource quality and/or quantity. Both *d*' and *H*_{2}' thus focus on the actual partitioning between the interacting species. In studies where detailed knowledge or theoretical assumptions about resources (availability and quality) or consumers (activity density and consumption rate) are available or under experimental control, such data may be incorporated into the analysis (defining *q*_{j} and constraints) instead of interaction frequencies. The constraint of fixed row and column totals has been debated elsewhere in the context of species co-occurrence patterns, where it was found to be most appropriate in null model comparisons, although critics have argued earlier that these marginals themselves may already reflect competitive interactions ([46] and references therein). Any approach to compare networks based on fixed marginals for standardization will fail to detect potentially meaningful patterns displayed by these architectural features, namely the number of resource and consumer species and the heterogeneity of total interaction frequencies. This network architecture may already be shaped by past competitive interactions or indicate fundamental constraints, a largely unexplored hypothesis that merits additional investigations.

It should also be emphasized that analyses of frequency data may be susceptible to pseudoreplication of repeated associations of the same individuals or close associations derived from a single dispersal event (e.g. a social insect colony, aggregating individuals, multiple offspring from a single egg cluster, or monospecific plant clusters). These may lead to an overestimation of specialization. To be more meaningful on a population level, frequency analyses should thus be based on spatially independent association replicates. Note that all species-wise specialization measures such as *d*' are sensitive to the behavior of the other species. Any systematic sampling bias (e.g. a taxonomic focus within a guild) will therefore affect the conclusions of comparisons within or across networks.

## Conclusion

In accordance with previous calls [4, 20, 27, 30, 31, 38], we suggest that the explicit inclusion of frequency data reflects an important step forward in network analyses, as too many assumptions are implicit in any measure based on binary representation. Most notably, connectance and 'number of partners' imply an equal availability of all partners – an unlikely scenario. Qualitative indices are not robust against sampling effort. In contrast, the proposed quantitative measures based on interaction frequencies explicitly account for this source of variation. Our study suggests that *d*' and *H*_{2}' represent scale-independent and meaningful indices to characterize specialization on the level of single species and the entire network, respectively. These novel indices allow us to investigate patterns within and across networks that have not been detected with qualitative measures, such as correlations with species frequencies, network size and asymmetries in specialization between partners. Recently, Bascompte et al. [38] showed that the incorporation of frequency data may unveil pervasive asymmetries within networks. Particularly since Vázquez et al. [30] demonstrated that interaction frequencies in plant-pollinator and plant-seed disperser systems often correlate with the magnitude of mutualistic services for the plant (although variation in pollinator effectiveness can be important, see [47]), an increased collection of frequency data and appropriate quantitative analyses would greatly benefit future network studies.

## Methods

### Species-level index

As a species-level measure of 'partner diversity', we propose the Kullback-Leibler distance (or Kullback-Leibler divergence, relative entropy) in a standardized form (*d*'). Coming from information theory, this index quantifies the difference between two probability distributions [48]. While the standardized Hurlbert's and Smith's measures of niche breadth could be used alternatively [21, 22, 24], *d*' has some advantages in the context of networks. While all three indices regard an exclusive pairing between two species as a high degree of specialization as long as interactions between the two partners are infrequent, Hurlbert's and Smith's indices show an undesired trend towards full generalization when the number of interactions between the two partners increases, although this should be considered a stronger indication of specialization (see below, *Properties of alternative niche breadth measures*). The interaction between two parties is commonly displayed in an *r* × *c* contingency table, with *r* rows representing one party such as flowering plant species, and *c* columns representing the other party such as pollinator species. In each cell, the frequency of interaction between plant species *i* and pollinator species *j* (or another useful measure of interaction strength) is given as *a*_{ij} (Table 1).

Instead of frequencies (*a*_{ij}), each interaction can be assigned a proportion of the total (*m*) as

$$p_{ij} = a_{ij}/m, \quad \text{where} \quad \sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij} = 1.$$

Let *p*'_{ij} be the proportion of the number of interactions (*a*_{ij}) in relation to the respective row total (*A*_{i}), and *q*_{j} the proportion of all interactions by partner *j* in relation to the total number of interactions (*m*). Thus,

$$p'_{ij} = a_{ij}/A_{i}, \quad \sum_{j=1}^{c} p'_{ij} = 1, \quad q_{j} = A_{j}/m, \quad \text{and} \quad \sum_{j=1}^{c} q_{j} = 1.$$

To quantify the specialization of a species *i*, the following index *d*_{i} is suggested. This *d*_{i} is related to Shannon diversity, similar to an index recently suggested to characterize biomass flow diversity in food webs [20]. However, an appropriate index in this context should not only consider the diversity of partners, but also their respective availability (see [22]). Consequently, the following index compares the distribution of the interactions with each partner (*p*'_{ij}) to the overall partner availability (*q*_{j}). The Kullback-Leibler distance for species *i* is denoted as

$$d_{i} = \sum_{j=1}^{c}\left(p'_{ij} \cdot \ln\frac{p'_{ij}}{q_{j}}\right),$$

which can be normalized as

$$d'_{i} = \frac{d_{i} - d_{\mathrm{min}}}{d_{\mathrm{max}} - d_{\mathrm{min}}}.$$

The theoretical maximum is given by *d*_{max} = ln (*m*/*A*_{i}), and the theoretical minimum (*d*_{min}) is zero for the special case where all *p*'_{ij} = *q*_{j}. However, a realistic *d*_{min} may be constrained at some value above zero given that *p*'_{ij} and *q*_{j} are calculated from discrete integer values (*a*_{ij}). To take this into account, *d*_{min} is more suitably computed algorithmically as in a program available from the authors and online [49], providing all *d*' for a given matrix. This standardized Kullback-Leibler distance (*d*') ranges from 0 for the most generalized to 1.0 for the most specialized case. Thus, *d*' can be interpreted as deviation of the actual interaction frequencies from a null model which assumes that all partners are used in proportion to their availability. An average degree of specialization among the species of a party can be presented as a weighted mean of the standardized index, e.g. <*d*'_{i}> for pollinators as

$$\langle d'_{i} \rangle = \frac{1}{m}\sum_{i=1}^{r}\left(d'_{i} \cdot A_{i}\right) = \sum_{i=1}^{r}\left(d'_{i} \cdot q_{i}\right).$$
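The species-level calculation can be sketched in Python as follows. This is a simplified illustration and not the authors' program [49]: it uses the theoretical minimum *d*_{min} = 0 instead of the algorithmically computed minimum, and it assumes that every row species has at least one interaction and that no single row holds all interactions (otherwise *d*_{max} would be zero). Function names and example matrices are hypothetical.

```python
import math


def kl_distances(matrix):
    """Return d_i, d_max and row totals A_i for every row species."""
    m = sum(sum(row) for row in matrix)
    q = [sum(col) / m for col in zip(*matrix)]   # partner availability q_j
    d, d_max, totals = [], [], []
    for row in matrix:
        A_i = sum(row)
        d_i = sum(
            (a / A_i) * math.log((a / A_i) / q[j])
            for j, a in enumerate(row)
            if a > 0                             # 0 * ln(0) is taken as 0
        )
        d.append(d_i)
        d_max.append(math.log(m / A_i))
        totals.append(A_i)
    return d, d_max, totals


def standardized_d(matrix):
    """d'_i under the simplifying assumption d_min = 0."""
    d, d_max, totals = kl_distances(matrix)
    return [di / dm for di, dm in zip(d, d_max)], totals


def weighted_mean(values, totals):
    """Mean of the values weighted by the row totals A_i."""
    return sum(v * a for v, a in zip(values, totals)) / sum(totals)


exclusive = [[10, 0], [0, 10]]   # each species has its own partner
even = [[5, 5], [5, 5]]          # partners used in proportion to availability
mixed = [[8, 2], [2, 8]]

d_prime_mixed, totals_mixed = standardized_d(mixed)
mean_d_prime = weighted_mean(d_prime_mixed, totals_mixed)
```

In this sketch the exclusive matrix yields *d*' = 1 for both row species and the even matrix yields *d*' = 0, matching the two ends of the scale described above. Applying the same functions to the transposed matrix gives the values for the other party.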

While <*d*'_{i}> usually differs from <*d*'_{j}>, the weighted means of the non-standardized Kullback-Leibler distances are the same for both parties, hence <*d*_{i}> = <*d*_{j}>.
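One way to see why the two weighted means coincide is to rewrite the weighted mean in terms of the cell proportions. The short derivation below uses only the definitions given above, together with $q_i = A_i/m$ as the row analogue of $q_j$ (a notation assumed here for illustration).

```latex
% Since q_i p'_{ij} = (A_i/m)(a_{ij}/A_i) = p_{ij}
% and   p'_{ij}/q_j = p_{ij}/(q_i q_j):
\langle d_i \rangle
  = \sum_{i=1}^{r} q_i \sum_{j=1}^{c} p'_{ij} \ln\frac{p'_{ij}}{q_j}
  = \sum_{i=1}^{r} \sum_{j=1}^{c} p_{ij} \ln\frac{p_{ij}}{q_i q_j}
```

The final expression is unchanged when rows and columns are swapped, so starting from the column species leads to the same sum.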

### Network-level index

The following network-wide measure is based on the bipartite representation of a two mode network of interactions such as plant-animal or other resource-consumer interactions where members of each party interact with members of the other party but not among themselves (unlike many food webs). The two-dimensional Shannon entropy (termed *H*_{2} in order to avoid confusion with the common one-dimensional *H*) is obtained as

$$H_2 = -\sum_{i=1}^{r} \sum_{j=1}^{c} \left( p_{ij} \cdot \ln p_{ij} \right).$$

*H*_{2} decreases with higher specialization. This measure is closely related to the weighted mean of the non-standardized Kullback-Leibler distance of all species, since

<*d*_{i}> = <*d*_{j}> = *H*_{2max} - *H*_{2}

(see below, *Relationship between d*_{i} *and H*_{2}). *H*_{2} can be standardized between 0 and 1.0 for extreme specialization versus extreme generalization, respectively, when its minimum and maximum values (*H*_{2min} and *H*_{2max}) are known. *H*_{2min} and *H*_{2max} can be calculated for given constraints. The constraints used here are the maintenance of the total number of interactions of each species, thus all row and column totals, *A*_{i} and *A*_{j}, being fixed (see also [46]). Alternative constraints may be defined depending on the knowledge of the system studied.

*H*_{2} reaches its theoretical maximum where each *p*_{ij} equals its expected value from a random interaction matrix (*q*_{i}·*q*_{j}), such that

$$H_{2\mathrm{max}} = -\sum_{i=1}^{r} \sum_{j=1}^{c} \left( q_i q_j \cdot \ln (q_i q_j) \right),$$

while its theoretical minimum (*H*_{2min}) may be close to zero depending on the matrix architecture. As for *d*_{min} above, *H*_{2max} and *H*_{2min} are constrained by the fact that they are derived from integer values. A program implementing a heuristic solution to obtain *H*_{2max} and *H*_{2min}, and to perform the entire analysis, is available from the authors or online [49].

The degree of specialization is obtained as a standardized entropy on a scale between *H*_{2min} and *H*_{2max} as

$$H'_2 = \frac{H_{2\mathrm{max}} - H_2}{H_{2\mathrm{max}} - H_{2\mathrm{min}}}.$$

Consequently, *H*_{2}' ranges between 0 and 1.0 for extreme generalization and specialization, respectively.
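The network-level quantities can be illustrated with a short Python sketch, assuming NumPy. It computes *H*_{2} and the theoretical *H*_{2max} from the marginal totals, but it does not implement the heuristic search for the integer-constrained *H*_{2min} and *H*_{2max}; instead, `h2_min` is a hypothetical parameter that the caller supplies (defaulting to zero, which will generally underestimate *H*_{2}'). All function names are illustrative.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (natural log) of a set of proportions; zeros are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def h2(a):
    """Two-dimensional Shannon entropy of the interaction matrix a."""
    a = np.asarray(a, dtype=float)
    return shannon(a / a.sum())

def h2_max(a):
    """Theoretical maximum: entropy of the expected proportions q_i * q_j."""
    a = np.asarray(a, dtype=float)
    m = a.sum()
    q_i = a.sum(axis=1) / m
    q_j = a.sum(axis=0) / m
    return shannon(np.outer(q_i, q_j))

def h2_prime(a, h2_min=0.0):
    """Standardized specialization; h2_min is supplied by the caller (assumption)."""
    return (h2_max(a) - h2(a)) / (h2_max(a) - h2_min)

# Hypothetical example: a perfectly compartmentalized web.
diagonal = np.diag([10, 20, 30])
print(h2(diagonal), h2_max(diagonal))
print(h2_prime(diagonal, h2_min=h2(diagonal)))
```

For the diagonal example no table with the same row and column totals can have a lower entropy, so passing its own *H*_{2} as `h2_min` is legitimate there; for real webs a constrained minimum would have to be found separately.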

### Comparison with random associations

*H*_{2} can be tested against a null model of random associations (*H*_{2ran}). A number of random permutations of the matrix can be performed using an *r* × *c* randomization algorithm (also available at [49]). Because *H*_{2} decreases with higher specialization, the probability (*p*-value) of obtaining a network at least as specialized as the observed one under random associations is simply given as the proportion of values obtained for *H*_{2ran} that are equal to or smaller than the observed *H*_{2}, a common procedure in randomization statistics [25, 50]. *H*_{2ran} is usually only slightly smaller than *H*_{2max}. Previously, permutations of *r* × *c* contingency tables often used a different test statistic instead of *H*_{2} [25, 51, 52]:

$$T = \sum_{i=1}^{r} \sum_{j=1}^{c} \left( a_{ij} \cdot \ln a_{ij} \right).$$

The relationship between *T* and *H*_{2} is described by a constant, the total number of interactions (*m*), as *T* = *m*·ln *m* - *m*·*H*_{2}. Consequently, both methods yield exactly the same *p*-values.
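A randomization test of this kind might be sketched as follows, assuming NumPy. Instead of the published *r* × *c* algorithm, the sketch reshuffles the partner labels of the individual interactions, which also keeps all row and column totals fixed; this substitution is an assumption made for brevity. Since lower *H*_{2} means higher specialization, the *p*-value is computed here as the proportion of random matrices whose *H*_{2} is at or below the observed value. *T* is taken as the sum of *a*_{ij}·ln *a*_{ij}, so that *T* = *m*·ln *m* - *m*·*H*_{2} holds. Function names are hypothetical.

```python
import numpy as np

def h2(a):
    """Two-dimensional Shannon entropy of an interaction matrix."""
    p = np.asarray(a, dtype=float).ravel()
    p = p[p > 0] / p.sum()
    return float(-(p * np.log(p)).sum())

def t_statistic(a):
    """Sum of a_ij * ln(a_ij) over all non-empty cells."""
    x = np.asarray(a, dtype=float).ravel()
    x = x[x > 0]
    return float((x * np.log(x)).sum())

def random_table(a, rng):
    """Random matrix with the same row and column totals as a (label shuffling)."""
    a = np.asarray(a, dtype=int)
    r, c = a.shape
    rows = np.repeat(np.arange(r), a.sum(axis=1))   # one row label per interaction
    cols = np.repeat(np.arange(c), a.sum(axis=0))   # one column label per interaction
    cols = rng.permutation(cols)                    # random re-association
    out = np.zeros((r, c), dtype=int)
    np.add.at(out, (rows, cols), 1)
    return out

def h2_permutation_test(a, n_perm=1000, seed=0):
    """Return (observed H2, array of random H2, p-value)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=int)
    observed = h2(a)
    random_h2 = np.array([h2(random_table(a, rng)) for _ in range(n_perm)])
    # Small tolerance so that exact ties are counted despite rounding error.
    p_value = float(np.mean(random_h2 <= observed + 1e-12))
    return observed, random_h2, p_value

web = np.array([[12, 3, 0], [4, 9, 2], [0, 1, 19]])  # hypothetical counts
observed, random_h2, p_value = h2_permutation_test(web, n_perm=500)
print(observed, random_h2.mean(), p_value)
```

Because *T* is a decreasing linear function of *H*_{2} for fixed *m*, ranking the random matrices by *T* (largest first) or by *H*_{2} (smallest first) gives the same *p*-value.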

### Relationship between *d*_{i} and *H*_{2}

In the following we derive the relationship between the individual levels of specialization (*d*_{i}) and the community level (*H*_{2}). The non-standardized Kullback-Leibler distance for row *i* can be rewritten as

$$d_i = \sum_{j=1}^{c} \left( p'_{ij} \cdot \ln \frac{p'_{ij}}{q_j} \right) = \sum_{j=1}^{c} \left( \frac{p_{ij}}{q_i} \cdot \ln \frac{p_{ij}}{q_i \cdot q_j} \right),$$

because $p_{ij} = \frac{a_{ij}}{A_i} \cdot \frac{A_i}{m} = p'_{ij} \cdot q_i$.

The weighted mean of *d*_{i} for all *i* rows (each row weighted by *q*_{i}) yields

$$\begin{aligned}
\langle d_i \rangle &= \sum_{i=1}^{r} \left( d_i \cdot q_i \right) = \sum_{i=1}^{r} \sum_{j=1}^{c} \left( p_{ij} \cdot \ln \frac{p_{ij}}{q_i \cdot q_j} \right) \\
&= \sum_{i=1}^{r} \sum_{j=1}^{c} \left( p_{ij} \cdot \ln p_{ij} \right) - \sum_{i=1}^{r} \sum_{j=1}^{c} \left( p_{ij} \cdot \ln q_i \right) - \sum_{i=1}^{r} \sum_{j=1}^{c} \left( p_{ij} \cdot \ln q_j \right) \\
&= \sum_{i=1}^{r} \sum_{j=1}^{c} \left( p_{ij} \cdot \ln p_{ij} \right) - \sum_{i=1}^{r} \left( q_i \cdot \ln q_i \right) - \sum_{j=1}^{c} \left( q_j \cdot \ln q_j \right),
\end{aligned}$$

since $q_i = \sum_{j=1}^{c} p_{ij}$ and $q_j = \sum_{i=1}^{r} p_{ij}$.

While the first summand in the final equation for <*d*_{i}> equals -*H*_{2}, the remaining two summands correspond to the maximum entropy *H*_{2max}, because

$$\begin{aligned}
H_{2\mathrm{max}} &= -\sum_{i=1}^{r} \sum_{j=1}^{c} \left( q_i q_j \cdot \ln (q_i q_j) \right) = -\sum_{i=1}^{r} \sum_{j=1}^{c} \left( q_i q_j \cdot \ln q_i \right) - \sum_{i=1}^{r} \sum_{j=1}^{c} \left( q_i q_j \cdot \ln q_j \right) \\
&= -\sum_{i=1}^{r} \left( q_i \cdot \ln q_i \right) - \sum_{j=1}^{c} \left( q_j \cdot \ln q_j \right).
\end{aligned}$$

Therefore,

<*d*_{i}> = *H*_{2max} - *H*_{2}.

The same calculation applies for <*d*_{j}>, thus <*d*_{i}> = <*d*_{j}>. Consequently, the degree of specialization of the entire network (corresponding to the deviation of the network-wide entropy from its maximum value) equals the weighted sum of the specialization of its elements (species).
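The identity derived above can be checked numerically. The following Python sketch (assuming NumPy, with a hypothetical 3 × 3 matrix of counts and illustrative function names) computes the weighted mean of *d*_{i} for rows and for columns and compares both with *H*_{2max} - *H*_{2}.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log); zero proportions are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def weighted_mean_d(a):
    """Weighted mean of the non-standardized Kullback-Leibler distances of the rows."""
    a = np.asarray(a, dtype=float)
    m = a.sum()
    q_i = a.sum(axis=1) / m                      # row weights
    q_j = a.sum(axis=0) / m                      # partner availability
    p_prime = a / a.sum(axis=1, keepdims=True)   # within-row proportions
    ratio = np.where(p_prime > 0, p_prime / q_j, 1.0)
    d = (p_prime * np.log(ratio)).sum(axis=1)
    return float((d * q_i).sum())

web = np.array([[12, 3, 0], [4, 9, 2], [0, 1, 19]])  # hypothetical counts
p = web / web.sum()
h2 = entropy(p)
h2_max = entropy(p.sum(axis=1)) + entropy(p.sum(axis=0))

# The three printed values should coincide up to floating-point error.
print(weighted_mean_d(web), weighted_mean_d(web.T), h2_max - h2)
```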

### Properties of alternative niche breadth measures

The standardized Hurlbert's (*B*') and Smith's (*FT*) measures can be applied widely for niche breadth analysis [21, 22, 24]. In this context, the Kullback-Leibler distance (*d*) can be viewed as a modified Shannon-Wiener measure of niche breadth that accounts for niche availabilities. Like the Kullback-Leibler distance, both *B*' and *FT* compare the proportional distribution of individuals (*p*) to the proportional resource availability (*q*) (here: partner availability). For a certain species *i*, the two measures are in our notation:

$$B'_i = \frac{B_i - q_{\mathrm{min}}}{1 - q_{\mathrm{min}}}, \quad \text{with} \quad B_i = \frac{1}{\sum_{j=1}^{c} \left( (p'_{ij})^{2} / q_j \right)}, \quad \text{and} \quad FT_i = \sum_{j=1}^{c} \sqrt{p'_{ij} \cdot q_j}.$$

Each *p*'_{ij} is the proportion of the number of interactions in relation to the respective row total, and *q*_{j} is the proportion of all interactions by partner *j* in relation to the total number of interactions. Thus,

$$\sum_{j=1}^{c} p'_{ij} = 1, \quad \sum_{j=1}^{c} q_j = 1, \quad \text{and} \quad q_{\mathrm{min}} = \min \left( q_j \right).$$

Both the standardized Hurlbert's (*B*') and Smith's (*FT*) measures range from 0 for the most specialized case to 1.0 for extreme generalization (broadest niche). In the context of niche breadth, it has been shown that the Shannon-Wiener measure is most sensitive, while Hurlbert's and particularly Smith's measure are less sensitive for the selection of rare resources [21] (see also [20]).

For the application in network analyses, however, both *B*' and *FT* may show some undesired properties. Generally, *B*', *FT* and *d*' are reasonably well correlated with each other across the species within a network (e.g., *r*_{s} = -0.49 between *d*' and *B*', and *r*_{s} = -0.36 between *d*' and *FT* for the 90 pollinators in the network of Vázquez and Simberloff [33], both *p* < 0.001). However, differences with *d*' are substantial when a highly specialized species interacts largely exclusively with a specialized partner, e.g. a specialized pollinator with a plant that is almost exclusively pollinated by this one. Imagine a scenario where one exclusive interaction occurs between a plant species and a pollinator species in a 3 × 3 matrix (Table 2). If the interaction between pollinator sp. 3 and plant sp. 3 is only infrequent (e.g. *a*_{33} = 1), all indices show a high degree of specialization (*d*' = 1.0, *B*' = 0, *FT* = 0.14) for both partners. However, as the number of exclusive interactions (*a*_{33}) increases, the values for both *B*' and *FT* of pollinator sp. 3 and plant sp. 3 show a highly undesired change towards generalization, although a higher *a*_{33} is intuitively considered as extreme specialization (e.g., for *a*_{33} = 50 the values for pollinator sp. 3 are *B*' = 0.31 and *FT* = 0.70), while only *d*' remains unaffected (*d*' = 1.0). *FT* is always larger than zero, and *B*' becomes larger than zero when the specialists interact more frequently than one of the other partners, thus when *q*_{j} > min(*q*_{1}, *q*_{2}, ... *q*_{c}). Both *FT* and *B*' approach a value of 1.0 (maximum generalization) for very large *a*_{33}. This undesired effect of *FT* and *B*' is not restricted to completely exclusive interactions between two partners.
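The behavior described above can be reproduced qualitatively with a small Python sketch, assuming NumPy. The 3 × 3 web used here is hypothetical and is not the matrix of Table 2, so the numbers will differ from those quoted in the text; *d*' is computed with *d*_{min} = 0. The function names `niche_indices` and `example_web` are illustrative.

```python
import numpy as np

def niche_indices(a, i):
    """Return (B', FT, d') for row species i of the interaction matrix a.

    Assumptions: every column has at least one interaction, and d_min = 0.
    """
    a = np.asarray(a, dtype=float)
    m = a.sum()
    q = a.sum(axis=0) / m              # partner availability q_j
    p = a[i] / a[i].sum()              # p'_ij for species i

    b = 1.0 / (p**2 / q).sum()         # Hurlbert's B
    b_std = (b - q.min()) / (1.0 - q.min())
    ft = np.sqrt(p * q).sum()          # Smith's FT

    used = p > 0
    d = (p[used] * np.log(p[used] / q[used])).sum()
    d_prime = d / np.log(m / a[i].sum())
    return float(b_std), float(ft), float(d_prime)

def example_web(a33):
    """Hypothetical web: species 3 of each party interact only with each other."""
    return np.array([[10, 15, 0],
                     [15, 10, 0],
                     [0, 0, a33]])

for a33 in (1, 10, 50, 500):
    b_std, ft, d_prime = niche_indices(example_web(a33), i=2)
    print(a33, round(b_std, 3), round(ft, 3), round(d_prime, 3))
```

In this sketch *B*' and *FT* of the exclusive pollinator increase with *a*_{33}, whereas *d*' stays at 1.0, because *d*_{i} equals ln(*m*/*A*_{i}) whenever the partner is used by no other species.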

### Simulation of sampling effort and matrix architecture

Two published plant-pollinator networks were selected to investigate the behavior of different specialization measures [32, 33]. Both articles use their observed interaction matrices as a model to discuss network properties based on the number of links per pollinator or plant species, allowing a comparison of conclusions drawn. Both networks may be compared as they comprise relatively large datasets from temperate ecosystems, reporting interaction frequencies between plants and their floral visitors: the British meadow community studied by Memmott [32] involved 79 pollinator and 25 plant species (2183 pollinator visits observed); the forests in Argentina studied by Vázquez and Simberloff [33] involved 90 pollinator and 14 plant species (5285 visits). The datasets can be obtained from the Interaction Web Database [53]. We simulated a decreased sampling intensity in both networks using a rarefaction method in order to investigate how sampling effort affects the estimation of specialization indices. Real association matrices were reduced by randomly extracting interactions, e.g. from the total of *m* = 2183 visits in Memmott's web down to *m* = 5 visits (in steps of five, repeated ten times for each *m*).
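A rarefaction of this kind could look as follows in Python, assuming NumPy. The sketch draws a fixed number of individual visits without replacement from the observed matrix; whether the original analysis sampled in exactly this way is an assumption, and the function name `rarefy` as well as the small example web are hypothetical.

```python
import numpy as np

def rarefy(a, n, rng):
    """Randomly keep n of the observed interactions (without replacement)."""
    a = np.asarray(a, dtype=int)
    # One entry per observed visit, labelled with the flat index of its cell.
    cell_of_each_visit = np.repeat(np.arange(a.size), a.ravel())
    kept = rng.choice(cell_of_each_visit, size=n, replace=False)
    return np.bincount(kept, minlength=a.size).reshape(a.shape)

rng = np.random.default_rng(1)
web = np.array([[12, 3, 0], [4, 9, 2], [0, 1, 19]])  # hypothetical counts

# Steps of five visits, ten replicates per sample size.
replicates = {
    n: [rarefy(web, n, rng) for _ in range(10)]
    for n in range(5, int(web.sum()) + 1, 5)
}
print(sorted(replicates))
```

Each rarefied matrix can then be passed to whichever specialization index is under study, so that the index can be plotted against the number of retained interactions.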

In order to compare the null model characteristics of the specialization measures, we simulated artificial matrices with randomly associated partners and plotted the indices against an increasing number of partners and/or total number of interactions. We assumed that the total frequency of participating species approximates a lognormal distribution, which is typical for biological communities [21, 22, 24]. All row and column totals were randomly generated from a lognormal distribution (*μ* = 50, *σ* = 1) that was scaled to the desired total number of interactions. Ten different combinations of row and column totals were obtained for each matrix size and taken as template to randomly associate the partners five times, thus each matrix size was represented by 50 random associations.
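The construction of such null matrices might be sketched as below, assuming NumPy. Several details are assumptions rather than the published procedure: the second lognormal parameter is read as the standard deviation on the log scale (the location parameter cancels once the totals are rescaled, so it is set to zero here), totals are rounded with a largest-remainder rule so that they sum exactly to *m*, and partners are associated by shuffling interaction labels. Function names are hypothetical.

```python
import numpy as np

def lognormal_totals(n_species, m, rng, sigma=1.0):
    """Integer species totals from a lognormal distribution, scaled to sum to m.

    Note: rare species may receive a total of zero after rounding.
    """
    w = rng.lognormal(mean=0.0, sigma=sigma, size=n_species)
    raw = w / w.sum() * m
    totals = np.floor(raw).astype(int)
    shortfall = int(m - totals.sum())
    # Give the remaining units to the species with the largest remainders.
    order = np.argsort(raw - totals)[::-1]
    totals[order[:shortfall]] += 1
    return totals

def random_association(row_totals, col_totals, rng):
    """Random matrix with the given row and column totals."""
    rows = np.repeat(np.arange(len(row_totals)), row_totals)
    cols = rng.permutation(np.repeat(np.arange(len(col_totals)), col_totals))
    out = np.zeros((len(row_totals), len(col_totals)), dtype=int)
    np.add.at(out, (rows, cols), 1)
    return out

rng = np.random.default_rng(2)
matrices = []
for _ in range(10):                      # ten templates of row and column totals
    row_totals = lognormal_totals(20, 1000, rng)
    col_totals = lognormal_totals(10, 1000, rng)
    for _ in range(5):                   # five random associations per template
        matrices.append(random_association(row_totals, col_totals, rng))
print(len(matrices), matrices[0].sum())
```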

## References

Olesen JM, Jordano P: Geographic patterns in plant-pollinator mutualistic networks. Ecology. 2002, 83: 2416-2424.

Novotny V, Basset Y: Host specificity of insect herbivores in tropical forests. Proc R Soc London Ser B. 2005, 272: 1083-1090. 10.1098/rspb.2004.3023.

Waser NM, Chittka L, Price MV, Williams NM, Ollerton J: Generalization in pollination systems, and why it matters. Ecology. 1996, 77: 1043-1060. 10.2307/2265575.

Jordano P: Patterns of mutualistic interactions in pollination and seed dispersal: connectance, dependence asymmetries, and coevolution. Am Nat. 1987, 129: 657-677. 10.1086/284665.

Bascompte J, Jordano P, Melian CJ, Olesen JM: The nested assembly of plant-animal mutualistic networks. Proc Natl Acad Sci USA. 2003, 100: 9383-9387. 10.1073/pnas.1633576100.

Waser NM, Ollerton J, Eds: Plant-pollinator interactions: from specialization to generalization. 2006, Chicago: University of Chicago Press

Ollerton J, Cranmer L: Latitudinal trends in plant-pollinator interactions: are tropical plants more specialised?. Oikos. 2002, 98: 340-350. 10.1034/j.1600-0706.2002.980215.x.

Devoto M, Medan D, Montaldo NH: Patterns of interaction between plants and pollinators along an environmental gradient. Oikos. 2005, 109: 461-472. 10.1111/j.0030-1299.2005.13712.x.

Winemiller KO: Must connectance decrease with species richness?. Am Nat. 1989, 134: 960-968. 10.1086/285024.

Martinez ND: Constant connectance in community food webs. Am Nat. 1992, 139: 1208-1218. 10.1086/285382.

May RM: Will a large complex system be stable?. Nature. 1972, 238: 413-414. 10.1038/238413a0.

Rejmánek M, Starý P: Connectance in real biotic communities and critical values for stability of model ecosystems. Nature. 1979, 280: 311-313. 10.1038/280311a0.

Vázquez DP, Simberloff D: Ecological specialization and susceptibility to disturbance: conjectures and refutations. Am Nat. 2002, 159: 606-623. 10.1086/339991.

Dunne JA, Williams RJ, Martinez ND: Network structure and biodiversity loss in food webs: robustness increases with connectance. Ecol Lett. 2002, 5: 558-567. 10.1046/j.1461-0248.2002.00354.x.

Magurran AE: Ecological diversity and its measurement. 1988, Princeton: Princeton University Press

Sahli HF, Conner JK: Characterizing ecological generalization in plant-pollination systems. Oecologia. 2006, 148: 365-372. 10.1007/s00442-006-0396-1.

Parrish JAD, Bazzaz FA: Difference in pollination niche relationships in early and late successional plant communities. Ecology. 1979, 60: 597-610. 10.2307/1936080.

Basset Y: Diversity and abundance of insect herbivores foraging on seedlings in a rainforest in Guyana. Ecol Entomol. 1999, 24: 245-259. 10.1046/j.1365-2311.1999.00197.x.

Herrera CM: Plant generalization on pollinators: species property or local phenomenon?. Am J Bot. 2005, 92: 13-20.

Bersier LF, Banasek-Richter C, Cattin MF: Quantitative descriptors of food-web matrices. Ecology. 2002, 83: 2394-2407.

Krebs CJ: Ecological Methodology. 1999, Menlo Park: Benjamin Cummings

Hurlbert SH: Measurement of niche overlap and some relatives. Ecology. 1978, 59: 67-77. 10.2307/1936632.

Feinsinger P, Spears EE, Poole RW: A simple measure of niche breadth. Ecology. 1981, 61: 27-32. 10.2307/1936664.

Smith EP: Niche breadth, resource availability, and inference. Ecology. 1982, 63: 1675-1681. 10.2307/1940109.

Fonseca CR, Ganade G: Asymmetries, compartments and null interactions in an Amazonian ant-plant community. J Anim Ecol. 1996, 65: 339-347. 10.2307/5880.

Kenny D, Loehle C: Are food webs randomly connected?. Ecology. 1991, 72: 1794-1799. 10.2307/1940978.

Goldwasser L, Roughgarden J: Sampling effects and the estimation of food-web properties. Ecology. 1997, 78: 41-54. 10.2307/2265977.

Vázquez DP, Aizen MA: Asymmetric specialization: a pervasive feature of plant-pollinator interactions. Ecology. 2004, 85: 1251-1257.

Kay KM, Schemske DW: Geographic patterns in plant-pollinator mutualistic networks: comment. Ecology. 2004, 85: 875-878.

Vázquez DP, Morris WF, Jordano P: Interaction frequency as a surrogate for the total effect of animal mutualists on plants. Ecol Lett. 2005, 8: 1088-1094. 10.1111/j.1461-0248.2005.00810.x.

Borer ET, Anderson K, Blanchette CA, Broitman B, Cooper SD, Halpern BS, Seabloom EW, Shurin JB: Topological approaches to food web analyses: a few modifications may improve our insights. Oikos. 2002, 99: 397-401. 10.1034/j.1600-0706.2002.990222.x.

Memmott J: The structure of a plant-pollinator food web. Ecol Lett. 1999, 2: 276-280. 10.1046/j.1461-0248.1999.00087.x.

Vázquez DP, Simberloff D: Changes in interaction biodiversity induced by an introduced ungulate. Ecol Lett. 2003, 6: 1077-1083. 10.1046/j.1461-0248.2003.00534.x.

Dicks LV, Corbet SA, Pywell RF: Compartmentalization in plant-insect flower visitor webs. J Anim Ecol. 2002, 71: 32-43. 10.1046/j.0021-8790.2001.00572.x.

Vázquez DP, Aizen MA: Null model analyses of specialization in plant-pollinator interactions. Ecology. 2003, 84: 2493-2501.

Auerbach MJ: Stability, probability, and the topology of food webs. Ecological communities: conceptual issues and the evidence. Edited by: Strong DR, Simberloff D, Abele LG, Thistle AB. 1984, Princeton: Princeton University Press, 413-436.

Gotelli NJ, Graves GR: Null models in ecology. 1996, Washington: Smithsonian Institution

Bascompte J, Jordano P, Olesen JM: Asymmetric coevolutionary networks facilitate biodiversity maintenance. Science. 2006, 312: 431-433. 10.1126/science.1123412.

Memmott J, Waser NM, Price MV: Tolerance of pollination networks to species extinctions. Proc R Soc London Ser B. 2004, 271: 2605-2611. 10.1098/rspb.2004.2909.

Johnson SD, Steiner KE: Generalization versus specialization in plant pollination systems. Trends Ecol Evol. 2000, 15: 140-143. 10.1016/S0169-5347(99)01811-X.

Symons FB, Beccaloni GW: Phylogenetic indices for measuring the diet breadths of phytophagous insects. Oecologia. 1999, 119: 427-434. 10.1007/s004420050804.

Webb CO, Ackerly DD, McPeek MA, Donoghue MJ: Phylogenies and community ecology. Annu Rev Ecol Syst. 2002, 33: 475-505. 10.1146/annurev.ecolsys.33.010802.150448.

Novotny V, Basset Y, Miller SE, Weiblen GD, Bremer B, Cizek L, Drozd P: Low host specificity of herbivorous insects in a tropical forest. Nature. 2002, 416: 841-844. 10.1038/416841a.

Jordano P, Bascompte J, Olesen JM: Invariant properties in coevolutionary networks of plant-animal interactions. Ecol Lett. 2003, 6: 69-81. 10.1046/j.1461-0248.2003.00403.x.

Vázquez DP: Degree distribution in plant-animal mutualistic networks: forbidden links or random interactions?. Oikos. 2005, 108: 421-426. 10.1111/j.0030-1299.2005.13619.x.

Gotelli NJ: Null model analysis of species co-occurrence patterns. Ecology. 2000, 81: 2606-2621. 10.2307/177478.

Fenster CB, Armbruster WS, Wilson P, Dudash MR, Thomson JD: Pollination syndromes and floral specialization. Annu Rev Ecol Evol Syst. 2004, 35: 375-403. 10.1146/annurev.ecolsys.34.011802.132347.

Kullback S, Leibler RA: On information and sufficiency. Ann Math Stat. 1951, 22: 79-86.

Montecarlo statistics on RxC matrices. [http://itb.biologie.hu-berlin.de/~nils/stat/]

Manly B: Randomization bootstrap and Monte Carlo methods in biology. 1997, London: Chapman and Hall

Blüthgen N, Verhaagh M, Goitía W, Blüthgen N: Ant nests in tank bromeliads – an example of non-specific interaction. Insect Soc. 2000, 47: 313-316. 10.1007/PL00001722.

Patefield WM: An efficient method of generating random RxC tables with given row and column totals. Appl Stat. 1981, 30: 91-97. 10.2307/2346669.

Interaction web database. [http://www.nceas.ucsb.edu/interactionweb/]

## Acknowledgements

We thank Diego Vázquez, Pedro Jordano, Thomas Hovestadt, and Michel Loreau for helpful comments and valuable discussion on earlier versions of this manuscript and the Interaction Web Database [53] for providing the datasets used here.

## Additional information

### Authors' contributions

NB1 conceived of the study and all authors (NB1, FM, NB2) were involved in designing the methods, analyses, interpretation and drafting the manuscript.

## Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Cite this article

Blüthgen, N., Menzel, F. & Blüthgen, N. Measuring specialization in species interaction networks.
*BMC Ecol* **6**, 9 (2006). https://doi.org/10.1186/1472-6785-6-9

DOI: https://doi.org/10.1186/1472-6785-6-9