Urban sites
Birds were censused in 93 parks, cemeteries and other urban green spaces in Oslo (~60°N, 11°E, Additional file 1: Table S1, Additional file 2: Figure S1). This represented nearly all urban green spaces larger than 1 ha in built-up areas of Oslo. Urban sites had a median size of 8.6 ha (range 0.6 - 98.1 ha), and vegetation varied from intensively managed parks with ornamental deciduous trees and lawns, to green spaces with a mix of managed parkland and patches of more or less natural vegetation. In these sites, natural vegetation is dominated by deciduous forest and mixed forest; pure coniferous forest occurs predominantly outside built-up areas.
Downtown Oslo is a predominantly commercial area and green spaces are mostly restricted to small parks and cemeteries. Birds are also restricted to such sites in the most urbanized part of Oslo, except for a handful of species (see further in Discussion). From central Oslo there is a gradient through areas dominated by apartment buildings to residential areas with a larger amount of vegetation outside parks and other urban green spaces. The residential areas are adjacent to continuous forest (mostly boreal forest dominated by conifers) along much of the periphery of the city, but in some areas residential areas are adjacent to farmland. From the central part of Oslo distances to closest areas of continuous boreal forest are typically 5–6 km. During the study period, Oslo had 520,000-550,000 inhabitants representing an average population density of > 3,500 persons/km2.
Each urban site was censused (by SD) a total of three times, giving a total of 279 individual censuses. Censuses were conducted between sunrise and midday during the breeding season (mainly May-June) in 2003–2007. Each site was censused in at least two different years, at three different times of the breeding season (early, middle and late), and at different times of the day. Censuses consisted of walking slowly through each site, and paths were chosen to cover each site equally well and such that no part of the site was more than 100 m away from the path used. Censuses lasted 10–55 min and increased with the size of the site. Species were recorded as present or absent for each site, based on visual and vocal observations from all three censuses. As censuses aimed to detect potential breeding land bird species, wetland and passage migrant species (i.e. those migrating through and not breeding in the city) were excluded. Urban occurrence was measured as proportion of sites used for each species.
We chose to record species as present/absent instead of estimating density in urban areas because data for rural sites had been collected as species presence/absence and because methods used for obtaining density estimates (i.e. line transects, point counts) do not permit sampling of the full area of each site, which we considered necessary for a complete overview of the urban bird community. Moreover, collecting presence/absence data is considered an efficient method for large-scale monitoring [33]. It should also be noted that our index of urbanization (proportion of sites used for each species) gives a comprehensive, continuous measure, whereas several previous studies have simply compared urbanized versus non-urbanized (or less urbanized) species [6,8,9,14,18]. Finally, occurrence frequency and population density are likely to be positively correlated because widespread species generally have higher densities [32,34,35] and a link between presence/absence data (occupancy) and abundance is also expected on theoretical grounds [33]. This was also the case in our urban data set where occurrence frequency was significantly correlated with abundance based on approximate numbers of individuals observed during censuses (using highest count from the three censuses of each site; total number of birds observed across all sites: r
s
= 0.97, N = 60 species, P < 0.0001, mean number observed per occupied site: r
s
= 0.83, N = 60 species, P < 0.0001). Note that throughout the manuscript, we use the term commonness broadly, and as such this term encompasses both occurence frequency (occupancy) and population density.
Sites surrounding the city
Data on species presence/absence in sites surrounding the city (i.e. rural sites) were taken from species lists generated for > 1500 sites in Oslo and the neighbouring county of Akershus during fieldwork conducted for biodiversity conservation purposes (primarily by SD) during 1995–2011 (see e.g. [36]). Sites were selected to provide representative sampling of different habitats and elevations and encompass spatial variation. Sites were generally defined according to topographical and spatial features (such as hills, valleys, patches of farmland). From this extensive dataset, we selected all forest and farmland sites located within Oslo county (but outside the city itself) and the three closest municipalities in Akershus county (Bærum, Lørenskog and Nittedal) that had been investigated thoroughly at least once during the breeding season. Thus, 137 forest and 51 farmland sites served as potential source areas for land bird species found within urban sites in Oslo city. The larger number of forest sites than farmland sites reflected that forests dominate the surroundings of Oslo (see further in Additional file 3). Twelve sites had both forest and farmland, thus there were 176 different source sites in total. Typically, sites ranged from 50–500 ha in total area (median 110 ha, range 11–1780 ha), and the midpoint of source sites had a median distance of 7.0 km from closest built-up areas of Oslo (range 0.2-21.8 km). Only observations made within the breeding season (mainly May-June, though also April and July if observations clearly suggested breeding behaviour and excluded the possibility of passage migrants or post-breeding movements) were used to calculate frequency of occurrence across sites, and, as before, wetland and passage migrant (i.e. non-breeding) species were excluded. For sites visited only once, surveys lasted 1–5 hours, depending on the size of each site, and were conducted from sunrise until midday. Census paths were chosen to cover habitat diversity within sites in order to detect a high proportion of species present. In general, forest and farmland sites had lower survey effort per unit of area relative to urban sites, although time spent surveying was usually longer. See further in Additional file 3 for evaluations of the comparability of data from rural and urban sites.
Relative brain mass
We examined relative brain mass by including both brain mass and body mass (log-transformed) as independent variables in our statistical models. This approach controls for allometry between brain mass and body mass, and is preferable to the use of the simple residuals from a regression between the two variables (e.g. [37-39]) or expressing relative brain mass as a ratio of the two variables (e.g. [40,41]). Furthermore, by including both brain and body mass as independent variables in models it is possible to investigate additional body mass effects that are independent of brain mass. Finally, this approach is commonly used in studies of relative brain size (e.g. [14,19]) and as a method for normalising data for variation in body size in a range of traits (e.g. metabolic rate [42], testes size [43,44]).
Data on body mass and brain mass were taken from the literature. Body mass data was specific to populations in Norway [45]. In contrast, data on whole brain mass specific to Norway were unavailable, thus data were taken from European sources [46-52]. More specifically, we combined data from all sources to provide a complete list for all species in our study (see Additional file 1: Table S2 for details). Moreover, brain mass was strongly correlated among the five sources (r
s
= 0.97-1.00 in all ten comparisons, N = 36–56, all P < 0.0001). Similarly, body mass data given by the five sources for brain mass were strongly correlated with the Norwegian body mass data [45] (r
s
= 0.99-1.00, N = 49–82, all P < 0.0001). Values of body and brain mass are provided in Additional file 1: Table S2).
Ecological variables
Although a broad range of life history and ecological factors have been linked to urban bird community composition (e.g. diet, nesting site, fecundity, etc. [5-9,20]), we focused on three key features of a species ecology that are thought to influence either relative brain mass or the way species interact with their environment. First, migratory status (i.e. resident vs. migratory) was investigated because this factor appears to have a substantial effect on brain mass [52]. Species were classified as resident or migratory based on literature relevant to local conditions [53]; species in which a minor part of the population is resident were coded as migratory.
Next, the likelihood of species finding suitable breeding sites within the urban sites is expected to influence how frequently they are found in such urbanized areas. Therefore, we also investigated the effects of species breeding habitat. Species breeding habitat was classified following Dale et al. [53]; the major reference work on the status and distribution of birds in Oslo and Akershus counties. More specifically, species were classified into four habitat levels: (1) breeding predominantly in coniferous forest, (2) breeding predominantly in mixed and deciduous forest, (3) breeding predominantly in farmland habitat, and (4) breeding predominantly in urban areas. In the last instance, just five species were classsified as breeding in urban areas (see Additional file 1: Table S2), and this was necessary because they had a predominantly urban distribution and therefore could not be identified as either forest or farmland breeders. Importantly, this approach does not involve circularity in the analyses of how breeding habitat influences urban occurrence frequency because these species did not have higher urban occurrence frequency than all the other groups (urban: mean 0.30, range 0.01-0.71, farmland species: mean 0.37, range 0.00-1.00, coniferous forest species: mean 0.04, range 0.00-0.34, mixed and deciduous forest species: mean 0.33, range 0.00-0.99; Mann–Whitney U-tests: urban vs. farmland: P = 0.94, urban vs. coniferous: P = 0.003, urban vs. mixed/deciduous: P = 0.98). Thus, being classified as an urban breeder did not by definition imply being common in the urban sites. Furthermore, an analysis excluding these five species returned qualitatively similar results (data not shown).
Finally, we examined the effects of nest site location as this variable has previously been shown to influence urban bird community composition (e.g. [5,8,19,20]) and because vegetation structure differs dramatically between urban sites and those in the surrounding environment. For example, ground based vegetation tends to be lacking in urban areas, instead it is often replaced with short grass, which is expected to influence the possibility of ground nesting birds to find suitable nesting sites. Nest site location was classified into four levels: ground, low in bushes (<2 m above ground), high in trees (>2 m above ground), and in cavities or other concealed sites. Information on nest sites were taken from the standard reference work on Norwegian birds [45]. Values for ecological variables used for each species are provided in Additional file 1: Table S2).
Phylogeny
Species values may not represent independent data points for analysis due to similarities inherited through shared ancestry [54]. Therefore, we conducted all comparative analyses controlling for phylogeny (see below). We generated a phylogeny for the 90 species included in our main dataset from the recently published time-calibrated molecular phylogeny of all extant avian species [55]. More specifically, we downloaded 1000 phylogenetic trees for our species from those available at www.birdtree.org using the Hackett sequenced species backbone. Following Jetz et al. [55], we used the Hackett backbone for our analyses due to the more extensive genomic scope of loci used to construct this topology. We then summarised the sample of trees onto a single Maximum clade credibility (MCC) tree with median node heights using TreeAnnotator v1.7.4 (BEAST [56]). The phylogeny is shown in Additional file 4: Figure S2).
Statistical analyses
To account for the non-independence of data points due to shared ancestry of species we used a generalized least-squares (GLS) approach in a phylogenetic framework (PGLS) to perform multiple regression analysis. The PGLS approach allows the estimation (via maximum likelihood) of a phylogenetic scaling parameter (λ), which indicates the degree of phylogenetic dependency in correlations among traits. Specifically, values of λ = 1 indicate complete phylogenetic dependence (i.e. traits covary in direct proportion to their shared evolutionary history), while values of λ = 0 indicate phylogenetic independence of trait covariance (i.e. trait coevolution is independent of phylogeny). Following the estimation of λ values, we used likelihood-ratio tests to compare the model where λ assumes its maximum likelihood value against models with values of λ = 0 or 1 [57].
For our main analysis, we included occurrence frequency (proportion of sites used) in urban sites as our response variable and occurrence frequency (proportion of sites used) in sites surrounding Oslo, relative brain mass, migratory status, breeding habitat and nest site location as independent variables in our model. As detailed above, we examined the effect of relative brain mass by including both (log-transformed) brain mass and body mass as predictor variables in our model. This model incorporated occupancy data from all 90 species based on data from the 93 urban sites and the 176 rural sites.
Next, we repeated this analysis using a restricted subset of urban sites. Specifically, we examined whether urban occurrence frequency was associated with occurrence frequency in surrounding sites, relative brain mass, migratory status, breeding habitat and nest site location using data from only 22 sites occurring within the inner city centre, which represent a highly urbanized environment. These sites were generally smaller than those in the total set of urban sites (i.e. median 2.5 ha, range 0.6 - 24.2 ha), and were heavily used by people.
For these models, however, collinearity between the predictor variables brain and body mass was problematic (i.e. variance inflation factors (VIF) = 10.6 and 11.5, body and brain mass respectively, exceeding the threshold of 10 [58,59]), as a result of the strong correlation between brain mass and body mass (r = 0.93 [95%CI = 0.90 – 0.94], df = 88, t = 22.83, P < 0.0001, λ = 0.99 <0.0001,0.73). Consequently, following recommendations [60,61], we performed a sequential PGLS regression in which the effect of relative brain mass was assessed by including (log-transformed) brain mass as the original variable and body mass as the residuals from a PGLS regression of (log-transformed) body mass on brain mass (see also [42]). Though a statistically sound approach, the use of sequential regression does make the interpretation of the predictor variables less intuitive. Specifically, the effect of brain mass (i.e. the focal variable) is interpreted as the unique effect of brain mass in addition to the effect it already made through its relationship with body mass, while information on body mass (i.e. the residualized variable) per se is lost. Thus sequential regression is related to path analysis methods where variables can act both directly and indirectly through their relationships with other variables [61]. Because we also wanted to investigate the effect of body mass in our analyses, however, we repeated each sequential regression with body mass as the focal variable and brain mass included as the residuals from a PGLS log-log regression of brain mass on body mass. For these analyses, results for all predictor variables other than brain mass and body mass were identical. We therefore only report the results relevant to these two variables (i.e. body mass and residual brain mass) for these subsequent sequential regressions. Importantly, this approach eliminated collinearity among the predictors (all VIF < 2.2) and results reflected those obtained in the previous multiple regression models.
For these models, both continuous and categorical variables were included as predictor variables in our multiple regressions. Moreover, two of the three categorical variables used in our study (i.e. breeding habitat and nest site location) were comprised of four different levels. Consequently, because results relevant to the overall effect of each ecological variable cannot be obtained using the summary function for the PGLS (as this returns parameters for n-1 levels of each categorical factor using the remaining level as the reference level; i.e. “treatment contrasts”), we also summarized parameter effects from a multiple regressions (GLS) using a Type III (simultaneous) sum of squares (i.e. ANCOVA). This approach provided test statistics and the associated P-values for each of the six predictor variables (occurrence frequency in surrounding sites, brain mass, body mass, migratory status, breeding habitat and nest site location), and was appropriate in our case because the maximum likelihood estimate of λ was 0 (see results). Importantly, these results confirmed the earlier results from the PGLS. For simplicity, we present the Type III (simultaneous) sum of squares summaries in the main text and provide results of the full PGLS model in supplementary materials.
In addition to our main anlaysis, we also performed a PGLS to assess the reationship between occurence in urban sites and relative brain mass using a reduced model approach (i.e. when frequency in surrounding areas and ecological variables were not included in the model); this analysis was done to compare our results to those reported in previous studies. Note, however, that this simplified model had a higher Akaike’s information criterion (AIC) score compared to the full model of our main analysis (∆AIC = 108.8). Finally, in this instance, despite the correlation between brain mass and body mass, there was no serious collinearity among the predictor variables (i.e. VIF = 6.9).
For all PGLS models, we rejected the use of Bonferroni correction as it increases the probability of committing type II errors [62]. Instead, to assess the strength of the relationship between dependent and predictor variables, we calculated standardised effect sizes (partial r) from the t-values from multiple regressions [58]. We also calculated the noncentral 95% confidence intervals (CIs) for effect sizes from the t-values in our statistical models [58,59] confidence intervals excluding zero indicate statistical significance at level α = 0.05 [59].
Finally, in order to more fully explore the effects of the breeding habitat and nest site location on urban occurrence, we performed posthoc pairwise comparisons of all levels using a re-ordering approach. More specifically, for both breeding habitat and nest site location, we reran our main analysis model using PGLS, setting a different categorical level as the reference level each time. Thus the breeding habitat model was run 4 times, with a different level (i.e. urban, farmland, coniferous forest and mixed/deciduous forest) set as the reference level on each run. Similarly, the model for nest site location was run 4 times with one of the four levels (i.e. cavity, ground, low and high) set as the reference level on each run. In this instance, we calculated false discovery rates (FDR) for P-values to correct for multiple comparisons.
All variables were either normally distributed or ln-transformed to improve normality, and all occurrence frequency data were expressed as proportions and arcsine-square root transformed prior to analysis. Analyses were performed using R 3.0.2 [63] and the R packages ‘caper’ and ‘nlme’. Following Zuur et al. ([64], page 129) modeling assumptions (i.e. normality of residuals and homogeneity of variance) were validated through visual inspection of graphical model evaluation plots.
Analyses of previously published data
We used data on densities of land bird species during the breeding season within and outside six cities (Livorno, Pisa, Madrid, Angers, Rennes, Heinola) in four European countries [5,65-67]. This represented, to our knowledge, all published studies with relevant types of data. None of these studies specifically presented correlation analyses of species densities within and outside urban areas, but they did examine the issue of how urbanization affected bird community composition. For each study, there were data from 2–3 urban zones: city centre, city suburbs and (for two cities only) other urban areas (e.g. parks), and species data were presented as densities (or proportions of total numbers observed) as opposed to the analyses from Oslo where we used occupancy data. We tested whether density outside cities was related to density in each of the urban zones, while also including the potential effect of relative brain mass (ln-transformed body mass and brain mass) of each species; data on brain mass were taken from the same sources as reported above, and we used corresponding body masses reported by these sources. Information on brain mass was missing for eight species, thus analyses had reduced sample size compared to original publications (seven less for Madrid, one less for Angers, Rennes and Heinola).
Phylogenies and comparative analyses were conducted as described above. Briefly, for each city we generated a species phylogeny by downloading 1000 phylogenetic trees for our species and summarising this sample of trees onto a single MCC tree. Next, for each of the six European cities we performed PGLS regressions with urban density as the dependent variable and both density outside urban areas and log-transformed brain mass and body mass as independent predictor variables, running seperate models for each urban zone. For four cities – Madrid, Angers, Rennes and Heinola – there was no serious collinearity among the predictor variables (all VIF < 8.7). However, for two cities – Livorno and Pisa – there was collinearity among predictor variables (i.e. VIF > 12). Thus we again used sequential regression to remove collinearity among predictors and we present the results of these analyses for these two cities. Moreover, to examine the effects of both brain mass and body mass we performed the sequential regression once with brain mass as the focal variable and once with body mass as the focal variable. When necessary, variables were transformed prior to analysis to meet modelling assumptions, and modelling assumptions were validated through visual inspection of model evaluation plots following Zuur et al. ([64], page 129). Finally, as before, we calculated effect size (r) to determine the strength of the relationship between traits of interest. We also calculated 95% noncentral confidence limits for each r in order to assess statistical significance: confidence intervals excluding zero indicate statistical significance at the level of α = 0.05.