The performance of DNA-based specimen identification in Diptera using COI varies greatly in the literature. Identification success, when using a monophyly criterion, ranges from less than 50% in one genus of Calliphoridae  to over 90% in most other families studied [12, 13, 26, 92]. We show that DNA barcoding is a highly efficient tool for the identification of northern Nearctic muscid flies, as we report congruence levels of 98% between morphological and molecular species limits in 160 taxa when using a clustering approach and enforcing strict monophyly and high bootstrap requirements. This value rises above 99% upon relaxing the bootstrap requirement; just one case of a mixed cluster of two species remained in our dataset following post-barcoding morphological reassessment, representing a single conspicuous taxonomic puzzle.
Characterization of genetic divergence
In one of the first attempts to characterize levels of genetic divergence among congeneric species across various taxa , it was determined that a threshold of 2% generally separated levels of intra and interspecific sequence divergence in most invertebrate taxa. It has since been demonstrated that levels of intra and interspecific variation will generally partially overlap in well-populated data sets . Both the range and the average of genetic divergences detected will vary according to the taxonomic group selected and be influenced by the phylogenetic relatedness of selected species, as well as by the number and geographical distribution of species and specimens in a data set [19, 28, 29, 93, 94]. In general, intraspecific divergences are expected to increase and interspecific divergences to decrease with more comprehensive taxonomic sampling , larger geographic scope , and the inclusion of more stable environments, such as tropical lowlands , where extinction rates are expected to be lower. Despite these considerations, datasets often show that DNA barcodes retain the ability to discriminate species—and to elucidate undescribed diversity—even across large geographic regions [93, 95] and in rich tropical insect faunas [7, 12–14, 96]; but see .
In Diptera, ranges of 0.17-1.20% and 3.00-5.40% have been reported for average of the means and maxima of COI intraspecific distances, respectively [12, 26, 27, 92, 97]. The values reported here for our post-reassessment Muscidae data set are comparable yet at the lower end of these ranges (average of the means 0.18%; maximum of 3.01%). The constrained intraspecific divergences here may reflect several factors, such as the high quality of the prior species-level taxonomic work in the Muscidae, our having conducted genitalic examination of most specimens, as well as the northern geographic focus of our work. The relative completeness of the taxonomy of the northern Muscidae is affirmed by the fact that only a small proportion of genetic clusters in our study, which were also separated from relatives by morphological characters, could not be linked with named species (16 of 160 = 10%). Despite these likely explanations for our comparatively low intraspecific divergences, it is challenging to interpret differences in levels of intraspecific genetic divergence among taxa for which different character sets are used for taxonomy. We suggest that the near-complete correspondence between genetic groupings and morphospecies for the Muscidae gives added weight both to DNA barcodes and to the morphological characters typically used for species-level diagnosis in Muscidae taxonomy (mainly chaetotaxy and genitalia). The correspondences suggest that both are likely to be revealing the true underlying species boundaries, which remain unknown to us.
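The two summary statistics quoted above (the average of the per-species mean distances and the overall maximum intraspecific distance) can be illustrated with a minimal sketch. It assumes uncorrected p-distances on pre-aligned sequences; the distance model actually used in the study is not stated in this section, and the species labels and sequences below are invented toy data.

```python
from itertools import combinations
from statistics import mean


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def intraspecific_summary(records):
    """Return (average of per-species means, overall maximum), both in percent.

    records is a list of (species, aligned_sequence) pairs. Species represented
    by a single sequence contribute no intraspecific distance and are skipped.
    """
    by_species = {}
    for species, seq in records:
        by_species.setdefault(species, []).append(seq)
    means, maxima = [], []
    for seqs in by_species.values():
        dists = [100 * p_distance(a, b) for a, b in combinations(seqs, 2)]
        if dists:
            means.append(mean(dists))
            maxima.append(max(dists))
    return mean(means), max(maxima)


# Hypothetical 10-bp toy alignment (real barcodes are several hundred bp long).
records = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_A", "ACGTACGTAC"),
    ("sp_B", "TTGTACGTAC"),
    ("sp_B", "TTGTACGTAC"),
    ("sp_C", "GGGGACGTAC"),  # singleton: ignored
]
avg_of_means, overall_max = intraspecific_summary(records)
print(f"average of means: {avg_of_means:.2f}%, maximum: {overall_max:.2f}%")
```

Averaging the per-species means, rather than pooling all pairwise distances, keeps heavily sampled species from dominating the summary.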
Several additional factors beyond taxonomy, such as the number of sequences or the inclusion of sequences from a range of geographic localities, can influence the extent of genetic divergences measured within species [19, 28, 98]. Despite theoretical concerns that intraspecific divergences will increase dramatically when studies are conducted at large spatial scales, the majority of empirical evidence to date indicates that this is a more modest problem for DNA barcoding than originally envisaged. Bearing in mind that only 28 species could be included in our analysis, the inclusion of sequences from localities other than Churchill did not have an influence on maximum intraspecific distance in our dataset. These results are comparable to those of Hebert et al. , who reported low intraspecific variation among 11,289 sequences of lepidopteran species (1327 species in 62 families) collected from different localities in eastern North America, as well as the results of Lukhtanov et al. for Central Asian butterflies. By contrast, the Trichoptera (caddisflies) of North America  as well as diving beetles (tribe Agabini) of the western Palearctic , which both inhabit freshwaters expected to be more divided than terrestrial insect habitats, exhibit increasing intraspecific genetic divergence at large spatial scales. Part of this increase may be attributable to previously unrecognized species being lumped together under current names; despite this issue, DNA barcoding remained effective (90-93%) at distinguishing named morphospecies within these taxa at continental spatial scales [98, 99]. It appears, then, that global sequence libraries of insects may serve as references for local species identification for newly studied sites, at least for many groups in the temperate and polar zones. 
Success rates are particularly high for vagile groups (such as Lepidoptera), while even for more challenging groups identification success can be near 100% at smaller spatial scales or when employing joint geographic and genetic data . Further work on the question of barcode variability at very large spatial scales is particularly required in tropical environments, as the majority of tropical insect DNA barcoding studies to date have covered a relatively modest regional spatial scale (e.g. [7, 8, 12–14, 29, 100]).
As with intraspecific distance values reported here, the minimum (0.77%) and average (4.82%) of the nearest neighbour interspecific distances for the post-reassessment data set were lower than most interspecific distances found in the literature for insects, including mosquitoes , black flies , bees , mayflies, stoneflies and caddisflies , and springtails , but comparable to those reported for tachinid flies . However, some studies report average congeneric divergences rather than nearest-neighbour distances as employed here, which provide the more stringent test of discriminating the closest relatives . In their foundational work, Hebert et al.  reported that more than 98% of invertebrate taxa they investigated (including 177 species of Diptera, but no Muscidae) showed more than 2% pairwise distance to their nearest neighbour. In contrast, only 86% of the 160 taxa in the present work were separated from their nearest neighbour by a distance greater than 2%. This difference is attributable to our focus on numerous species from a single family (89% of the fauna of Churchill ), and approximately half of the arctic and subarctic Nearctic fauna , as opposed to the taxonomically broad but poorly populated data set of Hebert et al. . Limits of species with distance to nearest neighbour < 2% in our data set were supported by morphological characters, but these were occasionally subtle and/or only detectable in the males, possibly suggesting a recent divergence time .
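The nearest-neighbour statistic used above can be sketched as follows: for each species, take the smallest distance to any specimen of another species, then count the share of species whose value exceeds 2%. This is a minimal illustration under assumed inputs; the tuple format, species labels, and distances are hypothetical and not drawn from the study's data.

```python
def nearest_neighbour_distances(pairwise):
    """Map each species to its minimum distance (%) to any other species.

    pairwise is an iterable of (species_a, species_b, distance_percent)
    tuples, one per specimen pair. Within-species pairs are ignored.
    """
    nearest = {}
    for sp_a, sp_b, dist in pairwise:
        if sp_a == sp_b:
            continue
        for sp in (sp_a, sp_b):
            nearest[sp] = min(dist, nearest.get(sp, float("inf")))
    return nearest


def fraction_above(nearest, threshold=2.0):
    """Share of species whose nearest-neighbour distance exceeds the threshold."""
    return sum(d > threshold for d in nearest.values()) / len(nearest)


# Hypothetical specimen-pair distances, in percent.
pairwise = [
    ("A", "A", 0.3),
    ("A", "B", 0.9),
    ("A", "C", 5.1),
    ("B", "C", 4.8),
    ("C", "D", 6.0),
    ("B", "D", 7.2),
    ("A", "D", 6.5),
]
nearest = nearest_neighbour_distances(pairwise)
share = fraction_above(nearest, threshold=2.0)
print(nearest)
print(f"{share:.0%} of species exceed 2% to their nearest neighbour")
```

Because the statistic takes the minimum over all heterospecific pairs, it is a stricter test than an average congeneric divergence, which is the distinction drawn in the paragraph above.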
As is to be expected from a well-populated data set [19, 98], we report substantial overlap in the ranges of intra- and interspecific distances for our data set, clearly indicating the lack of a “barcoding gap”  in muscid flies. While distance-based methods for species determination have been extensively criticized (e.g. [19, 24]), it was through the combination of cluster examination on the NJ tree and the use of 2% as an arbitrary divergence threshold to identify “anomalous” distance values that we were able to rapidly pinpoint and address taxonomic issues in our original data set, as well as confirm that minimum interspecific distance in Muscidae ranges well below 2% for many species.
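The screening step described here, using 2% as an arbitrary flag rather than a species boundary, might look like the sketch below. The function names and species labels are hypothetical; the toy values merely echo the extremes reported in the text (a maximum intraspecific distance of 3.01% and a minimum nearest-neighbour distance of 0.77%) and are not per-species results from the study.

```python
def flag_anomalies(max_intra, nearest_neighbour, threshold=2.0):
    """Return {species: [reasons]} for species that warrant a second look.

    max_intra and nearest_neighbour map species names to distances in percent.
    A species missing from max_intra (a singleton) is treated as 0% divergent.
    """
    flags = {}
    for sp in sorted(set(max_intra) | set(nearest_neighbour)):
        reasons = []
        if max_intra.get(sp, 0.0) > threshold:
            reasons.append("deep intraspecific divergence")
        if nearest_neighbour.get(sp, float("inf")) < threshold:
            reasons.append("shallow divergence from nearest neighbour")
        if reasons:
            flags[sp] = reasons
    return flags


def has_global_gap(max_intra, nearest_neighbour):
    """True only if every intraspecific maximum lies below every
    nearest-neighbour distance, i.e. the two ranges do not overlap."""
    return max(max_intra.values()) < min(nearest_neighbour.values())


# Hypothetical species with illustrative distances (percent).
max_intra = {"sp_1": 3.01, "sp_2": 0.40, "sp_3": 0.10}
nearest_neighbour = {"sp_1": 4.82, "sp_2": 0.77, "sp_3": 0.77, "sp_4": 5.50}

flags = flag_anomalies(max_intra, nearest_neighbour)
gap = has_global_gap(max_intra, nearest_neighbour)
for sp, reasons in flags.items():
    print(sp, "->", "; ".join(reasons))
print("global barcoding gap:", gap)
```

Flagged species are candidates for morphological reassessment, not automatic splits or merges, which matches how the threshold is used in this work.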
It is important to expand this understanding of divergence patterns in the Muscidae by including specimens from warm temperate and tropical regions. The often-low interspecific divergences we found between sibling species present in Churchill were associated with reciprocal monophyly in the vast majority of cases. In more southerly regions, higher richness combined with greater intraspecific genetic structure has been described as presenting a challenge for barcode-based species discrimination . Incomplete lineage sorting among many young species pairs would complicate the clustering-based identification approach advocated here for the northern muscids. However, barcode results to date for some tropical insect faunas are promising (e.g. [12–14, 102]; but see ).
Supposedly depauperate northern regions might be expected to be an “easy” test for barcoding due to lower species richness and lineage pruning during glaciations, as has been demonstrated for fish, for example . However, our usage of Churchill and other northern regions may, in fact, provide a relatively stringent test of barcoding success for the Muscidae. Being one of the most speciose and broadly distributed families of terrestrial insects in northern regions , muscids are likely to have been strongly influenced by glaciations, and our observed shallow interspecific divergences among many pairs of congeners suggest recent speciation events during the Pliocene and Pleistocene, when applying an approximate molecular clock calibration to our divergences (e.g. ). Moreover, the Churchill region is a zone of admixture from Beringian, high arctic, and southerly refugia (e.g. ). This combination of factors may lead to mixing of intraspecific lineages from different refugia as well as young species in the Churchill region. Further data from additional geographic regions will be desirable to confirm that the patterns reported here are broadly applicable for all of Muscidae, but we optimistically predict that muscids will be broadly amenable to barcoding.
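The arithmetic behind a strict molecular clock estimate is simply divergence divided by rate. The sketch below assumes a rate of 2.3% pairwise sequence divergence per million years, a figure often quoted for arthropod mitochondrial DNA; this rate is an assumption made for illustration and is not necessarily the calibration applied in this study. The two input values are the minimum and average nearest-neighbour distances reported above.

```python
def divergence_time_myr(pairwise_divergence_pct, rate_pct_per_myr=2.3):
    """Approximate time since divergence, in millions of years.

    Assumes a strict clock: pairwise divergence accumulates linearly at
    rate_pct_per_myr (percent per million years). The default rate is an
    illustrative assumption, not a value taken from the study.
    """
    if rate_pct_per_myr <= 0:
        raise ValueError("rate must be positive")
    return pairwise_divergence_pct / rate_pct_per_myr


for label, divergence in [
    ("minimum nearest-neighbour distance", 0.77),
    ("average nearest-neighbour distance", 4.82),
]:
    age = divergence_time_myr(divergence)
    print(f"{label}: {divergence}% -> about {age:.2f} Myr")
```

Such estimates are rough: saturation, rate variation among lineages, and ancestral polymorphism all affect shallow divergences, so the output is best read as an order of magnitude.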
Future success rate of barcode-based identification of unknowns
Congruence between morphological and molecular species limits was 97.8% when using a clustering approach with high bootstrap support and enforcing a monophyly requirement in the molecular results, while clustering and identification success was 99% using clustering with a relaxed bootstrap criterion. We found this high level of correspondence to be surprising, given that monophyly is considered a strict test of species limits. Funk and Omland  reported that up to 23% of species may be paraphyletic or polyphyletic; however, they noted that this proportion declines in better-studied taxa, suggesting that a portion of this total reflects incomplete taxonomic knowledge.
By contrast, threshold-only based methods would yield lower success for grouping unknown individuals into species units, with a maximum success rate of 90% found at a threshold of 1.2%, which is less than half of the threshold value found to minimize error rate for a group of marine molluscs . While we recommend combining distance and cluster-based approaches for taxonomic and faunistic works concerned with “true” species boundaries and numbers, such a level of success would permit rapid assessments of approximate species richness in unknown faunas. Furthermore, a combination of clustering and threshold-based approaches would allow new taxa or singletons to be flagged as likely new species. Our results also may contribute to the development of relaxed clustering methods, whereby divergences exceeding specified thresholds are permitted. Moreover, our study demonstrates the great utility of having well-populated species-level reference libraries; we have found that neither small interspecific distances nor large intraspecific distances will derail identification success when there are many reference sequences against which to match unknowns.
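A threshold-only grouping of the kind contrasted here can be sketched as single-linkage clustering: any two specimens closer than the threshold are joined, and a species counts as a success only if its specimens form exactly one cluster with no foreign members. Both the linkage rule and the success criterion are assumptions made for illustration, as are the `mutate` helper, the specimen IDs, and the synthetic 100-bp sequences; the study's own procedure may differ.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected proportion of differing sites between aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def threshold_clusters(seqs, threshold_pct):
    """Single-linkage clusters: link specimens whose distance <= threshold."""
    parent = {sid: sid for sid in seqs}

    def find(sid):
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]  # path halving
            sid = parent[sid]
        return sid

    for a, b in combinations(seqs, 2):
        if 100 * p_distance(seqs[a], seqs[b]) <= threshold_pct:
            parent[find(a)] = find(b)
    clusters = {}
    for sid in seqs:
        clusters.setdefault(find(sid), set()).add(sid)
    return list(clusters.values())


def success_rate(clusters, species_of):
    """Share of species whose specimens form exactly one pure cluster."""
    members = {}
    for sid, sp in species_of.items():
        members.setdefault(sp, set()).add(sid)
    cluster_sets = {frozenset(c) for c in clusters}
    return sum(frozenset(m) in cluster_sets for m in members.values()) / len(members)


def mutate(seq, positions):
    """Hypothetical helper: substitute the base at each listed position."""
    return "".join(
        ("C" if base == "A" else "A") if i in positions else base
        for i, base in enumerate(seq)
    )


base = "ACGT" * 25  # synthetic 100-bp sequence, so one substitution = 1%
seqs = {
    "a1": base,
    "a2": mutate(base, {0}),
    "b1": mutate(base, {10, 11, 12}),
    "b2": mutate(base, {10, 11, 12, 13}),
    "c1": mutate(base, {50}),  # only 1% from a1: a young "species pair"
}
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}

clusters = threshold_clusters(seqs, threshold_pct=1.2)
rate = success_rate(clusters, species_of)
print(sorted(sorted(c) for c in clusters), f"success = {rate:.2f}")
```

In this toy case species A and C are lumped because they differ by less than the threshold, which is the failure mode that a combined clustering and morphology approach is meant to catch.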
While specimens of Graphomya were excluded from all analyses of species limits due to taxonomic issues, at a threshold of 1.2%, our 19 sequences form five putative species and the two lineages represented by more than one specimen are monophyletic with high bootstrap values (Additional file 3). Since only one of these five putative species contains at least two specimens of the same sex, the barcoding of additional individuals will be necessary before it can be determined if these lineages are all distinct morphologically and if they correspond, at least in part, to the Nearctic species as defined in Arntfield .
In contrast to the results obtained at the species level, generic limits were poorly supported by COI in the NJ tree (Figure 2), with more than half of the genera represented by two or more species being para- or polyphyletic. It appears, then, that muscid specimens cannot be reliably identified to genus using COI based solely on association with closely related taxa, at least when based on the NJ method of tree building. The percentage of insect genera forming monophyletic clusters based exclusively on COI varies greatly in the literature, with values similar to those reported here in ithomiine butterflies (50-61% depending on clustering method)  and black flies (62.5%) , but much higher in bees (100%) . It remains unclear whether this is due to lack of phylogenetic signal in COI at this depth, the type of tree-building method, or to the true lack of monophyly of genera as currently defined; further phylogenetic work involving a multi-gene approach is required to address the prospects for higher-level taxonomic assignments in Diptera based upon COI.
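The genus-level test described above amounts to asking, for each genus with two or more species, whether some clade of the tree contains exactly that genus's species. A minimal sketch follows, assuming a rooted tree written as nested tuples with leaf labels of the form `Genus_species`; the tree, the genus names, and the labelling convention are all hypothetical, and an unrooted NJ tree would first need a root.

```python
def clades(tree):
    """Return (leaf set of this node, list of leaf sets for every clade below it).

    A tree is either a leaf label (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def monophyletic_groups(tree, group_of):
    """Return (set of monophyletic groups, fraction among groups with >= 2 leaves)."""
    all_leaves, all_clades = clades(tree)
    clade_set = set(all_clades)
    groups = {}
    for leaf in all_leaves:
        groups.setdefault(group_of(leaf), set()).add(leaf)
    testable = {g: m for g, m in groups.items() if len(m) >= 2}
    mono = {g for g, m in testable.items() if frozenset(m) in clade_set}
    return mono, len(mono) / len(testable)


# Hypothetical rooted tree with one leaf per species.
tree = (
    (
        ("Aus_one", "Aus_two"),
        (("Bus_one", "Cus_one"), "Bus_two"),
    ),
    "Cus_two",
)
mono, fraction = monophyletic_groups(tree, group_of=lambda leaf: leaf.split("_")[0])
print(f"monophyletic genera: {sorted(mono)} ({fraction:.0%} of testable genera)")
```

The same function applies at the species level if leaves are specimens and `group_of` returns the species name, which corresponds to the monophyly criterion used for species limits earlier in this discussion.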
DNA barcoding and Nearctic Muscidae taxonomy
The DNA barcode reference library produced in our work allowed us to resolve the problematic issue of male/female associations for 5 of our 6 ambiguous species pairs as well as confirm or challenge our diagnosis of sex associations for members of unnamed morphospecies. Our results demonstrate that a well-populated reference library not only facilitates the association of conspecific specimens and the detection of identification errors, but also contributes to the taxonomic workflow by revealing morphologically distinct taxa and challenging accepted species limits. The discovery of Spilogona sp. 12 was especially significant, as it allowed Jolicoeur and Savage (personal communication) to document that the most abundant species of Schizophora (Diptera) on the alpine tundra of the McGerrigle Mountains of the province of Québec is, in fact, the undescribed muscid Spilogona sp. 12 rather than the similar Spilogona contractifrons, recorded in the literature from the northern Appalachians and numerous other Nearctic localities [38, 106]. While we confirm the presence of both Spilogona sp. 12 and S. contractifrons in Churchill, the Nearctic distribution of the latter will need to be entirely reassessed in light of this new discovery.
The taxonomic reassessment also led to the reinstatement of Phaonia luteva stat. nov. as a species distinct from P. errans. Malloch  recognized three distinct Nearctic varieties of Phaonia errans: a yellow-legged variety, Phaonia errans errans (Meigen); a dark-legged variety, Phaonia errans varipes (Coquillett); and a variety with rufous-yellow legs and distinctive chaetotaxy, Phaonia errans completa Malloch. Huckett  synonymized varipes Coquillett with Anthomyia luteva Walker and treated the dark-legged form as Phaonia errans var. luteva in later publications [38, 60]. Since specimens of Phaonia errans sensu lato clustered here into distinct yellow and dark-legged branches separated by more than 4% intraspecific distance (higher than all other taxa in this work), we concluded that the dark-legged specimens belonged to P. luteva as interpreted by Huckett  based on his examination of Walker’s type  and that this taxon should be recognized as a full species distinct from P. errans. Specimens of Phaonia errans var. completa were not available for DNA extraction in the context of this work, but the distinctive leg colour and chaetotaxy of this taxon suggest that it might also be a separate species rather than a regional variety of P. errans.
A very low level of genetic divergence between species, well below the delineated threshold, may reflect intraspecific polymorphism. Of all the morphologically distinct taxa included in this work, only T. septentrionalis and T. spiniger shared identical haplotypes. While males of these taxa can be easily distinguished morphologically (see results section), they share a mostly overlapping Nearctic distribution [38, 41]. In a phylogenetic analysis of Thricops based on a combination of morphological and molecular characters, including the mitochondrial genes COI and COII and the nuclear gene white, Savage et al.  treated the two species as distinct but very closely related. Savage et al. , however, included only one specimen of each taxon in the analysis, which prevented an assessment of intraspecific vs interspecific distances. Based mostly on geographical distribution data for these two taxa, we suspect that T. septentrionalis and T. spiniger may belong to one polymorphic species. To test this hypothesis, and before permanent changes are made to their taxonomic status, the genetic distance between T. septentrionalis and T. spiniger should be further assessed with other markers capable of distinguishing between closely related species, as done by Whitworth et al. , who found that COI and COII underestimated species numbers in the genus Protocalliphora but that the analysis of amplified fragment length polymorphism (AFLP) generated clusters corresponding to morphological Protocalliphora species limits. Mitochondrial DNA introgression associated with Wolbachia infection, a factor that has been proposed to explain a lack of correspondence between COI and morphology in insects [91, 108], could also explain the presence of shared haplotypes between T. spiniger and T. septentrionalis. The high congruence between molecular and morphological species limits in our study suggests, however, that mitochondrial DNA introgression is not common in our data set.
An important application of DNA barcoding is the discovery of cryptic species, revealed through large intraspecific divergence values in an otherwise morphologically uniform taxon. In Diptera, cryptic species appear to be especially common in parasitoid flies of the family Tachinidae [12, 13], but no information was available for muscid flies prior to this study. In the post-reassessment data set, only H. evecta, H. laxifrons and S. atrisquamula demonstrated maximum levels of intraspecific distances greater than 2% (but still no higher than 3.01%) coupled with homogeneous morphological characters. As there is nothing among the scant information currently available on the ecology of these species suggesting the presence of distinct internal lineages , we retained the currently accepted species limits for these taxa. However, we recommend the analysis of further molecular data such as the Internal Transcribed Spacers (ITS) region of the ribosomal DNA, a marker that has performed well to confirm the presence of cryptic lineages in the Diptera genera Belvosia (Tachinidae)  and Chrysomya (Calliphoridae) .