Microsatellite data suggest significant population structure and differentiation within the malaria vector Anopheles darlingi in Central and South America

Background Anopheles darlingi is the most important malaria vector in the Neotropics. An understanding of A. darlingi's population structure and contemporary gene flow patterns is necessary if vector populations are to be successfully controlled. We assessed population genetic structure and levels of differentiation based on 1,376 samples from 31 localities throughout the Peruvian and Brazilian Amazon and Central America using 5–8 microsatellite loci.
Results We found high levels of polymorphism for all of the Amazonian populations (mean RS = 7.62, mean HO = 0.742), and low levels for the Belize and Guatemalan populations (mean RS = 4.3, mean HO = 0.457). The Bayesian clustering analysis revealed five population clusters: northeastern Amazonian Brazil, southeastern and central Amazonian Brazil, western and central Amazonian Brazil, Peruvian Amazon, and the Central American populations. Within Central America there was low non-significant differentiation, except for between the populations separated by the Maya Mountains. Within Amazonia there was a moderate level of significant differentiation attributed to isolation by distance. Within Peru there was no significant population structure and low differentiation, and some evidence of a population expansion. The pairwise estimates of genetic differentiation between Central America and Amazonian populations were all very high and highly significant (FST = 0.1859 – 0.3901, P < 0.05). Both the DA and FST distance-based trees illustrated the main division to be between Central America and Amazonia.
Conclusion We detected a large amount of population structure in Amazonia, with three population clusters within Brazil and one including the Peru populations. The considerable differences in Ne among the populations may have contributed to the observed genetic differentiation. All of the data suggest that the primary division within A. darlingi corresponds to two white gene genotypes between Amazonia (genotype 1) and Central America, parts of Colombia and Venezuela (genotype 2), and are in agreement with previously published mitochondrial COI gene sequences interpreted as incipient species. Overall, it appears that two main factors have contributed to the genetic differentiation between the population clusters: physical distance between the populations and the differences in effective population sizes among the subpopulations.


Background
Anopheles (Nyssorhynchus) darlingi is the most efficient malaria vector in the Neotropics, and is responsible for transmission of Plasmodium falciparum, P. vivax and P. malariae [1][2][3][4][5]. Although recently shown to be somewhat of an opportunistic blood-feeder in eastern Amazonian Brazil [6], A. darlingi is regarded as the most anthropophilic anopheline in the Americas [7,8]. Anopheles darlingi's range extends from northeastern Argentina to southern Mexico, with a discontinuity in Nicaragua, Costa Rica, and Panama, hypothesized to be the result of an initial colonization event from northern South America into Central America, followed by a modern extinction event in these three countries [9,10]. A complete understanding of A. darlingi's population structure and the processes responsible for the distribution of differentiation is important to vector-based malaria control programs and for identifying heterogeneity in disease transmission as a result of discrete vector populations [11,12]. Susceptibility to Plasmodium infection, survival and reproductive rates, degree of anthropophily, and the epidemiology of malaria in the human host may all be affected by genetic variation in vector populations [13,14].
Many of the anopheline species responsible for malaria transmission are members of species complexes composed of closely related cryptic species [32]. The most notable example is the well-studied African A. gambiae complex, which includes seven isomorphic and closely related mosquito species [33], as well as incipient species within A. gambiae s.s. [14,34,35]. The members of this complex are highly variable, and also display a large amount of adaptive genetic variation [36]. The recently identified A. darlingi incipient species may have differential susceptibility to malaria parasites, and/or subtle ecological or behavioral differences that could require modifications to vector control efforts. Therefore, a restriction enzyme digestion was designed to distinguish A. darlingi genotypes 1 and 2 [Mirabello and others in submission].
Microsatellites are highly polymorphic genetic markers that evolve much faster than mitochondrial or nuclear genes, and are particularly useful for resolving the structure of populations at a finer geographical and evolutionary scale. The analyses were performed in two ways: 1) including all of the amplified loci from each population (a variation of 5, 7 and 8 loci); and, 2) including only the 5 loci that were amplified from all of the populations (ADC02, ADC28, ADC110, ADC137, and ADC138).
For each locality, summary polymorphism statistics were generated using Fstat 2.9.3 [49]. Deviations from Hardy-Weinberg equilibrium were assessed per locus and per locality, and linkage disequilibrium was assessed between pairs of loci within each locality, both using Fstat. The significance of these tests was determined using the randomization approach that applies Bonferroni corrections in Fstat. Within each locality the frequency of null alleles was determined using the Brookfield 2 estimate [50], and the allele and genotype frequencies were then adjusted accordingly in MICRO-CHECKER 2.2.3 [51]. The null allele-adjusted dataset was compared to the original dataset to investigate the impact of null alleles on estimations of genetic differentiation.
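As a rough illustration of the per-locus summary statistics described above, the following sketch computes observed and expected heterozygosity and a null-allele frequency estimate. It is not the Fstat or MICRO-CHECKER implementation. It uses Brookfield's first estimator, r = (He - Ho) / (1 + He), which is simpler than the Brookfield 2 estimate used in the study because it ignores non-amplifying samples. The function names and the genotype data are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    He is Nei's unbiased expected heterozygosity.
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles
    ho = sum(a != b for a, b in genotypes) / n
    # Allele counts over all 2n gene copies
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    he = (total / (total - 1)) * (1.0 - sum_sq)
    return ho, he


def brookfield1(ho, he):
    """Brookfield's first null-allele frequency estimator, floored at zero."""
    return max(0.0, (he - ho) / (1.0 + he))


# Hypothetical genotypes (allele sizes in bp) at a single locus
sample = [(150, 150), (150, 152), (152, 152), (150, 154), (154, 154), (150, 150)]
ho, he = heterozygosity(sample)
r = brookfield1(ho, he)
print(f"Ho = {ho:.3f}, He = {he:.3f}, estimated null allele frequency = {r:.3f}")
```

A heterozygote deficit (Ho below He) yields a positive estimate, while a heterozygote excess is reported as zero.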
Genetic differentiation was estimated by calculating FST between pairs of populations within and between A. darlingi samples using Arlequin 2.001 [52] and GENEPOP 1.2 [53]. The number of migrants per population per generation (Nm) between localities was estimated from pairwise FST [54]. An analysis of molecular variance (AMOVA) was used to examine the distribution of genetic variation in Arlequin using FST. We focused on estimates of FST performed under the infinite alleles model (IAM) because this model is considered more reliable when fewer than 20 microsatellites are used [55]. The significance for all calculations was assessed by 10,000 permutations and the P-values were Bonferroni adjusted. The isolation by distance model was investigated as a potential explanation for the observed population differentiation. The significance of the regression of genetic distance on geographic distance between sample pairs was tested using a Mantel test [56] with 10,000 permutations using Arlequin.
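The logic of a Mantel test for isolation by distance can be sketched as follows. This is a minimal one-tailed permutation version written with NumPy, not the Arlequin implementation. The function name `mantel_test`, the site coordinates, and the genetic distances are all hypothetical and serve only to show how rows and columns of one matrix are permuted together.

```python
import numpy as np


def mantel_test(gen, geo, permutations=10000, seed=0):
    """One-tailed Mantel test between two symmetric distance matrices.

    Returns the observed Pearson correlation of the upper-triangle entries
    and a permutation P-value for a positive association.
    """
    gen = np.asarray(gen, dtype=float)
    geo = np.asarray(geo, dtype=float)
    n = gen.shape[0]
    iu = np.triu_indices(n, k=1)  # each pair of sites counted once
    r_obs = np.corrcoef(gen[iu], geo[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        # Permute site labels: rows and columns are shuffled together
        r_perm = np.corrcoef(gen[np.ix_(p, p)][iu], geo[iu])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    # Add one to numerator and denominator so the P-value is never zero
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical sites along a transect (positions in km)
coords = np.array([0.0, 120.0, 430.0, 900.0, 1600.0, 2400.0])
geo = np.abs(coords[:, None] - coords[None, :])
# Hypothetical genetic distances that rise with, then saturate over, distance
gen = 0.4 * geo / (geo + 1500.0)

r_obs, p_value = mantel_test(gen, geo, permutations=10000)
print(f"Mantel r = {r_obs:.3f}, P = {p_value:.4f}")
```

Permuting whole sites rather than individual pairwise values preserves the dependence among distances that share a site, which is why a plain regression P-value is not used here.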
Several approaches were used to investigate the relationships among populations. We constructed neighbor-joining (NJ) trees based on the pairwise DA distance of Nei et al. [57] and on FST values for all the populations using MEGA version 3.1 [58]. A Bayesian clustering analysis was implemented in STRUCTURE 2.1 [59,60] with a burn-in period of 500,000 iterations followed by 1,000,000 Markov chain Monte Carlo replications for each of K = 1 to 8. This clustering method estimates the most probable number of discrete populations with no a priori assumptions of population structure. Each simulation was done in triplicate to assess the consistency of the results.
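For readers unfamiliar with the DA distance, the sketch below computes it for two populations from per-locus allele frequencies, using the standard definition DA = 1 - (1/r) Σ_loci Σ_alleles sqrt(x·y), where r is the number of loci. The function name `nei_da` and the allele frequencies are hypothetical; the trees in this study were built in MEGA, not with this code.

```python
import math


def nei_da(pop_x, pop_y):
    """Nei et al.'s DA distance between two populations.

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    Both lists must describe the same loci in the same order.
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must share the same non-empty set of loci")
    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        # Only alleles present in both populations contribute to the sum
        shared = set(fx) & set(fy)
        total += sum(math.sqrt(fx[a] * fy[a]) for a in shared)
    return 1.0 - total / len(pop_x)


# Hypothetical allele frequencies at two loci (allele sizes in bp)
pop_a = [{150: 0.5, 152: 0.5}, {200: 1.0}]
pop_b = [{150: 0.5, 152: 0.5}, {202: 1.0}]

print(nei_da(pop_a, pop_a))  # identical populations
print(nei_da(pop_a, pop_b))  # identical at locus 1, no shared alleles at locus 2
```

The distance is 0 when allele frequencies are identical at every locus and 1 when no alleles are shared at any locus.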
Inferences of non-neutral evolution were investigated using two tests, the homozygosity test implemented in BOTTLENECK 1.2.02 [61], and Kimmel's β-imbalance index [62] using the β1 estimator [63]. Significance of the homozygosity test was evaluated by simulations implemented in BOTTLENECK. The homozygosity test was performed under the step-wise mutation model (SMM) and the two-phase mutation model (TPM) with one-step mutations occurring at a frequency of 90% of the total. SMM and TPM (specifically the 90% model) are considered the more realistic microsatellite mutation models [64], thus only these results are given. The β-imbalance index, as well as 95% confidence intervals, were estimated using a SAS program written and run by T. Lehmann [65]. The long-term effective population size (Ne) was estimated using NeEstimator version 1.3 [66] based on the linkage disequilibrium and heterozygote excess models.

Results
All individuals collected from Peru and Brazil were classified as white genotype 1 and all those collected from Guatemala and Belize were classified as white genotype 2 [10, Mirabello and others in submission].

Genetic diversity
From the original set of eight microsatellite loci designed for A. darlingi from eastern Amazonian Brazil [47], one locus (ADC107) did not amplify in any of the individuals collected from Peru and three loci (ADC01, ADC29, and ADC107) did not amplify in any of the individuals from Belize or Guatemala. Seven loci were genotyped from 350 individuals in Peru, five loci from 276 individuals from Central America (143 from Guatemala and 133 from Belize), and all eight loci from 57 individuals from BV and 58 from PLT, Brazil (Table 1). To determine if the null alleles impacted our population genetic analyses, we performed these analyses both before and after the dataset was adjusted for estimated null allele frequencies. The effect of this treatment was minimal and did not significantly change the degree or statistical significance of the estimated parameters.

Population structure and differentiation
For analyses of population structure and differentiation, only the data based on the five loci that were amplified from the entire sample set are shown.
An unsupervised Bayesian clustering analysis revealed five population clusters. Anopheles darlingi seemed to cluster mostly on the basis of physical proximity of sampling sites, with four population clusters within Amazonia and one cluster including all specimens from Belize and Guatemala (Figure 1). The five population clusters had the following proportion of specimen membership from each collection site: (1)
Both the DA and FST distance-based trees illustrated two main population clusters: one including all of the samples from Belize and Guatemala, and the other including all samples from Amazonian Peru and Brazil (Figure 2). These two clusters are consistent with genotypes 1 and 2 [10]. Within the Amazonian cluster there were four smaller subclusters, corresponding to those detected with the Bayesian analysis (2-5 above).
An AMOVA using the five population clusters detected with the Bayesian analysis as the groupings found that 18.9% of the variance was explained at the among-groups level, and 77.9% at the within-populations level. In an AMOVA using the two genotypes, Amazonia and Central America, as the groupings, 20.1% of the total variance was explained at the between-groups level and 70.1% within populations. All of the global FST estimates revealed significant overall genetic structure (P < 0.001). The majority of the genetic diversity in A. darlingi is accounted for by within-population differences among individuals.
Within-genotype and within-cluster levels of differentiation ranged from low to moderate. Within Central America (genotype 2) there was mostly low non-significant differentiation (mean FST = 0.047, 33.3% of comparisons with P < 0.05), except between GOL and all other subpopulations, where there was a moderate amount of significant differentiation (FST range of 0.1063-0.1489) (Table 2). Within Amazonia (genotype 1) there was a moderate level of significant differentiation (mean FST = 0.1244, 89% of comparisons with P < 0.05), and within the four Amazonian population clusters the mean FST ranged from -0.002 to 0.082 (70.6% of comparisons with P < 0.05) (Table 3). All of the comparisons between genotypes 1 and 2, and between the five population clusters, revealed significant differentiation (P < 0.05) after correction for multiple tests. Estimates of gene flow revealed little or no recurrent gene flow between genotypes 1 and 2, and reduced gene flow among the population clusters in Amazonia (Table 3). Within each population cluster, there were moderate to high levels of gene flow, with particularly high levels within NEA Brazil (Table 3).
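To give a sense of scale for the gene flow estimates, the sketch below converts FST into the number of migrants per generation using Wright's island-model relation, Nm = (1/FST - 1)/4. This relation is an assumption made here for illustration; the estimator of reference [54] used in the study may differ, and the relation presumes equilibrium populations. The function name `nm_from_fst` is hypothetical, and the input values are the mean and range FST values reported in the text.

```python
def nm_from_fst(fst):
    """Migrants per generation under Wright's island model: Nm = (1/FST - 1) / 4.

    Only defined for 0 < FST < 1; zero or negative estimates are rejected.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


# FST values reported in the text
reported = {
    "within Central America (mean)": 0.047,
    "within Amazonia (mean)": 0.1244,
    "Central America vs Amazonia (lowest)": 0.1859,
    "Central America vs Amazonia (highest)": 0.3901,
}

for label, fst in reported.items():
    print(f"{label}: FST = {fst:.4f}, Nm = {nm_from_fst(fst):.2f}")
```

Because Nm falls steeply as FST rises, the between-genotype comparisons correspond to roughly one migrant per generation or fewer under this model, whereas the low within-cluster values correspond to several.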
The significant differentiation between genotypes 1 (the 4 population clusters in Amazonian Peru and Brazil) and 2 (Central America) was genomewide, as shown by independent analyses for each of the five loci revealing significant differentiation between them (data not shown). The differentiation between the genotypes varied in magnitude, with the highest level of differentiation observed at locus ADC28 and the lowest at locus ADC110. The differentiation among the four Amazonian population clusters was not genomewide, as shown by independent analyses of the eight loci (data not shown). Loci ADC137 and ADC01 (only amplified in the Amazonian populations) did not show significant differentiation among the Amazonian population clusters. The highest level of differentiation among all of the Amazonian population clusters was at locus ADC02.
lations is primarily due to restricted gene flow by geographic distance. However, the average distances between Peru populations and those in NEA Brazil and SEAC Brazil are 2451 km and 2270 km, with mean pairwise FST of 0.1484 and 0.0959, respectively, whereas the average distance between Peru and Central American populations is 2923 km, with a mean pairwise FST of 0.3625. The large genetic differentiation between Peru and Central America is therefore not accompanied by a correspondingly large difference in geographic distance.

Demographic inference
Statistics designed to detect a population expansion were calculated. These tests are based on the premise that in an expanded population, mutations are more likely to be recent and alleles would therefore differ in size by only a single microsatellite repeat unit. The homozygosity test contrasts the homozygosity or expected heterozygosity estimated based on allele frequencies with that estimated based on the number of alleles and sample size [61]. The β-imbalance index is based on the imbalance between the variance in allele size and heterozygosity at a locus [62]. These statistics are expected to be equal in a neutral locus at mutation-drift equilibrium (MDE). The majority of the populations did not significantly depart from MDE. Many of the significant homozygosity test results were dependent on the mutation model used (Table 4). The homozygosity test detected significant departures from equilibrium across many loci within Peru (MAZ, NAU, and SAE), WCA Brazil (MAC, CAS, NAI, and RBR), in BV and PLT, Brazil, and within Central America (GOL and SPB). The significantly higher heterozygosity based on the number of alleles suggests a recent expansion of these populations. Alternatively, these significant results could be due to a recent influx of rare alleles from genetically distinct populations [61].
The imbalance index is expected to depart from 1 after a demographic change. Specifically, the imbalance index is expected to be less than 1 after a population expansion and greater than 1 in a population that has expanded after a bottleneck [62]. However, the imbalance index may also be greater than 1 in a population that has experienced a severe bottleneck after an expansion [65]. Although none of these results were statistically significant, the values greater than 1 in Peru, WCA Brazil, NEA Brazil, SPB, Guatemala, BEL and BV, Brazil suggest that in these populations there could be a slight signal of an expansion following a bottleneck or a bottleneck after an expansion.
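The idea behind the imbalance index can be sketched for a single locus as the ratio of two estimators of θ = 4Neμ under the stepwise mutation model: one from the variance in allele size (θ_V = 2V) and one from homozygosity (θ_P0 = (1/P0² - 1)/2). This single-locus form is an assumption made for illustration. It is not the multilocus β1 estimator or the SAS program used in the study, and the function name and allele data are hypothetical.

```python
from collections import Counter


def imbalance_index(repeat_counts):
    """Single-locus imbalance index: theta from variance / theta from homozygosity.

    repeat_counts: allele sizes in repeat units, one entry per sampled gene copy.
    """
    n = len(repeat_counts)
    mean = sum(repeat_counts) / n
    # Unbiased sample variance of allele size
    variance = sum((x - mean) ** 2 for x in repeat_counts) / (n - 1)
    # Unbiased estimate of homozygosity P0
    counts = Counter(repeat_counts)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    p0 = (n * sum_sq - 1.0) / (n - 1.0)
    if not 0.0 < p0 < 1.0:
        raise ValueError("index undefined for monomorphic or fully distinct samples")
    theta_v = 2.0 * variance
    theta_p0 = (1.0 / p0 ** 2 - 1.0) / 2.0
    return theta_v / theta_p0


# Hypothetical sample of ten gene copies at one locus
alleles = [10, 10, 10, 11, 11, 12, 12, 12, 13, 14]
beta = imbalance_index(alleles)
print(f"imbalance index = {beta:.3f}")
```

At mutation-drift equilibrium the two estimators target the same quantity, so the ratio is expected to be near 1; single-locus values are noisy, which is one reason multilocus estimators and confidence intervals are used in practice.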

Effective population size
The N e estimates varied considerably among the subpopulations and population clusters, and depending on the model used (

Discussion
The microsatellites used in this study are highly polymorphic, and thus are useful for exploring A. darlingi's population genetic structure. Anopheles darlingi is a species characterized by moderate levels of molecular variability [21][22][23][24][25][26][27][28][29][67], and our microsatellite analysis is in agreement. An allozyme study of two Amazonian populations detected significant deviations from HW equilibrium in 7/8 loci examined [70], although no significant deviations were detected in earlier allozyme studies [9,24]. The high levels of heterozygote deficits and null alleles could be the result of an accumulation of mutations in the primer binding sites which may be a consequence of the microsatellite library being constructed of A. darlingi from eastern Amazonian Brazil [47]. The incidence of null alleles found in A. darlingi is similar to that reported from many anopheline microsatellite studies [12,35,39,40,41]; perhaps mosquitoes with large population sizes and high levels of polymorphism are more likely to have null alleles [39].
The eight microsatellite loci used in this study have not been physically mapped to A. darlingi polytene chromosomes. Therefore, their location with respect to polymorphic chromosome inversions is unknown, and such information may modify the interpretation of the data because neutrality cannot be assumed. Since the analyses were done in two ways, including all amplified loci (5-8) and including only the 5 loci amplified from all populations, we were able to compare the results of these two treatment methods. The differentiation and mean heterozygosity (Table 2) results were not significantly different between these two methods; both recovered very similar values. The allelic richness (Table 2) and the neutrality test estimates showed a little more variance, and the effective population size estimates showed a large disparity between the two treatment methods, which demonstrates that this estimate is more sensitive and should be interpreted with caution.
Although there was variance in these estimates, the same trends were shown in both treatments.
Substantial population structure was found in Amazonia, which was undetected with more conservative nuclear markers and isozymes [9,10,24,71]. Four population clusters were detected in Amazonia, three in the Brazilian Amazon (northeastern Amazonia, southeastern and central Amazonia, and western and central Amazonia) and one including the Peruvian Amazon subpopulations, attributed to an isolation-by-distance effect. There was a moderate amount of significant differentiation and reduced gene flow between these Amazonian population clusters. The considerable differences in Ne among the populations may have contributed to the observed genetic differentiation [72,73]. Within the WCA Brazil population cluster there was little genetic structure and differentiation, and the isolation-by-distance model explained nearly all of the differentiation observed [38]. Within the NEA Brazil population cluster there was no significant population structure or differentiation, likely because these three localities are 4-8 km apart and probably represent a single population. Within the SEAC Brazil cluster there was more structure and significant differentiation than observed for the other Amazonian clusters, which is explained by isolation-by-distance and also may be affected by the differing effective population sizes among these subpopulations. The two central Amazonian Brazil populations, PEX and PLT, were an admixture of the Amazonian clusters. PEX was primarily an admixture of SEAC and WCA Brazil populations, which are the two nearest population clusters. Interestingly, PLT shared identity primarily with the SEAC populations, which are in close proximity (although not the closest), and secondly shared identity with the Peruvian populations, which are 1611-2044 km away. BV, the northern Amazonian Brazil locality, was most similar to the southeastern Amazonian Brazil populations, which again were not the nearest.
This demonstrates that their population identity was not solely based on proximity, and may be influenced by demographic history, migration, and/or ecology.
Within Peru there was no significant population structure and low differentiation among the seven subpopulations, in agreement with an earlier RAPD-PCR analysis of A. darlingi in the Peruvian Amazon that detected high homogeneity among populations (within 60 km) irrespective of different habitat types [76]. We detected little differentiation between the subpopulations even at distances up to 433 km and there was no indication of isolation-by-distance. Most of the significant low differentiation among the subpopulations occurred between samples greater than 120 km apart, with the exceptions of PCO-NAU (59 km apart, significant differentiation), PCO-PRT (134 km apart, no significant differentiation), and MAZ-PRT (147 km apart, no significant differentiation). There was a large amount of variability in Ne among the Peru subpopulations (93. 6 -8) (Figure 1), which may act as a natural barrier, restricting gene flow. There was no significant differentiation between the northern Belize populations (CAV and SIB) and the Guatemalan populations, although they are separated by 257-270 km and by the mountain ranges as well. Therefore, in northern Belize and within Guatemala A. darlingi appears to be one panmictic unit. In comparison, A. albimanus populations throughout Central America displayed only minor genetic differences using microsatellites and weak isolation by distance; throughout Guatemala, populations were genetically homogeneous between Atlantic and Pacific regions, and thus the Guatemalan Highlands did not appear to restrict gene flow [68]. The level of differentiation observed between GOL and the other A. darlingi Central American populations was similar to that observed between A. albimanus populations in Central and South America [68].
The data suggest that the main division within A. darlingi corresponds to Amazonia (genotype 1) and Central America (genotype 2) [10]. Earlier nuclear white, ribosomal ITS, and mitochondrial COI sequence data together established a deep divergence between genotypes 1 and 2 [10,28], interpreted as incipient species [10]. In the present study, there is marked differentiation between Central America and all four Amazonian population clusters. All pairs of genotype 1 and 2 populations showed a large amount of highly significant differentiation, with little or no recurrent gene flow between them; they show different microsatellite allele frequencies and variation, and appear as separate clusters in the Bayesian analysis. The NJ trees based on genetic differentiation and distance both cluster the populations according to the two genotypes. The mixture of shared and private alleles in the Central America population cluster is consistent with shared ancestral polymorphism and a recent divergence between these two genotypes. The presence of a large number of private alleles suggests some degree of independence between the gene pools [77]. The independent pairwise differentiation analyses of each locus found significant differentiation across the genome between genotypes 1 and 2. The differentiation observed between the genotypes was attributed to isolation by distance, although, as the graph shows (top right portion of Figure 3), the comparisons between Central and South American populations do not fit the positive correlation trend line, which may be a consequence of comparing diverse genetic groups that are geographically separated [11,68]. With the detection of a recent population expansion or the departure from MDE in many of the populations in Amazonia and two populations in Central America, the FST values do not translate into meaningful rates of gene flow [79].
In the expanded populations, the migration rates will be overestimated by FST, and the differentiation will be underestimated as compared to neutral equilibrium values. Therefore, the low level of differentiation measured within Peru and WCA Brazil may underestimate the true differentiation and overestimate gene flow, and the differentiation and gene flow estimated between the genotypes and population clusters may underestimate the current degree of isolation. Despite possible departures from MDE, our large sample sizes and number of populations add statistical power to our study.

Conclusion
Overall, there was a large amount of population structure in Amazonia, and a primary division within A. darlingi between Amazonia (genotype 1) and Central America, parts of Colombia and Venezuela (genotype 2). It appears that two main factors have contributed to the genetic differentiation between the population clusters: physical distance between the populations and the differences in effective population sizes among the subpopulations. Knowledge of A. darlingi's population genetic structure is essential to an understanding of malaria epidemiology and for the success of potential genetic control strategies (release of transgenic mosquitoes refractory to Plasmodium infection) that will rely on the ability to target all populations and will require a thorough understanding of the forces that produce and maintain the population structure, especially gene flow [80]. Control strategies involving insecticides will also benefit from knowledge of gene flow, which would allow predictions about the spread of genes conferring insecticide resistance or susceptibility within and between vector populations.