Study site and torreya population
Our study was conducted in the pilot site of the GIAHS-ancient Chinese torreya tree system. This pilot site, which was selected by the FAO and the United Nations Development Program (UNDP) in 2013, is located in Zhejiang Province in southeastern China (29°25′–29°47′, 120°17′–120°38′; Additional file 1: Fig. S1). The area around the site is hilly and mountainous with an altitude of about 500 m. The climate is subtropical monsoon with a mean annual air temperature of 13–15 °C and a mean annual precipitation of 1300–1560 mm. The area covers 402 km2 and includes three counties, 12 towns, 59 villages, and 6.8 × 104 people. The characteristics of traditional farmers in this area (e.g. age, number of men and women, level of education) were shown in Additional file 1: Table S6.
The ancient torreya tree system in this part of China was begun by local farmers more than 2000 years ago [8]. It contains about 1.05 × 105 ancient Chinese Torreya trees. Among them, 7.2 × 104 are more than 100 years old, and thousands of them are more than 1000 years old. There are two types of torreya trees. One type is a grafted torreya (GT) tree that consists of a scion (a human-selected cultivar of Torreya grandis cv. merrillii) and a rootstock (the wild torreya of Torreya grandis). The other is the wild-type tree of Torreya grandis that is not grafted (NGT) and that is planted with GT. Local farmers have been grafting the indicated scion onto the indicated rootstocks for thousands of years [8].
Survey of the abundance, sex ratio, and age structure of torreya trees
The field survey was authorized by the local government of Shaoxing City, Zhejiang Province, China. In the collection of plant materials, we comply with the Convention on the Trade in Endangered Species of Wild Fauna and Flora (https://www.cites.org/).
The abundance, sex ratio, and age structure of GT and NGT trees were determined by designating 47 plots (50 m × 50 m) in the six sites in the main GIAHS area (Fig. 1a). Morphological difference of GT and NGT trees was distinguished by identifying the graft scar just above ground. In each plot, we measured the location, basal diameter (BD), and sex of each of each torreya tree. The sex ratio of NGT trees was calculated in each plot. The population density (number per ha) of GT and NGT trees was statistically compared with a t-test in R [33].
Age grouping based on basal diameter
Diameter at breast height (DBH) or basal diameter (BD) is often used to estimate the age of trees, especially of ancient trees [34]. Because the GT trees do not have a clear DBH, we used basal diameter (BD) to estimate the age; this is reasonable because the DBH of NGT trees is highly correlated with the BD of NGT trees (R = 0.982) (Additional file 1: Fig. S3). In estimating the age of an ancient tree, researchers usually set the initial age at 100 years [35]. In our study, we assumed that a BD of 25 cm indicated a 100-year-old tree, and four BD groups were classified as follows: (1) 25 cm < BD ≤ 65 cm (I), 65 cm < BD ≤ 105 cm (II), 105 cm < BD ≤ 145 cm (III), and 145 cm < BD ≤ 185 cm (IV). Trees were assumed to be 100–400, 400–700, 700–1000, and 1000–1300 years old in group I, II, II, and IV, respectively (Additional file 1: Table S1). In each plot, we first measured BD of each GT and NGT tree, and then assigned them to groups based on BD.
Flowering period and duration
In sites A, B, and C, we monitored the flowering period of female and male trees of GT and NGT in each plot (Fig. 1a). The flowering period of each torreya tree was recorded every day. We classified the whole flowering period of an individual tree into three stages: initial flowering stage (10–30% of the flowers open), high flowering stage (30–100% of the flowers open), and flowering completed stage (no released pollen was observed when the tree was shaken). The flowering period of individual trees was the number of days from the initial flowering stage to the flowering completed stage. The flowering duration in a plot was the number of days from when the first tree in the plot began to flower until the last tree in the plot completed flowering.
Analysis of variance (ANOVA) was conducted in R [33]. The flowering period and duration of GT trees, NGT female trees, and NGT male trees were statistically compared with one-way ANOVA in R [33].
Seed yield
At the six sites (Fig. 1a), the seed yields of GT and NGT trees were measured when farmers harvested. The yield was expressed as kilograms of fresh seed per tree. The temporal stability of seed yield by GT and NGT trees was compared with data from each sampling site for 6 years (2011–2016). The stability index (S) for each site was calculated as follows: S = μ/δ, where μ is the mean yield for a time period, and δ is its temporal standard deviation over the same time interval. Two-way ANOVA (type of tree and year) was performed on seed yield. The stability index (S) of GT and NGT trees were statistically compared by t-test after the data were square root transformed in R [33].
We also recorded the seed yield (when the farmer harvested) of each individual GT tree during 2011–2017 at site C (Fig. 1a). Each tree was assigned to one of four age groups (Additional file 1: Table S1), and yield was expressed as kilograms of fresh seed per tree. The differences in seed yield among the age groups of GT trees were assessed by one-way ANOVA in R [33].
Using the yield data from site C as described in the previous paragraph, we analyzed the stability of seed yield of each GT tree of age group III (700–1100 years old) at this site. The yield fluctuations of individual trees suggested two temporal patterns, and we divided these GT trees into two groups (A and B) according to the two patterns. The correlation of the yields of the two patterns were analyzed in R [33].
Plant material collection and DNA extraction
At each site, both leaf and root samples were randomly collected from 20 to 70 GT trees (each individual tree as one sample, the same hereafter). Leaf samples were randomly collected from 10 to 45 NGT female trees, from 7 to 22 NGT male tree, and from 11 to 26 NGT saplings. The samples were dried and preserved in allochroic silica gel before DNA extraction.
Genomic DNA was extracted from the leaf and root samples with a DNA extraction kit (E-Zup kit, TIANGEN, China) and following the manufacturer’s protocol. All DNA samples were stored at − 80 °C for further phylogenetic and microsatellite analysis.
cpDNA gene sequence and phylogenetic analysis
The universal primers of intergenic spacers of cpDNA were used to amplify the five sets of DNA samples (leaf and root samples of GT tree, leaf samples from female and male NGT trees, and leaf samples from NGT saplings) from three of the six sites (site A, B and C). The fragments were trnL-trnF and psbA-trnH [36]. PCR was carried out using standard protocols. The purified PCR products were sequenced by Sangon Biotech (Shanghai) Co., Ltd., China. Torreya taxifolia was considered as the outgroup when constructing a phylogenetic tree. The trnL-trnF and psbA-trnH sequences of cpDNA of T. taxifolia (GI: EF660642.1 and EF660702.1) were obtained from GenBank. The two isolated fragments were joined into one cpDNA dataset by SeqMan (DNASTAR, Lasergene 7.1). Phylogenetic relationships were determined using the maximum likelihood method (ML) performed in MEGA 6.0. Bootstrap test was performed at 1000 replicates.
Microsatellite genotyping
Microsatellites-primers specific for Torreya grandis were developed by using RNA-seq. Eight leaf and root samples of Torreya grandis were used for RNA extraction. The cDNA library was constructed with a cDNA Synthesis Kit and sequenced on the Illumina Hiseq™ 2000 platform of Genergy Bio-tech (Shanghai, China). Each unigene was subjected to Batchprimer3 to identify potential EST-SSRs that contained at least six repeats for dinucleotides or five repeats for tri- to hexa-nucleotides. Primer pairs were designed for 5827 SSR-containing sequences based on length (18–28 bp), annealing temperatures (48–60 °C) and GC content (40–60%). The primers designed were further subjected to OLIGO v.6.67 to rule out potential mismatch, primer dimers and hairpin structures, and examined by blasting their sequences to avoid repetition. 281 EST-SSR primers were then randomly selected and synthesized for PCR amplification, and 179 of them were able to generate amplification products of expected size and band intensity in 67 individuals from three T. grandis populations. Twenty primers displaying clear polymorphism among the three populations were finally selected (Additional file 1: Table S7).
The microsatellite polymorphism of DNA samples was analyzed by using 20 pairs of microsatellite primers and the following protocol. PCR reaction mixtures had a volume of 20 µL and contained 1× PCR Buffer, 2.0 Mm MgCl2, 0.2 mM dNTPs, primer (0.1 mM each), 1 U of Taq polymerase (TaKaRa, Japan), and 100 ng of template DNA. Cycling conditions were 95 °C for 5 min as an initial denaturation step followed by 28–37 cycles at 95 °C for 30 s, (Tm) °C for 30 s, and 72 °C for 30 s. A final extension step at 72 °C for 10 min was usually programmed as the last cycle. PCR products were separated by size with a 3730xl DNA Analyzer (Applied Biosystems, Foeter City, CA, USA) with GeneScan 500 LIZ size standard and were analyzed using GeneMarker v. 2.2.0 (SoftGenetics, Pennsylvania, USA) at Songon Biotech (Shanghai) Co., Ltd., China.
Evaluation of genetic diversity and genetic structure
Standard measures of genetic diversity were assessed for each site based on 20 loci. These measures included genotypes, mean number of alleles per locus (Na), the effective number of alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), and the indices of genetic diversity. The calculations were performed with GenALEx 6.502. One-way ANOVA was conducted to compare He and Na between the scions and rootstocks of GT trees, and among females, males, and saplings of NGT trees. The One-way ANOVA was also conducted to compare genotypes, He and Na among the four age groups of GT trees. All of these analyses were conducted in R [33]. A neighbor joining tree of GT trees of different age groups was constructed by using the neighbor joining method [37].
Farmer survey of rootstock sources
We conducted a survey of how farmers obtain rootstocks for propagating GT trees at the six sites. For these interviews, we randomly selected 30–50 farmer households from 3 to 5 villages at each site (Additional file 1: Fig. S1). During the interview, we asked three questions: (i) for grafting of GT trees, did you obtain the rootstocks from seedlings that you grew, from wild seedlings, or from NGT saplings? (ii) If you grow seedlings for production of rootstocks, did you use GT seeds, local NGT seeds, or NGT seeds from other villages? and (iii) Did you grow seedlings to provide rootstocks for the local village, to provide rootstocks for the market, or to provide NGT trees for the local village.
Analysis of genetic relationship among different age groups of the GT rootstocks and NGT trees
Using the microsatellite data, genetic distance was estimated based on Nei’s calculation [38]. Based on the genetic distance, we performed a principal coordinate analysis (PCoA) on different age groups of the rootstocks and NGT trees.
Analysis of genetic bottleneck and effective population for NGT trees and rootstocks of GT trees
Using the microsatellite datasets of NGT female and male trees and GT rootstocks from six sites, we calculated the effective population size (Ne) based on the linkage disequilibrium method implemented in NeEstimator [39]. The minimum allele frequency cutoff was set at 0.02, and a random mating model was selected.
By using the program BOTTLENECK v 1.2.02 [40], a bottleneck was identified when observed heterozygosity (Ho) was significantly greater than expected heterozygosity (He) for a population under Hardy–Weinberg equilibrium. The computation was performed under a stepwise mutation model (SMM) and a two-phased model (TPM) in which 90% of mutations followed the SMM and 10% produced multi-step mutations; the models were assessed with sign and Wilcoxon tests. A qualitative descriptor of the allele frequency distribution (mode-shift indicator) was also used to estimate whether a population experienced a genetic bottleneck. All of these estimations were based on 1000 iterations per population.
Evaluation of gene flow among NGT trees at different sites
Gene flow among recent generations of NGT trees in the six sites was calculated by using the model of [16]: \(Nm = \frac{1 - Fst}{4Fst}\), in which N is the effective population size, m is the migration rate, and Fst is a fixation index (indicating genetic differentiation among populations). A value of Nm > 1 indicates strong gene flow.
We also estimated the historical gene flow among NGT trees at the six sites by using Migrate-n based on the Bayesian coalescent method [41, 42]. Two parameters, θ (4Neμ, where μ = mutation rate) and M (m/μ, where m = migration rate), were estimated with Migrate-n. The parameters θ and M can be used to estimate the number of migrants per generation (Nm) into each population using the model of Nm = θ × M. When Nm values > 1, there is a strong gene flow.
We ran Migrate-n with the Brownian motion microsatellite model to estimate the parameters using Bayesian inference. The initial values of θ and M were calculated by FST. The uniform priors (for θ, minimum = 0.0, maximum = 0.3, delta = 0.01; for M, minimum = 0.0, maximum = 1000.0, and delta = 100.0) were then used. Four parallel chains with static heating (temperatures: 1.0, 1.5, 3.0, 100,000.0) were used. We ran each chain for 200,000 generations and sampled every 100 generations. For each chain, we used a burn-in of 20,000.
The SSR data of the four age groups (both GT and NGT trees) in three sites were used to build a neighbor joining tree using the neighbor joining method (NJ) at POPTREE 2 [37].
Pollen and seed exchange rates, and their relationship to gene flow
We surveyed farmers in the study area concerning their exchanging of pollen to obtain high seed yield and their exchanging of NGT seeds to obtain rootstocks for the grafting of GT trees We randomly interviewed 30–50 farmer households from 3 to 5 villages at each of the six sites (Additional file 1: Fig. S1). During each interview, we provided a questionnaire to the farmers. The questionnaire asked farmers about: (i) their age, sex, education, farm size and whether they are aware of their role in this torreya system; (ii) numbers of GT and NGT trees that they planted; (iii) whether they sell or give saplings and pollen to other farmers within or outside the village; (iv) whether they buy or receive seeds and pollen from other farmers within and outside the village; (v) whether they have exchanged seeds or pollen with other farmers within their village or other villages.
We then estimated the percentage of the farmers who exchanged pollen and the total numbers of NGT seeds used within the village or traded between villages. We used this percentage to indicate the seed or sapling flow rate (exchange rate) of a village.
Mantel test [43] was conducted to evaluate the correlation between flow rates of seed and pollen and gene flow across the 6 sampling sites. Contemporary and historical gene flow rate among the corresponding site were inferred from SSR from description above.