Among available genome relatedness indices, average nucleotide identity (ANI) is one of the most robust measurements of genomic relatedness between strains, and has great potential in the taxonomy of bacteria and archaea as a substitute for the labour-intensive DNA–DNA hybridization (DDH) technique. An ANI threshold range (95–96 %) for species demarcation had previously been suggested based on comparative investigation between DDH and ANI values, albeit with rather limited datasets. Furthermore, its generality was not tested on all lineages of prokaryotes. Here, we investigated the overall distribution of ANI values generated by pairwise comparison of 6787 genomes of prokaryotes belonging to 22 phyla to see whether the suggested range can be applied to all species. There was an apparent distinction in the overall ANI distribution between intra- and interspecies relationships at around 95–96 % ANI. We went on to determine which level of 16S rRNA gene sequence similarity corresponds to the currently accepted ANI threshold for species demarcation using over one million comparisons. A twofold cross-validation statistical test revealed that 98.65 % 16S rRNA gene sequence similarity can be used as the threshold for differentiating two species, which is consistent with previous suggestions (98.2–99.0 %) derived from comparative studies between DDH and 16S rRNA gene sequence similarity. Our findings should be useful in accelerating the use of genomic sequence data in the taxonomy of bacteria and archaea.
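As a rough illustration of how the two thresholds reported above might be applied, the following minimal sketch screens a pair of strains first by 16S rRNA gene sequence similarity and then by ANI. The function names, the treatment of the 95–96 % range as a "borderline" zone, and the idea of sending pairs at or above 98.65 % on to genome comparison are assumptions made for this example, not part of the study.

```python
# Thresholds taken from the abstract; how they are combined is an assumption.
ANI_SPECIES_RANGE = (95.0, 96.0)   # suggested ANI range for species demarcation (%)
SSU_THRESHOLD = 98.65              # 16S rRNA gene sequence similarity threshold (%)


def screen_by_16s(similarity_percent):
    """Hypothetical first-pass screen using 16S rRNA gene similarity."""
    if similarity_percent < SSU_THRESHOLD:
        return "different species"
    # Assumption: pairs at or above the threshold need a genome-based check.
    return "needs genome comparison"


def classify_by_ani(ani_percent):
    """Hypothetical classification of a strain pair by ANI value."""
    low, high = ANI_SPECIES_RANGE
    if ani_percent < low:
        return "different species"
    if ani_percent > high:
        return "same species"
    # Assumption: values inside the 95-96 % range are left undecided.
    return "borderline"


if __name__ == "__main__":
    print(screen_by_16s(98.0))
    print(screen_by_16s(99.1))
    print(classify_by_ani(97.2))
```

Treating the suggested range as an undecided zone, rather than picking a single cut-off, keeps the sketch from asserting more precision than the abstract reports.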
The G+C content of a genome is frequently used in taxonomic descriptions of species and genera. In the past it has been determined using conventional, indirect methods, but it is nowadays reasonable to calculate the DNA G+C content directly from the increasingly available and affordable genome sequences. The expected increase in accuracy, however, might alter the way in which the G+C content is used for drawing taxonomic conclusions. Here, we use genomic datasets to re-evaluate the literature assumption that the G+C content can vary by up to 3–5 % within species. The resulting G+C content differences are compared with DNA–DNA hybridization (DDH) similarities calculated in silico using the GGDC web server, with 70 % similarity as the gold standard threshold for species boundaries. The results indicate that the G+C content, if computed from genome sequences, varies no more than 1 % within species. Statistical models based on larger differences alone can reject the hypothesis that two strains belong to the same species. Because DDH similarities between two non-type strains occur in the genomic datasets, we also examine to what extent and under which conditions such a similarity could be <70 % even though the similarity of either strain to a type strain was ≥70 %. In theory, their similarity could be as low as 50 %, whereas empirical data suggest a boundary closer (but not identical) to 70 %. However, it is shown that using a 50 % boundary would not affect the conclusions regarding the DNA G+C content. Hence, we suggest that discrepancies of ≥1 % between the G+C content data provided in species descriptions and those recalculated after genome sequencing are due to significant inaccuracies of the applied conventional methods, and accordingly call for emendations of species descriptions.
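The direct calculation of G+C content from a genome sequence, and the 1 % within-species bound reported above, can be sketched as follows. This is a minimal example under stated assumptions: the function names are hypothetical, ambiguous bases (such as N) are simply ignored, and the input is a plain nucleotide string rather than a parsed sequence file.

```python
def gc_content(sequence):
    """Return the G+C content (%) of a nucleotide sequence.

    Assumption: only unambiguous A, C, G and T bases are counted;
    any other symbol (e.g. N) is ignored.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


def exceeds_within_species_range(gc_a, gc_b, max_diff=1.0):
    """True if two genome-derived G+C values differ by more than max_diff.

    The 1 % default follows the abstract; a difference above it argues
    against the two strains belonging to the same species.
    """
    return abs(gc_a - gc_b) > max_diff


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(exceeds_within_species_range(50.0, 51.5))
```

Note that a small difference is not evidence for conspecificity; as the abstract states, only larger differences can be used to reject the same-species hypothesis.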
Vibrios are ubiquitous in the aquatic environment and can be found in association with animal or plant hosts. The range of ecological relationships includes pathogenic and mutualistic associations. To gain a better understanding of the ecology of these microbes, it is important to determine their phenotypic features. However, the traditional phenotypic characterization of vibrios has been expensive, time-consuming and restricted in scope to a limited number of features. In addition, most of the commercial systems applied for phenotypic characterization cannot characterize the broad spectrum of environmental strains. A reliable and feasible alternative is to obtain phenotypic information directly from whole genome sequences. The aim of the present study was to evaluate the usefulness of whole genome sequences as a source of phenotypic information. We performed a comparison of the vibrio phenotypes obtained from the literature with the phenotypes obtained from whole genome sequences. We observed a significant correlation between the previously published phenotypic data and the phenotypic data retrieved from whole genome sequences of vibrios. Analysis of 26 vibrio genomes revealed that all genes coding for the specific proteins involved in the metabolic pathways responsible for positive phenotypes of the 14 diagnostic features (Voges–Proskauer reaction, indole production, arginine dihydrolase, ornithine decarboxylase, utilization of myo-inositol, sucrose and l-leucine, and fermentation of d-mannitol, d-sorbitol, l-arabinose, trehalose, cellobiose, d-mannose and d-galactose) were found in the majority of the vibrio genomes. Vibrio species that were negative for a given phenotype revealed the absence of all or several genes involved in the respective biochemical pathways, indicating the utility of this approach to characterize the phenotypes of vibrios. The absence of the global regulation and regulatory proteins in the Vibrio parahaemolyticus genome indicated a non-vibrio phenotype. Whole genome sequences represent an important source for the phenotypic identification of vibrios.
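The inference rule described above (a positive phenotype when all pathway genes are present, a negative one when all or several are missing) can be sketched as a simple set comparison. This is only an illustration: the function name is hypothetical, the gene identifiers in the usage example are placeholders rather than real gene names, and the study's actual gene lists and annotation procedure are not given in the abstract.

```python
def predict_phenotype(genome_genes, pathway_genes):
    """Predict a phenotype from gene presence.

    Returns a (call, missing) tuple, where call is "positive" if every
    gene in pathway_genes occurs in genome_genes and "negative" otherwise,
    and missing is the sorted list of absent pathway genes.
    """
    missing = sorted(set(pathway_genes) - set(genome_genes))
    call = "positive" if not missing else "negative"
    return call, missing


if __name__ == "__main__":
    # Placeholder identifiers, not real gene names.
    pathway = ["gene_a", "gene_b", "gene_c"]
    genome = {"gene_a", "gene_c", "gene_x"}
    print(predict_phenotype(genome, pathway))
```

Returning the list of missing genes alongside the call makes it easy to see whether a negative prediction reflects the loss of a whole pathway or of a single gene.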
Genome sequences are enabling applications of different approaches to more clearly understand microbial phylogeny and systematics. Two of these approaches involve identification of conserved signature indels (CSIs) and conserved signature proteins (CSPs) that are specific for different lineages. These molecular markers provide novel and more definitive means for demarcation of prokaryotic taxa and for identification of species from these groups. Genome sequences are also enabling determination of phylogenetic relationships among species based upon sequences for multiple proteins. In this work, we have used all of these approaches for studying the phytopathogenic bacteria belonging to the genera Dickeya, Pectobacterium and Brenneria. Members of these genera, which cause numerous diseases in important food crops and ornamental plants, are presently distinguished mainly on the basis of their branching in phylogenetic trees. No biochemical or molecular characteristic is known that is uniquely shared by species from these genera. Hence, detailed studies using the above approaches were carried out on proteins from the genomes of these bacteria to identify molecular markers that are specific for them. In phylogenetic trees based upon concatenated sequences for 23 conserved proteins, members of the genera Dickeya, Pectobacterium and Brenneria formed a strongly supported clade within the other Enterobacteriales. Comparative analysis of protein sequences from the Dickeya, Pectobacterium and Brenneria genomes has identified 10 CSIs and five CSPs that are either uniquely or largely found in all genome-sequenced species from these genera, but not present in any other bacteria in the database. In addition, our analyses have identified 10 CSIs and 17 CSPs that are specifically present in either all or most sequenced Dickeya species/strains, and six CSIs and 19 CSPs that are uniquely found in the sequenced Pectobacterium genomes. Finally, our analysis also identified three CSIs and one CSP that are specifically shared by members of the genera Pectobacterium and Brenneria, but absent in species of the genus Dickeya, indicating that the former two genera shared a common ancestor exclusive of Dickeya. The identified CSIs and CSPs provide novel tools for identification of members of the genera Dickeya and Pectobacterium and for delimiting these taxa in molecular terms. Descriptions of the genera Dickeya and Pectobacterium have been revised to provide information for these molecular markers. Biochemical studies on these CSIs and CSPs, which are specific for these genera, may lead to discovery of novel properties that are unique to these bacteria and which could be targeted to develop antibacterial agents that are specific for these plant-pathogenic bacteria.