Distortions of Taxonomic Structure from Incomplete Data on a Restricted Set of Reference Strains. Sneath, P. H. A., 129, 1045-1073 (1983), doi = https://doi.org/10.1099/00221287-129-4-1045, publicationName = Microbiology Society, issn = 1350-0872, abstract = The paper examines how well taxonomic relationships can be estimated when the data are restricted to the similarities between each of the strains and a small subset of reference strains. Such data represent a strip from the similarity matrix rather than the complete matrix. The methods studied were: (a) minimum spanning trees, (b) the definition of one group at a time, and (c) the calculation of ‘derived matrices’. A derived matrix is a complete matrix obtained solely from the entries of the incomplete matrix, by treating these as quantitative character states. The data used were taxonomic distances based on morphological, biochemical and physiological results, and were selected from a previous study to provide good examples of salient patterns of taxonomic relationship. The results that were most similar to those from the complete data were given by derived matrices. Surprisingly little taxonomic distortion occurred, even if the reference strains were rather few, provided these were suitably chosen. Reference strains should be well dispersed, because distortion was considerable if all were very similar to one another. Ideally there should be a reference strain from each cluster, and aids to ensuring this are discussed. The method has considerable potential for serological or nucleic acid pairing studies in which it is usually impracticable to obtain complete data on numerous strains.
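The derived-matrix idea in the abstract can be illustrated with a minimal sketch. It assumes the strip is stored as one row per strain and one column per reference strain, and it assumes a root-mean-square (Euclidean-type) taxonomic distance between rows; the abstract does not state which coefficient the paper actually uses, so that choice, the function name `derived_matrix`, and the example values are all hypothetical.

```python
import math


def derived_matrix(strip):
    """Build a complete strain-by-strain matrix from an incomplete strip.

    strip[i][k] is the observed distance from strain i to reference strain k.
    Each column is treated as a quantitative character, and the distance
    between two strains is the root-mean-square difference of their rows
    (an assumed coefficient, not necessarily the one used in the paper).
    """
    n = len(strip)
    r = len(strip[0])
    derived = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((strip[i][k] - strip[j][k]) ** 2 for k in range(r))
            d = math.sqrt(sq / r)
            derived[i][j] = derived[j][i] = d
    return derived


# Hypothetical strip: 4 strains measured against 2 reference strains
# (strains 0 and 3 serve as the references, one from each apparent cluster).
strip = [
    [0.0, 0.9],
    [0.1, 0.8],
    [0.8, 0.1],
    [0.9, 0.0],
]
D = derived_matrix(strip)
```

In this toy example the two references are well dispersed, so strains 0 and 1 stay close in the derived matrix while strains 0 and 3 stay far apart. If both references came from the same cluster, the two columns would carry nearly the same information, which is in line with the abstract's remark that distortion is considerable when the reference strains are very similar to one another.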