maximum parsimony example - Search
  1. Maximum parsimony (phylogenetics) - Wikipedia

    The input data used in a maximum parsimony analysis is in the form of "characters" for a range of taxa. There is no generally agreed-upon definition of a phylogenetic character, but operationally a character can be thought of as an attribute, an axis along which taxa are observed to vary. These attributes can be physical (morphological), molecular, genetic, physiological, or behavioral. The only widespread agreement on characters seems to be that variation used for character analysis should reflect heritable variation. Whether it must be directly heritable, or whether indirect inheritance (e.g., learned behaviors) is acceptable, is not entirely resolved.

    Each character is divided into discrete character states, into which the variations observed are classified. Character states are often formulated as descriptors, describing the condition of the character substrate. For example, the character "eye color" might have the states "blue" and "brown." Characters can have two or more states (they can have only one, but these characters lend nothing to a maximum parsimony analysis, and are often excluded).
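    As a minimal sketch, the exclusion of single-state characters described above might look like the Python below. It assumes the matrix is stored as a dictionary from taxon name to a list of states, with one entry per character. The taxon names and states are hypothetical placeholders, not data from any study.

```python
from typing import Dict, List

# Hypothetical layout: taxon name -> list of states, one entry per character (column).
Matrix = Dict[str, List[str]]

def variable_characters(matrix: Matrix) -> List[int]:
    """Return indices of characters showing two or more states across all taxa."""
    n_chars = len(next(iter(matrix.values())))
    keep = []
    for i in range(n_chars):
        states = {row[i] for row in matrix.values()}
        # A character with a single state can never change on any tree,
        # so it contributes nothing to a parsimony score.
        if len(states) >= 2:
            keep.append(i)
    return keep

def drop_constant(matrix: Matrix) -> Matrix:
    """Return a copy of the matrix without single-state characters."""
    keep = variable_characters(matrix)
    return {taxon: [row[i] for i in keep] for taxon, row in matrix.items()}

matrix = {
    "taxon_A": ["blue", "present", "0"],
    "taxon_B": ["brown", "present", "1"],
    "taxon_C": ["blue", "present", "1"],
}
print(drop_constant(matrix))
```

    In this example the second character is "present" in every taxon, so it is dropped. The other two characters are kept.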

    Coding characters for phylogenetic analysis is not an exact science, and there are numerous complicating issues. Typically, taxa are scored with the same state if they are more similar to one another in that particular attribute than ea…

    In phylogenetics and computational phylogenetics, maximum parsimony is an optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes (or minimizes the cost of differentially weighted character-state changes) is preferred. Under the maximum-parsimony criterion, the optimal tree will minimize the amount of homoplasy (i.e., convergent evolution, parallel evolution, and evolutionary reversals). In other words, under this criterion, the shortest possible tree that explains the data is considered best. Some of the basic ideas behind maximum parsimony were presented by James S. Farris in 1970 and Walter M. Fitch in 1971.
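    Fitch's small-parsimony algorithm illustrates how a tree is scored by counting character-state changes. It applies to unordered, equally weighted characters. The Python sketch below works under those assumptions. The tree is a rooted binary tree written as nested pairs, and the four-taxon matrix is hypothetical. When a node's two children have overlapping state sets, the node receives their intersection. Otherwise it receives their union, and one change is counted.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree written as nested 2-tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch_character(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's bottom-up pass for one character: (state set at this node, changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_character(tree[0], states)
    right_set, right_cost = fitch_character(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no change is needed at this node.
        return common, left_cost + right_cost
    # The children disagree, so one state change is required on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree: Tree, matrix: Dict[str, List[str]]) -> int:
    """Parsimony score of a tree: state changes summed over all characters."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_character(tree, column)[1]
    return total

# Hypothetical 4-taxon, 2-character matrix.
matrix = {"A": ["0", "1"], "B": ["0", "1"], "C": ["1", "0"], "D": ["1", "1"]}
tree_ab = (("A", "B"), ("C", "D"))
tree_ac = (("A", "C"), ("B", "D"))
print(tree_length(tree_ab, matrix))  # 2
print(tree_length(tree_ac, matrix))  # 3
```

    The tree that groups A with B needs two changes in total. The tree that groups A with C needs three, so it is the less parsimonious of the two. Differentially weighted changes would require a weighted variant such as Sankoff's algorithm, which this sketch does not cover.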

    Maximum parsimony is an intuitive and simple criterion, and it is popular for this reason. However, although it is easy to score a phylogenetic tree (by counting the number of character-state changes), there is no known algorithm that efficiently generates the most-parsimonious tree (the general problem is NP-hard). Instead, the most-parsimonious tree must be sought in "tree space" (i.e., amongst all possible trees). For a small number of taxa (i.e., fewer than nine) it is possible to do an exhaustive search, in which every possible tree is scored and the best one is selected. For nine to twenty taxa, it will generally be preferable to use branch-and-bound, which is also guaranteed to return the best tree. For greater numbers of taxa, a heuristic search must be performed.
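    An exhaustive search can be sketched in two steps. First, every unrooted binary tree is generated by stepwise addition: each new taxon is attached to every branch of each existing tree. Second, each tree is scored with Fitch's algorithm. This Python sketch assumes unordered, equally weighted characters and uses a hypothetical matrix. The number of trees grows super-exponentially, so the approach is only practical for the small numbers of taxa mentioned above. Branch-and-bound and heuristic searches are not shown.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple, Union

# Trees are nested 2-tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]
Matrix = Dict[str, List[str]]

def fitch_length(tree: Tree, matrix: Matrix) -> int:
    """Total Fitch steps over all characters (unordered, equally weighted)."""
    def visit(node: Tree, i: int) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {matrix[node][i]}, 0
        (ls, lc), (rs, rc) = visit(node[0], i), visit(node[1], i)
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    n_chars = len(next(iter(matrix.values())))
    return sum(visit(tree, i)[1] for i in range(n_chars))

def add_taxon(tree: Tree, taxon: str) -> Iterator[Tree]:
    """Yield every tree obtained by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)  # attach on the branch above this (sub)tree's root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_taxon(left, taxon):
            yield (new_left, right)
        for new_right in add_taxon(right, taxon):
            yield (left, new_right)

def rooted_trees(taxa: List[str]) -> Iterator[Tree]:
    """All rooted binary trees on `taxa`, built by stepwise addition."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for tree in rooted_trees(taxa[:-1]):
        yield from add_taxon(tree, taxa[-1])

def unrooted_trees(taxa: List[str]) -> Iterator[Tree]:
    """Each unrooted binary tree on 3+ taxa exactly once, rooted on the branch to taxa[0]."""
    for subtree in rooted_trees(taxa[1:]):
        yield (taxa[0], subtree)

def exhaustive_search(matrix: Matrix) -> Tuple[Optional[int], List[Tree]]:
    """Score every tree; return the minimum length and all trees achieving it."""
    best_len: Optional[int] = None
    best: List[Tree] = []
    for tree in unrooted_trees(sorted(matrix)):
        length = fitch_length(tree, matrix)
        if best_len is None or length < best_len:
            best_len, best = length, [tree]
        elif length == best_len:
            best.append(tree)
    return best_len, best

# Hypothetical 4-taxon, 2-character matrix.
matrix = {"A": ["0", "1"], "B": ["0", "1"], "C": ["1", "0"], "D": ["1", "1"]}
print(exhaustive_search(matrix))  # (2, [('A', ('B', ('C', 'D')))])
```

    The search returns every most-parsimonious tree, because ties are common. Each unrooted tree is written as if rooted on the branch leading to the first taxon. For unordered characters, the Fitch length does not depend on where the root is placed.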

    Because the most-parsimonious tree is always the shortest possible tree, the "best" tree according to the maximum-parsimony criterion will often underestimate the actual evolutionary change that could have occurred, compared with a hypothetical "true" tree that describes the unknown evolutionary history of the organisms under study. In addition, maximum parsimony is not statistically consistent. That is, it is not guaranteed to produce the true tree with high probability, given sufficient data. As demonstrated in 1978 by Joe Felsenstein, maximum parsimony can be inconsistent under certain conditions, such as long-branch attraction. Of course, any phylogenetic algorithm could also be statistically inconsistent if the model it employs to estimate the preferred tree does not accurately match the way that evolution occurred in that clade. This is unknowable. Therefore, while statistical consistency is an interesting theoretical property, it lies outside the realm of testability and is irrelevant to empirical phylogenetic studies.

    In phylogenetics, parsimony is mostly interpreted as favoring the trees that minimize the amount of evolutionary change required. Alternatively, phylogenetic parsimony can be characterized as favoring the trees that maximize explanatory power by minimizing the number of observed similarities that cannot be explained by inheritance and common descent. These two aims (minimizing required evolutionary change, and maximizing the observed similarities that can be explained as homology) may result in different preferred trees when some observed features are not applicable in some of the groups included in the tree. The latter can be seen as the more general approach.

    While evolution is not an inherently parsimonious process, centuries of scientific experience lend support to the principle of parsimony (Occam's razor): the supposition of a simpler, more parsimonious chain of events is preferable to the supposition of a more complicated, less parsimonious chain of events. Hence, parsimony (sensu lato) is typically sought in inferring phylogenetic trees, and in scientific explanation generally.

    Parsimony is part of a class of character-based tree estimation methods which use a matrix of discrete phylogenetic characters and character states to infer one or more optimal phylogenetic trees for a set of taxa, commonly a set of species or reproductively isolated populations of a single species. These methods operate by evaluating candidate phylogenetic trees according to an explicit optimality criterion; the tree with the most favorable score is taken as the best hypothesis of the phylogenetic relationships of the included taxa. Maximum parsimony is used with most kinds of phylogenetic data; until recently, it was the only widely used character-based tree estimation method for morphological data.

    Inferring phylogenies is not a trivial problem. A huge number of possible phylogenetic trees exist for any reasonably sized set of taxa; for example, a mere ten species gives over two million possible unrooted trees. These possibilities must be searched to find a tree that best fits the data according to the optimality criterion. However, the data themselves do not lead to a simple, arithmetic solution to the problem. Ideally, we would expect the distribution of whatever evolutionary characters (such as phenotypic traits or alleles) to directly follow the branching pattern of evolution. Thus we could say that if two organisms possess a shared character, they should be more closely related to each other than to a third organism that lacks this character (provided that character was not present in the last common ancestor of all three, in which case it would be a symplesiomorphy). We would predict that bats and monkeys are more closely related to each other than either is to an elephant, because male bats and monkeys possess external testicles, which elephants lack. However, we cannot say that bats and monkeys are more closely related to one another than they are to whales, though the two have external testicles absent in whales, because we believe that the males in the last common ancestral species of the three had external testicles.
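    The figure of over two million trees for ten species can be checked with the standard count of fully resolved (bifurcating) unrooted trees on n labelled taxa, (2n - 5)!! = 3 × 5 × ... × (2n - 5). Here is a small Python sketch:

```python
def num_unrooted_trees(n: int) -> int:
    """Number of unrooted, fully resolved trees on n labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("at least three taxa are needed")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count

for n in (4, 8, 10):
    print(n, num_unrooted_trees(n))
# 4 3
# 8 10395
# 10 2027025
```

    Ten taxa give 2,027,025 trees, which agrees with the figure above. Each additional taxon multiplies the count by the next odd number.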

    However, the phenomena of convergent evolution, parallel evolution, and evolutionary reversals (collectively termed homoplasy) add an unpleasant wrinkle to the problem of inferring phylogeny. For a number of reasons, two organisms can possess a trait inferred not to have been present in their last common ancestor; if we naively took the presence of this trait as evidence of a relationship, we would infer an incorrect tree. Empirical phylogenetic data may include substantial homoplasy, with different parts of the data suggesting sometimes very different relationships. Methods used to estimate phylogenetic trees are explicitly intended to resolve the conflict within the data by picking the phylogenetic tree that is the best fit to all the data overall, accepting that some data simply will not fit. It is often mistakenly believed that parsimony assumes that convergence is rare; in fact, even convergently derived characters have some value in maximum-parsimony-based phylogenetic analyses, and the prevalence of convergence does not systematically affect the outcome of parsimony-based methods.
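    The homoplasy a tree requires can be summarized per character as the number of extra steps beyond the minimum, since a character with r observed states needs at least r - 1 changes on any tree. Dividing the summed minimum by the summed observed changes gives the consistency index (CI). The CI equals 1 when the data fit the tree without homoplasy. The Python sketch below scores characters with Fitch's algorithm, assumes they are unordered and equally weighted, and uses a hypothetical matrix.

```python
from typing import Dict, List, Set, Tuple, Union

# Trees are nested 2-tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch_steps(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass for one character: (state set at node, changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch_steps(tree[0], states), fitch_steps(tree[1], states)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)

def homoplasy_report(tree: Tree, matrix: Dict[str, List[str]]) -> Tuple[List[int], float]:
    """Per-character extra steps and the ensemble consistency index for one tree."""
    n_chars = len(next(iter(matrix.values())))
    extra: List[int] = []
    observed_total = minimum_total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        observed = fitch_steps(tree, column)[1]
        minimum = len(set(column.values())) - 1  # r states need at least r - 1 changes
        extra.append(observed - minimum)
        observed_total += observed
        minimum_total += minimum
    ci = minimum_total / observed_total if observed_total else 1.0
    return extra, ci

# Hypothetical matrix: the third character conflicts with the A+B / C+D grouping.
matrix = {"A": ["0", "0", "1"], "B": ["0", "0", "0"], "C": ["1", "1", "1"], "D": ["1", "1", "0"]}
tree = (("A", "B"), ("C", "D"))
print(homoplasy_report(tree, matrix))  # ([0, 0, 1], 0.75)
```

    The third character needs one extra step on this tree, so CI = 3/4 = 0.75. The tree still accommodates that character, but at a cost.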

    Data that do not fit a tree perfectly are not simply "noise", they can contain relevant phylogenetic signal in some parts of a tree, even if they conflict with the tree overall. In the whale example given above, the lack of external testicles in whales is homoplastic: It reflects a return to the condition inferred to have been present in ancient ancestors of mammals, whose testicles were internal. This …

    The time required for a parsimony analysis (or any phylogenetic analysis) increases with the number of taxa (and characters) included in the analysis. Also, because more taxa require more branches to be estimated, more uncertainty may be expected in large analyses. Because data collection costs in time and money often scale directly with the number of taxa included, most analyses include only a fraction of the taxa that could have been sampled. Indeed, some authors have contended that four taxa (the minimum required to produce a meaningful unrooted tree) are all that is necessary for accurate phylogenetic analysis, and that more characters are more valuable than more taxa in phylogenetics. This has led to a raging controversy about taxon sampling.

    Empirical, theoretical, and simulation studies have led to a number of dramatic demonstrations of the importance of adequate taxon sampling. Most of these can be summarized by a simple observation: a phylogenetic data matrix has dimensions of characters times taxa. Doubling the number of taxa doubles the amount of information in a matrix just as surely as doubling the number of characters. Each taxon represents a new sample for every character, but, more importantly, it (usually) represents a new combination of character states. These character states not only determine where that taxon is placed on the tree but can also inform the entire analysis, possibly causing different relationships among the remaining taxa to be favored by changing estimates of the pattern of character changes.

    The most disturbing weakness of parsimony analysis, that of long-branch attraction (see below), is particularly pronounced with poor taxon sampling, especially in the four-taxon case. This is a well-understood case in which additional character sampling may not improve the quality of the estimate. As taxa are added, they often break up long branches (especially in the case of fossils), effectively improving the estimation of character state changes along them. Because of the richness of information added by taxon sampling, it is even possible to produce highly accurate estimates of phylogenies with hundreds of taxa using only a few thousand characters.
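    The four-taxon case makes the mechanism behind long-branch attraction concrete. With four taxa, a character is informative only if two taxa share one state and the other two share another. Such a character costs one step on the tree it supports and two steps on each of the other two trees. Every other pattern costs the same on all three trees. The most-parsimonious tree is therefore the one supported by the most such characters. The Python sketch below makes two hypothetical assumptions: the true tree groups A with B and C with D, and A and C sit on long branches along which several states arose convergently. It also assumes no missing data.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# An unrooted four-taxon tree, identified by its split, e.g. (("A", "B"), ("C", "D")).
Split = Tuple[Tuple[str, ...], Tuple[str, ...]]

def supported_split(column: Dict[str, str]) -> Optional[Split]:
    """Return the split a four-taxon character supports, or None if it is uninformative."""
    by_state: Dict[str, List[str]] = {}
    for taxon, state in column.items():
        by_state.setdefault(state, []).append(taxon)
    groups = sorted(tuple(sorted(g)) for g in by_state.values())
    # Only a 2+2 pattern distinguishes among the three possible unrooted trees.
    if len(groups) == 2 and all(len(g) == 2 for g in groups):
        return (groups[0], groups[1])
    return None

def four_taxon_parsimony(matrix: Dict[str, List[str]]) -> Counter:
    """Count informative characters supporting each split; the top count is the MP tree."""
    n_chars = len(next(iter(matrix.values())))
    votes: Counter = Counter()
    for i in range(n_chars):
        split = supported_split({taxon: row[i] for taxon, row in matrix.items()})
        if split is not None:
            votes[split] += 1
    return votes

# Hypothetical data: 3 characters reflect the assumed true tree (A+B, C+D),
# 5 reflect convergence on the long branches leading to A and C,
# and the last one is a state unique to A (uninformative).
matrix = {
    "A": ["0", "0", "0", "1", "1", "1", "1", "1", "1"],
    "B": ["0", "0", "0", "0", "0", "0", "0", "0", "0"],
    "C": ["1", "1", "1", "1", "1", "1", "1", "1", "0"],
    "D": ["1", "1", "1", "0", "0", "0", "0", "0", "0"],
}
votes = four_taxon_parsimony(matrix)
print(votes.most_common(1))  # [((('A', 'C'), ('B', 'D')), 5)]
```

    The data contain five convergent characters and only three that reflect the assumed true grouping, so parsimony prefers the tree that groups A with C. This is a toy tally showing how convergent states are counted as support. It is not a simulation of sequence evolution.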

    Although many studies have been performed, there is still much work to be done on taxon sampling strategies. Because of advances in computer performance, and the reduced cost and increased automation of molecular sequencing, sample sizes overall are on the rise, and studies addressing the relationships of hundreds of taxa (or other terminal entities, such as genes) are becoming common. Of course, this is not to say that adding characters is not also useful; the number of characters is increasing as well.

    Some systematists prefer to exclude taxa based on the number of unknown character entries ("?") they exhibit, or because they tend to "jump around" the tree in analyses (i.e., they are "wildcards"). As noted below, theoretical and simulation work has demonstrated that this is likely to sacrifice accuracy rather than improve it. Although these taxa may generate more most-parsimonious trees (see below), methods such as agreement subtrees and reduced consensus can still extract information on the relationships of interest.

    It …