PTreeGenerator  1.0
Simple phylogenetic tree generation from multiple sequence alignment.
 All Classes Namespaces Files Functions Variables
Public Member Functions | Static Public Member Functions | Private Attributes | List of all members
ptreegen.parsimony.LargeParsimony Class Reference

Represents a solution to the large parsimony problem (the tree toplogy is unknown as oposed to the small parsimony problem). More...

Public Member Functions

def __init__
 Initializes the members used during the computation.
def tree
 A getter for the obtained consensus tree.
def cost
 A getter for the parsimony score of the computed tree.
def quartetPuzzling
 Iteratively applies the quartet puzzling heuristic to obtain intermediate tree topologies.
def getOptimalQuartets
 Finds the optimal topologies for every four taxa.

Static Public Member Functions

def getQuartetID
 Provides a unique identification of the quartet by sorting the names of the taxa in the quartet alphabetically and then concatenating their identification strings, hence allowing quartets to be identified with their respective taxa.
def getTopologyID
 Provides a string that uniquely identifies a quartet topology.
def treeFromQuartet
 Constructs a tree from the provided quartet.
def increaseCostOnPath
 Finds a path between two vertices and then increases the weight of all edges on that path by one.

Private Attributes

 _qpSteps
 The number of quartet puzzling steps to take.
 _alignment
 Reference to the multiple sequence alignment instance.
 _sequencesDict
 Associates sequence id string with a sequence instance.
 _optimalQuartets
 Associates every four taxa with their optimal quartet.
 _tree
 Reference to the tree that was computed in LargeParsimony::quartetPuzzling().

Detailed Description

Represents a solution to the large parsimony problem (the tree toplogy is unknown as oposed to the small parsimony problem).

It implements the quartet puzzling heuristic to find a set of possible tree topologies and it then builds a consensus tree employing the 50% majority rule (implemented in ptreegen::utilities::tree_utils::findConsensusTree()).

For more information on the quartet puzzling algorithm see:
Strimmer, Korbinian, and Arndt Von Haeseler. Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Molecular Biology and Evolution 13.7 (1996): 964-969.
or the appropriate chapter in:
Bockenhauer, H.; Bongartz, D. Algorithmic Aspects of Bioinformatics, 1st ed.; Springer: Berlin, 1998.

See Also
SmallParsimony
LargeParsimony::quartetPuzzling()

Definition at line 136 of file parsimony.py.

Constructor & Destructor Documentation

def ptreegen.parsimony.LargeParsimony.__init__ (   self,
  alignment,
  steps = 1000 
)

Initializes the members used during the computation.

It also generates the optimal quartets by first computing the parsimony cost of all possible unique quartet combinations and then selecting the quartet with minimum parsimony cost for every four taxa.

Parameters
alignmentmultiple sequence alignment
stepsnumber of possible tree topologies that will be generated

Definition at line 147 of file parsimony.py.

148  def __init__(self, alignment, steps=1000):
149  self._qpSteps = int(steps)
150  self._alignment = alignment
151  self._sequencesDict = {x.id : x.seq for x in list(self._alignment)}
152  quartet_combinations = []
153  for combination in combination_utils.uniqueCombinationsGenerator(self._sequencesDict.keys(), 4):
154  quartet_combinations.append(combination)
155  self._optimalQuartets = self.getOptimalQuartets(quartet_combinations)
156  self._tree = None

Member Function Documentation

def ptreegen.parsimony.LargeParsimony.cost (   self)

A getter for the parsimony score of the computed tree.

Returns
parsimony cost of the tree as a single value

Definition at line 176 of file parsimony.py.

References ptreegen.neighbor_joining.NeighborJoining._alignment, ptreegen.parsimony.SmallParsimony._alignment, ptreegen.parsimony.LargeParsimony._alignment, ptreegen.neighbor_joining.NeighborJoining.tree(), ptreegen.computation.Computation.tree, and ptreegen.parsimony.LargeParsimony.tree().

177  def cost(self):
178  return SmallParsimony(self.tree, self._alignment).cost
def ptreegen.parsimony.LargeParsimony.getOptimalQuartets (   self,
  quartets 
)

Finds the optimal topologies for every four taxa.

Parameters
quartetsall possible unique combinations of taxa of size four
Returns
a dictionary that associates every for taxa with the optimal topology

Definition at line 244 of file parsimony.py.

References ptreegen.neighbor_joining.NeighborJoining._alignment, ptreegen.parsimony.SmallParsimony._alignment, ptreegen.parsimony.LargeParsimony._alignment, ptreegen.parsimony.LargeParsimony.getQuartetID(), ptreegen.parsimony.LargeParsimony.getTopologyID(), and ptreegen.parsimony.LargeParsimony.treeFromQuartet().

245  def getOptimalQuartets(self, quartets):
246  optimal_quartets = dict()
247  for quartet in quartets:
248  quartet_id = self.getQuartetID(quartet)
249  assert quartet_id not in optimal_quartets
250 
251  trees = {tuple(quartet) : self.treeFromQuartet(quartet)}
252  for i in range(0,2):
253  temp = quartet[i]
254  quartet[i] = quartet[2]
255  quartet[2] = temp
256  trees[tuple(quartet)] = self.treeFromQuartet(quartet)
257 
258  min_cost = float("inf")
259  for quartet_key, tree in trees.iteritems():
260  alignment = MultipleSeqAlignment([])
261  for record in self._alignment:
262  if record.id in quartet:
263  alignment.append(record)
264  small_parsimony = SmallParsimony(tree, alignment)
265  if small_parsimony.cost < min_cost:
266  min_cost = small_parsimony.cost
267  optimal_quartets[quartet_id] = {
268  "topology" : quartet_key
269  , "topology_id" : self.getTopologyID(quartet_key)
270  }
271  return optimal_quartets
def ptreegen.parsimony.LargeParsimony.getQuartetID (   quartet)
static

Provides a unique identification of the quartet by sorting the names of the taxa in the quartet alphabetically and then concatenating their identification strings, hence allowing quartets to be identified with their respective taxa.

Note that the taxa identification strings should be unique in order to eliminate collisions.

Parameters
quartetquartet to be identified

Definition at line 287 of file parsimony.py.

Referenced by ptreegen.parsimony.LargeParsimony.getOptimalQuartets(), and ptreegen.parsimony.LargeParsimony.quartetPuzzling().

288  def getQuartetID(quartet):
289  return "".join(sorted(quartet))
def ptreegen.parsimony.LargeParsimony.getTopologyID (   quartet)
static

Provides a string that uniquely identifies a quartet topology.

Quartet topology does not depend on the order of the first two and the last two elements in the quartet. This method is needed to distinguish two quartets of the same four taxa that have different topologies.

It also allows to have the first two and last two elements of the quartet in an arbitrary order.

Parameters
quartetquartet to be identified

Definition at line 307 of file parsimony.py.

Referenced by ptreegen.parsimony.LargeParsimony.getOptimalQuartets(), and ptreegen.parsimony.LargeParsimony.quartetPuzzling().

308  def getTopologyID(quartet):
309  return LargeParsimony.getQuartetID(quartet[0:2]) + LargeParsimony.getQuartetID(quartet[2:4])
def ptreegen.parsimony.LargeParsimony.increaseCostOnPath (   tree,
  start,
  dest 
)
static

Finds a path between two vertices and then increases the weight of all edges on that path by one.

Parameters
treereference to any vertex of the processed tree
startreference to the start node
destreference to the end node

Definition at line 341 of file parsimony.py.

Referenced by ptreegen.parsimony.LargeParsimony.quartetPuzzling().

342  def increaseCostOnPath(tree, start, dest):
343  start_node = tree.search_nodes(name=start)[0]
344  end_node = tree.search_nodes(name=dest)[0]
345  node_from = dict()
346  to_visit = set()
347  visited = set()
348 
349  to_visit.add(start_node)
350  node_from[start_node] = None
351  path_found = False
352  while to_visit and not path_found:
353  current = to_visit.pop()
354  neighbors = set(current.children)
355  if current.up:
356  neighbors.add(current.up)
357  for neighbor in neighbors:
358  if neighbor not in visited:
359  node_from[neighbor] = current
360  if neighbor == end_node:
361  path_found = True
362  break
363  else:
364  to_visit.add(neighbor)
365  visited.add(current)
366 
367  node = end_node
368  while node:
369  if node.name != "Left":
370  node.dist += 1
371  node = node_from[node]
def ptreegen.parsimony.LargeParsimony.quartetPuzzling (   self,
  steps 
)

Iteratively applies the quartet puzzling heuristic to obtain intermediate tree topologies.

Implemented as described in:
Bockenhauer, H.; Bongartz, D. Algorithmic Aspects of Bioinformatics, 1st ed.; Springer: Berlin, 1998.

Parameters
stepsnumber of puzzling steps (equals the number of intermediate tree topologies)
Returns
consensus tree according to the 50% majority rule

Definition at line 190 of file parsimony.py.

References ptreegen.parsimony.LargeParsimony._optimalQuartets, ptreegen.parsimony.LargeParsimony.getQuartetID(), ptreegen.parsimony.LargeParsimony.getTopologyID(), ptreegen.parsimony.LargeParsimony.increaseCostOnPath(), and ptreegen.parsimony.LargeParsimony.treeFromQuartet().

Referenced by ptreegen.parsimony.LargeParsimony.tree().

191  def quartetPuzzling(self, steps):
192  seq_ids = self._sequencesDict.keys()
193  if len(seq_ids) < 4:
194  tree = Tree()
195  for seq_id in seq_ids:
196  tree.add_child(name=seq_id)
197  return tree
198 
199  trees = []
200  for step in range(steps):
201  shuffle(seq_ids)
202  first_quartet = self._optimalQuartets[self.getQuartetID(seq_ids[0:4])]["topology"]
203  rooted_tree = self.treeFromQuartet(first_quartet)
204  tree = rooted_tree.children[0]
205  tree.add_child(rooted_tree.children[1])
206  # tree.show()
207 
208  for i in range(4,len(seq_ids)):
209  tree_utils.initEdgeLengths(tree, 0)
210 
211  quartets = []
212  for triplet in combination_utils.combinationsGenerator(seq_ids[0:i], 3):
213  triplet.append(seq_ids[i])
214  quartets.append(tuple(triplet))
215 
216  qt_topos_found = set()
217  for quartet in quartets:
218  optimal_qt_topo_id = self._optimalQuartets[self.getQuartetID(quartet)]["topology_id"]
219  qt_topo_id = self.getTopologyID(quartet)
220  if qt_topo_id == optimal_qt_topo_id and qt_topo_id not in qt_topos_found:
221  qt_topos_found.add(qt_topo_id)
222  self.increaseCostOnPath(tree, quartet[0], quartet[1])
223 
224  # choose edge with minimum cost, delete it and add new leaf seq_ids[i]
225  shortest_edge = tree_utils.findShortestEdge(tree)
226  # new_node = Tree(name=shortest_edge[0].name + "_" + shortest_edge[1].name)
227  new_node = Tree()
228  new_node.add_child(name=seq_ids[i])
229  detached = shortest_edge[1].detach()
230  shortest_edge[0].add_child(new_node)
231  new_node.add_child(detached)
232  # tree.show()
233 
234  tree_utils.initEdgeLengths(tree, 1)
235  trees.append(tree)
236 
237  # find consensus tree
238  return tree_utils.findConsensusTree(trees)
def ptreegen.parsimony.LargeParsimony.tree (   self)

A getter for the obtained consensus tree.

Returns
reference to the tree instance

Definition at line 162 of file parsimony.py.

References ptreegen.parsimony.LargeParsimony._qpSteps, ptreegen.parsimony.SmallParsimony._tree, ptreegen.parsimony.LargeParsimony._tree, and ptreegen.parsimony.LargeParsimony.quartetPuzzling().

Referenced by ptreegen.parsimony.LargeParsimony.cost().

163  def tree(self):
164  if self._tree:
165  return self._tree
166  else:
167  self._tree = self.quartetPuzzling(self._qpSteps)
168  self._tree.get_tree_root().unroot()
169  return self._tree
def ptreegen.parsimony.LargeParsimony.treeFromQuartet (   quartet)
static

Constructs a tree from the provided quartet.

The tree is rooted so that it can be used in the Fitch's algorithm for solving the small parsimony problem.

Parameters
quartetquartet to be transformed
Returns
a reference to the root of the tree

Definition at line 319 of file parsimony.py.

Referenced by ptreegen.parsimony.LargeParsimony.getOptimalQuartets(), and ptreegen.parsimony.LargeParsimony.quartetPuzzling().

320  def treeFromQuartet(quartet):
321  root = Tree()
322  root.name = "root"
323  left = root.add_child(name="Left")
324  left.add_child(name=quartet[0])
325  left.add_child(name=quartet[1])
326  right = root.add_child(name="Right")
327  right.add_child(name=quartet[2])
328  right.add_child(name=quartet[3])
329  for desc in root.iter_descendants():
330  desc.dist = 0
331  return root

Member Data Documentation

ptreegen.parsimony.LargeParsimony._alignment
private

Reference to the multiple sequence alignment instance.

Definition at line 149 of file parsimony.py.

Referenced by ptreegen.parsimony.LargeParsimony.cost(), and ptreegen.parsimony.LargeParsimony.getOptimalQuartets().

ptreegen.parsimony.LargeParsimony._optimalQuartets
private

Associates every four taxa with their optimal quartet.

Definition at line 154 of file parsimony.py.

Referenced by ptreegen.parsimony.LargeParsimony.quartetPuzzling().

ptreegen.parsimony.LargeParsimony._qpSteps
private

The number of quartet puzzling steps to take.

Definition at line 148 of file parsimony.py.

Referenced by ptreegen.parsimony.LargeParsimony.tree().

ptreegen.parsimony.LargeParsimony._sequencesDict
private

Associates sequence id string with a sequence instance.

Definition at line 150 of file parsimony.py.

ptreegen.parsimony.LargeParsimony._tree
private

Reference to the tree that was computed in LargeParsimony::quartetPuzzling().

Definition at line 155 of file parsimony.py.

Referenced by ptreegen.visualization.Visualization.printToStdout(), and ptreegen.parsimony.LargeParsimony.tree().


The documentation for this class was generated from the following file: