PTreeGenerator  1.0
Simple phylogenetic tree generation from multiple sequence alignment.
ptreegen.computation.Computation Class Reference

Parses user-specified options and delegates appropriate actions to other modules. More...

Public Member Functions

def __init__
 Constructor initializes the object variables and calls other modules to do so.
def parseOptions
 Parses the options passed to the constructor.
def checkAlignment
 Checks the input alignment for duplicate taxa id strings.
def computeTree
 Method that delegates tree computation to the appropriate module.
def update
 Method that can be used to update the results if the settings of the Computation object change.
def computeDistanceMatrix
 Computes the distance matrix from the alignment.
def cleanAlignment
 Method responsible for the correct alignment cleaning.
def showResults

Public Attributes

 algorithm
 The methodology used to build the tree as one of those in ptreegen::enums.
 gapPenalty
 Cost of gaps when they are included in the distance computation.
 includeGaps
 Specifies if columns with gaps should be left in the alignment or deleted.
 removePoor
 Specifies if poorly conserved regions should be removed from the alignment.
 gapCutoff
 Minimum fraction (between 0 and 1) of non-gap characters a column must contain to be left in the alignment.
 pairCutoff
 Minimum fraction of identical residue pairs a column must contain to be left in the alignment.
 seqType
 The type of the input sequence as one of ptreegen::enums.
 distFunction
 Reference to the function used to compute distances between two sequences.
 parsIterCnt
 options
 Reference to the dictionary-like object holding the computation options.
 alignment
 Reference to an object representing the multiple sequence alignment.
 distanceMatrix
 The distance matrix for the alignment.
 tree
 The generated tree.
 visualization

Detailed Description

Parses user-specified options and delegates appropriate actions to other modules.

Also serves as a data storage of computed results.

Definition at line 24 of file computation.py.

Constructor & Destructor Documentation

def ptreegen.computation.Computation.__init__(self, options)

Constructor initializes the object variables and calls other modules to do so.

Parameters
options  computation options in the form of a dictionary-like object

Definition at line 31 of file computation.py.

    def __init__(self, options):
        self.algorithm = None
        self.gapPenalty = None
        self.includeGaps = None
        self.removePoor = None
        self.gapCutoff = None
        self.pairCutoff = None
        self.seqType = None
        self.distFunction = None
        self.parsIterCnt = None
        self.options = options
        self.parseOptions(self.options)
        self.alignment = self.cleanAlignment(self.alignment)
        # (one source line is missing from the listing at this point)
        self.distanceMatrix = None
        self.tree = self.computeTree()

Member Function Documentation

def ptreegen.computation.Computation.checkAlignment(self)

Checks the input alignment for duplicate taxa id strings.

Definition at line 89 of file computation.py.

References ptreegen.computation.Computation.alignment.

    def checkAlignment(self):
        found = set()
        for x in self.alignment:
            if x.id in found:
                raise RuntimeError("Duplicate taxa identification strings found: " + x.id)
            else:
                found.add(x.id)
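
The duplicate check can be tried without Biopython. The following is a minimal sketch, assuming only that each record exposes an `id` attribute; `Record` and `find_duplicate_ids` are hypothetical names and not part of ptreegen. Unlike `checkAlignment`, it collects every duplicated id instead of raising on the first one.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a sequence record; only the `id` field matters here.
Record = namedtuple("Record", ["id", "seq"])


def find_duplicate_ids(records):
    """Return the sorted list of ids that occur more than once."""
    counts = Counter(record.id for record in records)
    return sorted(taxon for taxon, n in counts.items() if n > 1)


records = [Record("human", "ACGT"), Record("mouse", "ACGA"), Record("human", "ACGC")]
duplicates = find_duplicate_ids(records)
if duplicates:
    print("Duplicate taxa identification strings found: " + ", ".join(duplicates))
```

Reporting all duplicates at once can save a round trip when an input file has several repeated identifiers.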

def ptreegen.computation.Computation.cleanAlignment(self, alignment)

Method responsible for the correct alignment cleaning.

It removes badly conserved regions and/or regions with too many gaps, if requested by the user.

Parameters
alignment  the multiple sequence alignment instance

Definition at line 150 of file computation.py.

References ptreegen.computation.Computation.gapCutoff, ptreegen.computation.Computation.includeGaps, ptreegen.computation.Computation.pairCutoff, and ptreegen.computation.Computation.removePoor.

Referenced by ptreegen.computation.Computation.update().

    def cleanAlignment(self, alignment):
        # Needs MultipleSeqAlignment (Bio.Align) and SeqRecord (Bio.SeqRecord).
        to_remove = set()
        for col_idx in range(alignment.get_alignment_length()):
            column = alignment[:, col_idx]

            # removal of columns with gaps
            if not self.includeGaps:
                if column.find("-") != -1:
                    to_remove.add(col_idx)

            # poorly conserved regions removal
            if self.removePoor and (col_idx not in to_remove):
                # remove column if it contains too many gaps
                nongap_count = len(column) - column.count("-")
                nongap_ratio = float(nongap_count) / len(column)
                if nongap_ratio < self.gapCutoff:
                    to_remove.add(col_idx)
                    continue

                # remove column if there are not enough identical residues
                all_pairs = nongap_count * (nongap_count - 1) / 2.0
                identical_pairs = 0
                for res in set(column):
                    if res != "-":
                        res_count = column.count(res)
                        if res_count > 1:
                            identical_pairs += res_count * (res_count - 1) / 2.0
                # a column with fewer than two residues has no pairs; treat its ratio as 0
                identical_pairs_ratio = identical_pairs / all_pairs if all_pairs else 0.0
                if identical_pairs_ratio < self.pairCutoff:
                    to_remove.add(col_idx)
                    continue

        cleaned_alignment = MultipleSeqAlignment([])
        for record in alignment:
            seq = record.seq.tomutable()
            counter = 0
            # ascending order is required: each pop shifts later indices left by one
            for idx in sorted(to_remove):
                seq.pop(idx - counter)
                counter += 1
            cleaned_alignment.append(SeqRecord(seq.toseq(), id=record.id))

        return cleaned_alignment
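
The final loop of the listing removes the flagged columns from every sequence. The sketch below shows the same step on plain strings, without Biopython; `drop_columns` is a hypothetical helper, not a ptreegen function. It keeps the columns that are not flagged instead of popping shifted indices, so the iteration order of the index set does not matter.

```python
def drop_columns(sequences, to_remove):
    """Return copies of the sequences without the columns listed in to_remove."""
    return [
        "".join(ch for idx, ch in enumerate(seq) if idx not in to_remove)
        for seq in sequences
    ]


aligned = ["AC-GT", "ACTGT", "A--GA"]
cleaned = drop_columns(aligned, {1, 2})
print(cleaned)
```

Filtering by membership avoids the bookkeeping counter entirely, at the cost of rebuilding each sequence once.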

def ptreegen.computation.Computation.computeDistanceMatrix(self, alignment, distFunction)

Computes the distance matrix from the alignment.

Parameters
alignment  the multiple sequence alignment instance
distFunction  the distance measure used; can be one of the functions in ptreegen::distance_functions.
Returns
distance matrix as a tuple of tuples (one inner tuple per sequence)

Definition at line 127 of file computation.py.

References ptreegen.computation.Computation.distFunction, and ptreegen.computation.Computation.options.

Referenced by ptreegen.computation.Computation.computeTree(), and ptreegen.computation.Computation.update().

    def computeDistanceMatrix(self, alignment, distFunction):
        dist_matrix = []
        for i, record_i in enumerate(alignment):
            seq_i = record_i.seq
            distances = []
            for j, record_j in enumerate(alignment):
                if j > i:
                    seq_j = record_j.seq
                    distances.append(distFunction(seq_i, seq_j, **self.options))
                elif i == j:
                    distances.append(0)
                else:
                    # reuse the value already computed for the pair (j, i)
                    distances.append(dist_matrix[j][i])
            dist_matrix.append(tuple(distances))
        return tuple(dist_matrix)
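
The matrix construction can be reproduced on plain strings. This is a minimal sketch under stated assumptions: `mismatch_fraction` is an illustrative distance invented for the example (it is not one of the functions in ptreegen::distance_functions), and `distance_matrix` is a hypothetical standalone version of the method above. It shows that the distance function is called once per unordered pair and that the lower triangle is copied from the upper one.

```python
def mismatch_fraction(seq_a, seq_b, **options):
    """Illustrative distance: fraction of positions where the sequences differ."""
    differing = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differing / float(len(seq_a))


def distance_matrix(sequences, dist_function, **options):
    """Build a symmetric matrix, calling dist_function once per unordered pair."""
    rows = []
    for i, seq_i in enumerate(sequences):
        row = []
        for j, seq_j in enumerate(sequences):
            if j > i:
                row.append(dist_function(seq_i, seq_j, **options))
            elif j == i:
                row.append(0)
            else:
                row.append(rows[j][i])  # reuse the value computed for (j, i)
        rows.append(tuple(row))
    return tuple(rows)


matrix = distance_matrix(["ACGT", "ACGA", "TCGA"], mismatch_fraction)
for row in matrix:
    print(row)
```

The extra keyword arguments mirror the `**self.options` call in the listing, so a distance function has to accept option keys it does not use.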

def ptreegen.computation.Computation.computeTree(self)

Method that delegates tree computation to the appropriate module.

Returns
reference to the computed tree object

Definition at line 102 of file computation.py.

References ptreegen.computation.Computation.algorithm, ptreegen.computation.Computation.alignment, ptreegen.computation.Computation.computeDistanceMatrix(), ptreegen.computation.Computation.distanceMatrix, ptreegen.computation.Computation.distFunction, and ptreegen.computation.Computation.parsIterCnt.

Referenced by ptreegen.computation.Computation.update().

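    def computeTree(self):
        if self.algorithm == TreeBuildAlgorithms.NJ:
            # A source line is missing from the listing here; judging by the
            # References above, it presumably assigns self.distanceMatrix
            # using self.computeDistanceMatrix().
            return NeighborJoining(self.distanceMatrix, self.alignment).tree
        elif self.algorithm == TreeBuildAlgorithms.PARSIMONY:
            return LargeParsimony(self.alignment, steps=self.parsIterCnt).tree
        else:
            # str() avoids a TypeError when the value is not a string
            raise RuntimeError(str(self.algorithm) + " not implemented.")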

def ptreegen.computation.Computation.parseOptions(self, options)

Parses the options passed to the constructor.

Parameters
options  computation options in the form of a dictionary-like object

Definition at line 53 of file computation.py.

References ptreegen.computation.Computation.algorithm, ptreegen.computation.Computation.alignment, ptreegen.computation.Computation.distFunction, ptreegen.computation.Computation.gapCutoff, ptreegen.computation.Computation.gapPenalty, ptreegen.computation.Computation.includeGaps, ptreegen.computation.Computation.pairCutoff, ptreegen.computation.Computation.parsIterCnt, ptreegen.computation.Computation.removePoor, and ptreegen.computation.Computation.seqType.

    def parseOptions(self, options):
        self.algorithm = options["method"]
        if self.algorithm not in (TreeBuildAlgorithms.NJ, TreeBuildAlgorithms.PARSIMONY):
            raise RuntimeError("Unknown method: " + str(self.algorithm))
        self.parsIterCnt = options["pars_tree_count"]
        self.gapPenalty = options["gap_penalty"]
        if self.gapPenalty < 0 or self.gapPenalty > 1:
            # str() is needed: concatenating a number to a string raises TypeError
            raise RuntimeError("Bad gap penalty value. Must be between 0 and 1. Got: " + str(self.gapPenalty))
        self.includeGaps = not options["no_gaps"]
        self.removePoor = not options["no_cleaning"]
        self.gapCutoff = options["gap_cutoff"]
        if self.gapCutoff < 0 or self.gapCutoff > 1:
            raise RuntimeError("Bad gap cutoff value. Must be between 0 and 1. Got: " + str(self.gapCutoff))
        self.pairCutoff = options["pair_cutoff"]
        self.seqType = options["sequence_type"]
        if self.seqType == SeqTypes.AA:
            self.alignment = AlignIO.read(options["alignment_file"], "fasta", alphabet=generic_protein)
        elif self.seqType == SeqTypes.DNA:
            self.alignment = AlignIO.read(options["alignment_file"], "fasta", alphabet=generic_dna)
        elif self.seqType == SeqTypes.RNA:
            self.alignment = AlignIO.read(options["alignment_file"], "fasta", alphabet=generic_rna)
        else:
            raise RuntimeError("Unknown sequence type: " + str(self.seqType))
        if options["dist_measure"] == DistMeasures.P_DISTANCE:
            self.distFunction = dfuncs.p_distance
        elif options["dist_measure"] == DistMeasures.POISSON_CORRECTED:
            self.distFunction = dfuncs.poisson_corrected
        elif options["dist_measure"] == DistMeasures.JUKES_CANTOR:
            self.distFunction = dfuncs.jukes_cantor
        else:
            raise RuntimeError("Unknown distance measure: " + str(options["dist_measure"]))
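
For orientation, the sketch below lists the option keys that `parseOptions` reads and shows a range check that formats the offending value rather than concatenating it. Everything here is an assumption made for illustration: `check_unit_interval` is a hypothetical helper, all values are placeholders, and the strings in angle brackets stand in for members of the enums in ptreegen::enums, whose actual values are not shown on this page.

```python
def check_unit_interval(name, value):
    """Raise if value lies outside [0, 1]; the value is formatted, not concatenated."""
    if value < 0 or value > 1:
        raise RuntimeError(
            "Bad {0} value. Must be between 0 and 1. Got: {1}".format(name, value)
        )
    return value


# Keys as read by parseOptions; every value below is a made-up placeholder.
options = {
    "method": "<a TreeBuildAlgorithms member>",
    "pars_tree_count": 100,
    "gap_penalty": 0.5,
    "no_gaps": False,
    "no_cleaning": False,
    "gap_cutoff": 0.8,
    "pair_cutoff": 0.3,
    "sequence_type": "<a SeqTypes member>",
    "alignment_file": "alignment.fasta",
    "dist_measure": "<a DistMeasures member>",
}

gap_penalty = check_unit_interval("gap penalty", options["gap_penalty"])
gap_cutoff = check_unit_interval("gap cutoff", options["gap_cutoff"])
include_gaps = not options["no_gaps"]      # the flags are stored negated
remove_poor = not options["no_cleaning"]
```

Note that `no_gaps` and `no_cleaning` are negative flags, so `False` means the corresponding behaviour stays enabled.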

def ptreegen.computation.Computation.showResults(self)

Definition at line 192 of file computation.py.

    def showResults(self):
        self.visualization.show()

def ptreegen.computation.Computation.update(self)

Method that can be used to update the results if the settings of the Computation object change.

Definition at line 115 of file computation.py.

References ptreegen.computation.Computation.alignment, ptreegen.computation.Computation.cleanAlignment(), ptreegen.computation.Computation.computeDistanceMatrix(), ptreegen.computation.Computation.computeTree(), ptreegen.computation.Computation.distanceMatrix, ptreegen.computation.Computation.distFunction, and ptreegen.computation.Computation.tree.

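    def update(self):
        self.alignment = self.cleanAlignment(self.alignment)
        # A source line is missing from the listing here; judging by the
        # References above, it presumably assigns self.distanceMatrix
        # using self.computeDistanceMatrix().
        self.tree = self.computeTree()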

Member Data Documentation

ptreegen.computation.Computation.algorithm

The methodology used to build the tree as one of those in ptreegen::enums.

Definition at line 32 of file computation.py.

Referenced by ptreegen.computation.Computation.computeTree(), and ptreegen.computation.Computation.parseOptions().

ptreegen.computation.Computation.alignment

Reference to an object representing the multiple sequence alignment.

Definition at line 43 of file computation.py.

Referenced by ptreegen.computation.Computation.checkAlignment(), ptreegen.computation.Computation.computeTree(), ptreegen.computation.Computation.parseOptions(), and ptreegen.computation.Computation.update().

ptreegen.computation.Computation.distanceMatrix

The distance matrix for the alignment.

Definition at line 45 of file computation.py.

Referenced by ptreegen.computation.Computation.computeTree(), and ptreegen.computation.Computation.update().

ptreegen.computation.Computation.distFunction

Reference to the function used to compute distances between two sequences.

Definition at line 39 of file computation.py.

Referenced by ptreegen.computation.Computation.computeDistanceMatrix(), ptreegen.computation.Computation.computeTree(), ptreegen.computation.Computation.parseOptions(), and ptreegen.computation.Computation.update().

ptreegen.computation.Computation.gapCutoff

Minimum fraction (between 0 and 1) of non-gap characters a column must contain to be left in the alignment.

Definition at line 36 of file computation.py.

Referenced by ptreegen.computation.Computation.cleanAlignment(), and ptreegen.computation.Computation.parseOptions().
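
A small sketch of how this cutoff acts on individual columns, following the comparison used in `cleanAlignment()` (a column is removed when its non-gap ratio is below the cutoff). `nongap_ratio` is a hypothetical helper and the threshold of 0.5 is a placeholder, not a documented default.

```python
def nongap_ratio(column):
    """Fraction of characters in the column that are not the gap symbol '-'."""
    return float(len(column) - column.count("-")) / len(column)


gap_cutoff = 0.5  # placeholder threshold
for column in ("ACGT", "A--T", "A---"):
    keep = nongap_ratio(column) >= gap_cutoff
    print(column, nongap_ratio(column), "kept" if keep else "removed")
```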

ptreegen.computation.Computation.gapPenalty

Cost of gaps when they are included in the distance computation.

Definition at line 33 of file computation.py.

Referenced by ptreegen.computation.Computation.parseOptions().

ptreegen.computation.Computation.includeGaps

Specifies if columns with gaps should be left in the alignment or deleted.

Definition at line 34 of file computation.py.

Referenced by ptreegen.computation.Computation.cleanAlignment(), and ptreegen.computation.Computation.parseOptions().

ptreegen.computation.Computation.options

Reference to the dictionary-like object holding the computation options.

Definition at line 41 of file computation.py.

Referenced by ptreegen.computation.Computation.computeDistanceMatrix().

ptreegen.computation.Computation.pairCutoff

Minimum fraction of identical residue pairs a column must contain to be left in the alignment.

Definition at line 37 of file computation.py.

Referenced by ptreegen.computation.Computation.cleanAlignment(), and ptreegen.computation.Computation.parseOptions().
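
The ratio compared against this cutoff counts pairs of non-gap residues: a column with n residues has n(n-1)/2 pairs, and a residue occurring k times contributes k(k-1)/2 identical pairs. The sketch below works one column through by hand; `identical_pairs_ratio` is a hypothetical helper, and returning 0 for a column with fewer than two residues is an assumption of this example.

```python
def identical_pairs_ratio(column):
    """Share of non-gap residue pairs in the column that are identical."""
    residues = [ch for ch in column if ch != "-"]
    n = len(residues)
    all_pairs = n * (n - 1) / 2.0
    if all_pairs == 0:
        return 0.0  # assumption: a column with fewer than two residues scores 0
    identical = sum(
        residues.count(res) * (residues.count(res) - 1) / 2.0
        for res in set(residues)
    )
    return identical / all_pairs


# "AAAC-": 4 residues give 6 pairs, and the three A's give 3 identical pairs.
print(identical_pairs_ratio("AAAC-"))
```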

ptreegen.computation.Computation.parsIterCnt

Definition at line 40 of file computation.py.

Referenced by ptreegen.computation.Computation.computeTree(), and ptreegen.computation.Computation.parseOptions().

ptreegen.computation.Computation.removePoor

Specifies if poorly conserved regions should be removed from the alignment.

Definition at line 35 of file computation.py.

Referenced by ptreegen.computation.Computation.cleanAlignment(), and ptreegen.computation.Computation.parseOptions().

ptreegen.computation.Computation.seqType

The type of the input sequence as one of ptreegen::enums.

Definition at line 38 of file computation.py.

Referenced by ptreegen.computation.Computation.parseOptions().

ptreegen.computation.Computation.tree

The generated tree.

Definition at line 46 of file computation.py.

Referenced by ptreegen.parsimony.LargeParsimony.cost(), and ptreegen.computation.Computation.update().

ptreegen.computation.Computation.visualization

Definition at line 47 of file computation.py.


The documentation for this class was generated from the following file: