PTreeGenerator  1.0
Simple phylogenetic tree generation from multiple sequence alignment.
ptreegen.distance_matrix.DistanceMatrix Class Reference

Basically a wrapper around a numpy array object representing the alignment distance matrix.

Public Member Functions

def __init__
 Takes any "matrix-like" object and tries to convert it to a numpy array.
def size
 A getter for the matrix size (number of columns/taxa).
def distMatrix
 A getter for a copy of the whole distance matrix.
def columnNames
 A getter for a list of column/taxa names.
def getSeparation
 Returns the separation value used in the Neighbor-Joining algorithm.
def getNearestNeigbors
 Finds the pair of nearest sequences.
def getDistance
 Returns the distance from one sequence to another.
def getIdx
 Finds the position of a sequence in the distance matrix.
def getName
 Finds the name of a sequence based on its position in the matrix.
def removeData
 Removes rows and columns for the specified sequences.
def appendData
 Adds a row and a column for the specified sequence.

Private Attributes

 _distMatrix
 Distance matrix as a numpy array object.
 _columnNames
 List of column names (the identification strings of the sequences).

Detailed Description

Basically a wrapper around a numpy array object representing the alignment distance matrix.

Performs some additional operations useful for tree building.

Definition at line 13 of file distance_matrix.py.
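The tree-building operations this class offers (finding the nearest pair, removing two sequences, appending a new one) correspond to one agglomeration step of Neighbor-Joining. The sketch below runs that cycle on a bare numpy array so it can be tried without ptreegen. It is an illustration under assumptions, not the project's tree builder: the helper name `nj_step`, the label format for the joined node, and the new-node update d_uk = (d_ik + d_jk - d_ij) / 2 (the standard Neighbor-Joining rule) are not taken from this class.

```python
import numpy as np


def nj_step(dist, names):
    """One Neighbor-Joining agglomeration step on a plain numpy matrix.

    Hypothetical helper: mirrors the getNearestNeigbors / removeData /
    appendData cycle, but works on a bare array and a list of names.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("Neighbor-Joining needs at least three taxa")
    # Separation u_i = sum_k d_ik / (n - 2), as in getSeparation
    sep = dist.sum(axis=0) / (n - 2)
    # Pick the pair (i, j), i < j, that minimises d_ij - u_i - u_j
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            value = dist[i, j] - sep[i] - sep[j]
            if best is None or value < best[0]:
                best = (value, i, j)
    _, i, j = best
    # Remaining taxa and their distances to the new internal node
    keep = [k for k in range(n) if k not in (i, j)]
    new_row = [(dist[i, k] + dist[j, k] - dist[i, j]) / 2 for k in keep]
    # Drop rows/columns i and j, then append the new node last (zero diagonal)
    reduced = dist[np.ix_(keep, keep)]
    grown = np.zeros((len(keep) + 1, len(keep) + 1))
    grown[:-1, :-1] = reduced
    grown[-1, :-1] = new_row
    grown[:-1, -1] = new_row
    new_names = [names[k] for k in keep] + [f"({names[i]},{names[j]})"]
    return grown, new_names, (names[i], names[j])


# Small toy matrix with five taxa
toy = np.array([[0, 5, 9, 9, 8],
                [5, 0, 10, 10, 9],
                [9, 10, 0, 8, 7],
                [9, 10, 8, 0, 3],
                [8, 9, 7, 3, 0]], dtype=float)
reduced, labels, joined = nj_step(toy, list("abcde"))
print(joined)  # -> ('a', 'b')
```

Repeating `nj_step` until two nodes remain yields the full topology; branch lengths are omitted here to keep the sketch focused on the matrix bookkeeping.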

Constructor & Destructor Documentation

def ptreegen.distance_matrix.DistanceMatrix.__init__ (   self,
  matrix,
  names = None 
)

Takes any "matrix-like" object and tries to convert it to a numpy array.

Parameters
matrix: a "matrix-like" object
names: optional column and row names (the taxa names)

Definition at line 21 of file distance_matrix.py.

def __init__(self, matrix, names=None):
    self._distMatrix = np.array(matrix, float)
    assert len(self._distMatrix.shape) == 2
    assert self._distMatrix.shape[0] == self._distMatrix.shape[1]

    if not names:
        generated_names = []
        for i in range(self.size):
            generated_names.append("AUTOGEN_" + str(i+1))
        self._columnNames = list(generated_names)
    else:
        self._columnNames = list(names)

    assert len(self._columnNames) == len(self._distMatrix)
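The constructor validates its input with `assert`, which Python skips when run with `-O`, and `if not names` raises "truth value of an array is ambiguous" when `names` is a numpy array with more than one element. The hypothetical stand-alone function `to_distance_array` below sketches the same checks with explicit exceptions and an `is None` test; it keeps the `AUTOGEN_<n>` naming scheme but is not part of ptreegen.

```python
import numpy as np


def to_distance_array(matrix, names=None):
    """Hypothetical stand-alone version of the constructor's checks.

    Raises ValueError instead of relying on assert statements, which are
    removed when Python runs with the -O flag.
    """
    arr = np.array(matrix, dtype=float)
    if arr.ndim != 2 or arr.shape[0] != arr.shape[1]:
        raise ValueError(f"expected a square 2-D matrix, got shape {arr.shape}")
    if names is None:
        # Same scheme as the constructor: AUTOGEN_1, AUTOGEN_2, ...
        names = [f"AUTOGEN_{i + 1}" for i in range(arr.shape[0])]
    names = list(names)
    if len(names) != arr.shape[0]:
        raise ValueError("number of names does not match the matrix size")
    return arr, names
```

Unlike the original, an empty `names` list is treated as an error here rather than as a request for generated names.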

Member Function Documentation

def ptreegen.distance_matrix.DistanceMatrix.appendData (   self,
  data,
  name 
)

Adds a row and a column for the specified sequence.

Parameters
name: the identification of the sequence
data: the distances from the new sequence to all existing sequences, as an iterable (one value per current column)

Definition at line 141 of file distance_matrix.py.

References ptreegen.distance_matrix.DistanceMatrix._distMatrix.

def appendData(self, data, name):
    # New last row: the distances from the new sequence to the existing ones
    arr = np.array([data], float)
    self._distMatrix = np.append(self._distMatrix, arr, axis=0)
    # New last column: the same distances plus a zero on the diagonal
    xs = []
    for x in data:
        xs.append([x])
    xs.append([0])
    arr = np.array(xs, float)
    self._distMatrix = np.append(self._distMatrix, arr, axis=1)
    self._columnNames.append(name)

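`appendData` grows the matrix with two `np.append` calls and builds the new column element by element. A minimal alternative, assuming a plain numpy array instead of the class (the function name `append_taxon` is hypothetical), grows the matrix with one zero-padding call and then fills the new row and column from the same data:

```python
import numpy as np


def append_taxon(dist, data):
    """Hypothetical equivalent of appendData on a bare array.

    ``data`` holds the distances from the new sequence to every existing
    sequence, so its length must equal the current matrix size.
    """
    data = np.asarray(data, dtype=float)
    if data.shape != (len(dist),):
        raise ValueError("data must contain one distance per existing sequence")
    # Add one zero-filled row and column in a single call
    grown = np.pad(np.asarray(dist, dtype=float), ((0, 1), (0, 1)))
    grown[-1, :-1] = data  # new row
    grown[:-1, -1] = data  # new column, keeping the matrix symmetric
    return grown
```

The explicit length check turns a silent shape mismatch into a clear error before the matrix is modified.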
def ptreegen.distance_matrix.DistanceMatrix.columnNames (   self)

A getter for a list of column/taxa names.

Definition at line 54 of file distance_matrix.py.

References ptreegen.distance_matrix.DistanceMatrix._columnNames.

def columnNames(self):
    return list(self._columnNames)

def ptreegen.distance_matrix.DistanceMatrix.distMatrix (   self)

A getter for a copy of the whole distance matrix.

Definition at line 47 of file distance_matrix.py.

References ptreegen.distance_matrix.DistanceMatrix._distMatrix.

def distMatrix(self):
    return np.array(self._distMatrix, float)

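The getter returns `np.array(self._distMatrix, float)` rather than the attribute itself. The short demonstration below, using a stand-alone array named `internal` as a stand-in for `_distMatrix`, shows why this matters: `np.array` copies by default, so callers cannot modify the stored matrix, whereas `np.asarray` returns the very same object when no conversion is needed.

```python
import numpy as np

# Stand-in for the private _distMatrix attribute
internal = np.array([[0.0, 1.5], [1.5, 0.0]])

# np.array() copies by default, as the distMatrix getter does
copy = np.array(internal, float)
copy[0, 1] = 99.0  # does not touch `internal`

# np.asarray() returns the input itself when the dtype already matches
view = np.asarray(internal, float)
```

The same reasoning explains why `columnNames` wraps the list in `list(...)` before returning it.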
def ptreegen.distance_matrix.DistanceMatrix.getDistance (   self,
  name_i,
  name_j 
)

Returns the distance from one sequence to another.

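The value is read directly from the distance matrix.

Parameters
name_i: the identification of the first sequence
name_j: the identification of the second sequence
Returns
a single number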

Definition at line 106 of file distance_matrix.py.

References ptreegen.distance_matrix.DistanceMatrix._distMatrix.

def getDistance(self, name_i, name_j):
    return self._distMatrix[self._columnNames.index(name_i), self._columnNames.index(name_j)]

def ptreegen.distance_matrix.DistanceMatrix.getIdx (   self,
  name 
)

Finds the position of a sequence in the distance matrix.

Parameters
name: the identification of the sequence
Returns
the index of the sequence's column in the matrix, as a single integer

Definition at line 114 of file distance_matrix.py.

def getIdx(self, name):
    return self._columnNames.index(name)

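Both `getIdx` and `getDistance` resolve names with `list.index`, which scans the name list on every call and raises `ValueError` for an unknown name. For large alignments, one possible alternative is a name-to-index dictionary built once; the snippet below contrasts the two on toy data. The dictionary `index_of` is a hypothetical addition, and it would have to be rebuilt after every `removeData` or `appendData` call because those shift the column positions.

```python
import numpy as np

names = ["seqA", "seqB", "seqC"]
dist = np.array([[0.0, 0.2, 0.5],
                 [0.2, 0.0, 0.4],
                 [0.5, 0.4, 0.0]])

# getIdx / getDistance style: list.index scans the list on each call (O(n))
i, j = names.index("seqA"), names.index("seqC")
d_ac = dist[i, j]

# Hypothetical alternative: a name -> index map gives O(1) lookups,
# but must be rebuilt whenever rows/columns are removed or appended
index_of = {name: k for k, name in enumerate(names)}
d_ac_fast = dist[index_of["seqA"], index_of["seqC"]]
```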
def ptreegen.distance_matrix.DistanceMatrix.getName (   self,
  idx 
)

Finds the name of a sequence based on its position in the matrix.

Parameters
idx: the position in the matrix
Returns
the identification of the sequence as a string

Definition at line 122 of file distance_matrix.py.

References ptreegen.distance_matrix.DistanceMatrix._columnNames.

def getName(self, idx):
    return self._columnNames[idx]

def ptreegen.distance_matrix.DistanceMatrix.getNearestNeigbors (   self)

Finds the pair of nearest sequences.

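Finds the pair of closest sequences according to the Neighbor-Joining selection rule, i.e. the pair (i, j) that minimizes d_ij - u_i - u_j, where u_i is the separation value of sequence i (see getSeparation).

Returns
a tuple of size two containing the names of the two nearest sequences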

Definition at line 87 of file distance_matrix.py.

References ptreegen.distance_matrix.DistanceMatrix._columnNames, ptreegen.distance_matrix.DistanceMatrix._distMatrix, ptreegen.distance_matrix.DistanceMatrix.getSeparation(), and ptreegen.distance_matrix.DistanceMatrix.size().

def getNearestNeigbors(self):
    min_obj_value = None
    nearest_nbrs = tuple()
    separation = self.getSeparation()
    for i in range(self.size):
        for j in range(self.size):
            if j > i:
                obj_value = self._distMatrix[i, j] - separation[i] - separation[j]
                # Compare against None explicitly: a minimum of 0.0 is falsy
                if min_obj_value is None or obj_value < min_obj_value:
                    min_obj_value = obj_value
                    nearest_nbrs = (self._columnNames[i], self._columnNames[j])
    return nearest_nbrs

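The double loop evaluates d_ij - u_i - u_j for every pair with j > i. The same selection can be expressed with numpy broadcasting; the sketch below works on a bare array plus a name list, and the function name `nearest_neighbors` is hypothetical. Because `np.argmin` returns the first minimum in row-major order, ties are broken the same way as the loop's strict `<` comparison.

```python
import numpy as np


def nearest_neighbors(dist, names):
    """Vectorised sketch of the getNearestNeigbors selection rule.

    Minimises d_ij - u_i - u_j over pairs i < j, where
    u_i = sum_k d_ik / (L - 2) is the separation of sequence i.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("the selection rule needs at least three sequences")
    sep = dist.sum(axis=0) / (n - 2)
    # Broadcasting builds the full matrix of d_ij - u_i - u_j (a new array)
    q = dist - sep[:, None] - sep[None, :]
    # Exclude the diagonal and lower triangle, mirroring the `j > i` test
    q[np.tril_indices(n)] = np.inf
    i, j = np.unravel_index(np.argmin(q), q.shape)
    return names[i], names[j]
```

For the toy matrix used with `nj_step` elsewhere on this page, the function selects `('a', 'b')`.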
def ptreegen.distance_matrix.DistanceMatrix.getSeparation (   self,
  name = None 
)

Returns the separation value used in the Neighbor-Joining algorithm.

It can be computed for a single sequence (by passing name) or for all sequences (no argument).

The separation value of sequence i is computed as u_i = sum_k(d_ik) / (L - 2), where sum_k(d_ik) is the sum of the distances from sequence i to all the other sequences and L is the total number of sequences.

Parameters
name: the identification of one sequence (optional)
Returns
the separation values of all sequences as a numpy array, or a single value for the sequence with the specified name

Definition at line 71 of file distance_matrix.py.

References ptreegen.distance_matrix.DistanceMatrix._distMatrix, and ptreegen.distance_matrix.DistanceMatrix.size().

Referenced by ptreegen.distance_matrix.DistanceMatrix.getNearestNeigbors().

def getSeparation(self, name=None):
    dist_sum = None
    if name:
        idx = self._columnNames.index(name)
        dist_sum = self._distMatrix[idx].sum()
    else:
        dist_sum = self._distMatrix.sum(axis=0)
    return dist_sum / (self.size - 2)

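As a worked example of u_i = sum_k(d_ik) / (L - 2), the sketch below applies the formula to a five-sequence toy matrix, whose row sums are 31, 34, 34, 30 and 27, so with L - 2 = 3 the separations are 31/3, 34/3, 34/3, 10 and 9. The function `separation` is hypothetical: it selects a sequence by position instead of by name and adds a guard for L < 3, where the method above would divide by zero or by a negative number. Note that the method sums a row for one sequence but sums columns (`axis=0`) for all sequences; the two agree only because a distance matrix is symmetric.

```python
import numpy as np


def separation(dist, idx=None):
    """Sketch of getSeparation: u_i = sum_k d_ik / (L - 2).

    ``idx`` selects one sequence by position (the real method takes a name).
    """
    n = len(dist)
    if n < 3:
        # L - 2 would be zero or negative for fewer than three sequences
        raise ValueError("separation is undefined for fewer than three sequences")
    if idx is not None:
        return dist[idx].sum() / (n - 2)
    return dist.sum(axis=0) / (n - 2)


toy = np.array([[0, 5, 9, 9, 8],
                [5, 0, 10, 10, 9],
                [9, 10, 0, 8, 7],
                [9, 10, 8, 0, 3],
                [8, 9, 7, 3, 0]], dtype=float)
print(separation(toy, 4))  # -> 9.0
```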
def ptreegen.distance_matrix.DistanceMatrix.removeData (   self,
  names 
)

Removes rows and columns for the specified sequences.

Parameters
names: the identifications of the sequences, as an iterable

Definition at line 129 of file distance_matrix.py.

References ptreegen.distance_matrix.DistanceMatrix._distMatrix.

def removeData(self, names):
    # A generator is evaluated lazily, so each index is looked up only after
    # the previous sequence has been deleted and therefore stays valid
    indices = (self._columnNames.index(x) for x in names)
    for idx in indices:
        self._distMatrix = np.delete(self._distMatrix, idx, axis=0)
        self._distMatrix = np.delete(self._distMatrix, idx, axis=1)
        self._columnNames.pop(idx)

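`removeData` deletes one sequence per iteration, which works only because the generator recomputes each index against the already-shrunk name list; replacing it with a list comprehension would produce stale indices. A different approach, sketched below for a bare array (the function `remove_taxa` is hypothetical), computes every index up front against the unchanged list and removes them all with one `np.delete` call per axis, which makes the ordering issue disappear.

```python
import numpy as np


def remove_taxa(dist, names, to_remove):
    """Hypothetical equivalent of removeData that deletes everything at once.

    All indices refer to the original, unchanged name list, so a single
    np.delete call per axis removes every requested sequence.
    """
    indices = [names.index(x) for x in to_remove]
    reduced = np.delete(np.delete(dist, indices, axis=0), indices, axis=1)
    kept = [name for k, name in enumerate(names) if k not in indices]
    return reduced, kept
```

The function returns new objects instead of mutating its arguments, which also makes it easy to compare the matrix before and after a join.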
def ptreegen.distance_matrix.DistanceMatrix.size (   self)

A getter for the matrix size (number of columns/taxa).

Definition at line 40 of file distance_matrix.py.

References ptreegen.distance_matrix.DistanceMatrix._distMatrix.

Referenced by ptreegen.distance_matrix.DistanceMatrix.getNearestNeigbors(), and ptreegen.distance_matrix.DistanceMatrix.getSeparation().

def size(self):
    return len(self._distMatrix)

Member Data Documentation

ptreegen.distance_matrix.DistanceMatrix._columnNames
private

List of column names (the identification strings of the sequences).

Definition at line 30 of file distance_matrix.py.

Referenced by ptreegen.distance_matrix.DistanceMatrix.columnNames(), ptreegen.distance_matrix.DistanceMatrix.getName(), and ptreegen.distance_matrix.DistanceMatrix.getNearestNeigbors().

ptreegen.distance_matrix.DistanceMatrix._distMatrix
private

Distance matrix as a numpy array object.

Definition at line 22 of file distance_matrix.py.

Referenced by ptreegen.distance_matrix.DistanceMatrix.appendData(), ptreegen.distance_matrix.DistanceMatrix.distMatrix(), ptreegen.distance_matrix.DistanceMatrix.getDistance(), ptreegen.distance_matrix.DistanceMatrix.getNearestNeigbors(), ptreegen.distance_matrix.DistanceMatrix.getSeparation(), ptreegen.distance_matrix.DistanceMatrix.removeData(), and ptreegen.distance_matrix.DistanceMatrix.size().


The documentation for this class was generated from the following file: