PyPop.Utils#
Module for common utility classes and functions.
Contains convenience classes for output of text and XML files.
Attributes#
GENOTYPE_SEPARATOR |
GENOTYPE_TERMINATOR |
Classes#
TextOutputStream | Output stream for writing text files.
XMLOutputStream | Output stream for writing XML files.
OrderedDict | Allows dict to have _ORDERED_ pairs
Index | Returns an Index object for OrderedDict
StringMatrix | StringMatrix is a subclass of NumPy (Numeric Python) UserArray class
Group |
Functions#
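glob_with_pathlib |
getStreamType | Return the type of stream.
natural_sort_key |
unique_elements | Gets the unique elements in a list
appendTo2dList |
convertLineEndings |
fixForPlatform |
copyfileCustomPlatform |
copyCustomPlatform |
checkXSLFile |
getUserFilenameInput | Read user input for a filename, check its existence, continue requesting input until a valid filename is entered.
splitIntoNGroups | Divides a list up into n parcels (plus whatever is left over)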
Module Contents#
- GENOTYPE_SEPARATOR = '~'#
- GENOTYPE_TERMINATOR = '~'#
- glob_with_pathlib(pattern)#
- class TextOutputStream(file)#
Output stream for writing text files.
- write(str)#
- writeln(str='\n')#
- close()#
- flush()#
- class XMLOutputStream(file)#
Bases: TextOutputStream
Output stream for writing XML files.
- opentag(tagname, **kw)#
Generate an open XML tag.
Attributes are passed as optional named arguments, e.g. opentag('tagname', role=something, id=else) produces '<tagname role="something" id="else">'. The attributes and their values are optional; if omitted, the result is '<tagname>'.
- emptytag(tagname, **kw)#
Generate an empty XML tag.
As per 'opentag()' but without content, i.e. '<tagname attr="val"/>'.
- closetag(tagname)#
Generate a closing XML tag.
Generate a tag in the form: ‘</tagname>’.
- tagContents(tagname, content, **kw)#
Generate open and closing XML tags around contents.
Generates tags in the form '<tagname>content</tagname>'. 'content' must be a string. Converts '&', '<', and '>' into their valid XML equivalents.
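The tag-writing behaviour described for these methods can be illustrated with a minimal, self-contained sketch. `SketchXMLStream`, `render_example`, and the tag names used are hypothetical stand-ins written for this example; this is not PyPop's implementation, only one way to produce the strings the descriptions above give, assuming attribute values are quoted and content is escaped with the standard library's `xml.sax.saxutils` helpers.

```python
import io
from xml.sax.saxutils import escape, quoteattr


class SketchXMLStream:
    """Hypothetical stand-in for illustration; not PyPop's XMLOutputStream."""

    def __init__(self, file):
        self.file = file

    def _attrs(self, kw):
        # Render keyword arguments as ' name="value"' pairs, in call order.
        return "".join(
            f" {name}={quoteattr(str(value))}" for name, value in kw.items()
        )

    def opentag(self, tagname, **kw):
        self.file.write(f"<{tagname}{self._attrs(kw)}>")

    def emptytag(self, tagname, **kw):
        self.file.write(f"<{tagname}{self._attrs(kw)}/>")

    def closetag(self, tagname):
        self.file.write(f"</{tagname}>")

    def tagContents(self, tagname, content, **kw):
        self.opentag(tagname, **kw)
        self.file.write(escape(content))  # '&', '<', '>' become entities
        self.closetag(tagname)


def render_example():
    # Tag names and attribute values here are made up for the example.
    buf = io.StringIO()
    out = SketchXMLStream(buf)
    out.opentag("locus", name="A")
    out.tagContents("note", "a < b & c")
    out.emptytag("missing", count=0)
    out.closetag("locus")
    return buf.getvalue()
```

Calling `render_example()` returns a single string in which the open, content, empty, and closing tags appear in the order they were written.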
- getStreamType(stream)#
Return the type of stream.
Returns either ‘xml’ or ‘text’.
- class OrderedDict(hash=None)#
Allows dict to have _ORDERED_ pairs
Creates an ordered dict
- index(key)#
Returns position of key in dict
- keys()#
Returns list of keys in dict
- values()#
Returns list of values in dict
- items()#
Returns list of tuples of keys and values
- insert(i, key, value)#
Inserts a key-value pair at a given index
- remove(i)#
Removes a key-value pair from the dict
- reverse()#
Reverses the order of the key-value pairs
- sort(cmp=0)#
Sorts the dict (allows for sort algorithm)
- clear()#
Clears all the entries in the dict
- copy()#
Makes a copy of the dict, also of the OrderedDict class
- get(key)#
Returns the value of a key
- has_key(key)#
Looks for existence of key in dict
- update(dict)#
Updates entries in a dict based on another
- count(key)#
Finds occurrences of a key in a dict (0/1)
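To make the positional methods above concrete, here is a small sketch of a dict that keeps its keys in an explicit list. `SketchOrderedDict` is a hypothetical class written for this example, not PyPop's `OrderedDict`; it assumes that `remove(i)` takes a position (as the parameter name suggests) and that the constructor accepts a list of pairs, neither of which is confirmed by the descriptions above.

```python
class SketchOrderedDict:
    """Hypothetical illustration: a dict with an explicit key order."""

    def __init__(self, pairs=None):
        self._keys = []   # keys in their current order
        self._data = {}   # key -> value lookup
        for key, value in (pairs or []):
            self.insert(len(self._keys), key, value)

    def insert(self, i, key, value):
        # Re-inserting an existing key moves it to the new position.
        if key in self._data:
            self._keys.remove(key)
        self._keys.insert(i, key)
        self._data[key] = value

    def index(self, key):
        return self._keys.index(key)

    def remove(self, i):
        # Assumption: i is a position, not a key.
        key = self._keys.pop(i)
        del self._data[key]

    def reverse(self):
        self._keys.reverse()

    def keys(self):
        return list(self._keys)

    def items(self):
        return [(key, self._data[key]) for key in self._keys]

    def count(self, key):
        # A key occurs either once or not at all.
        return 1 if key in self._data else 0
```

For example, inserting `("C", 3)` at position 1 of a dict holding `A` and `B` places `C` between them, and `index("B")` then reports position 2.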
- class Index(i=0)#
Returns an Index object for OrderedDict
Creates an Index object for use with OrderedDict
- class StringMatrix(rowCount=None, colList=None, extraList=None, colSep='\t', headerLines=None)#
Bases: numpy.lib.user_array.container
StringMatrix is a subclass of NumPy (Numeric Python) UserArray class, and uses NumPy to store the data in an efficient array format, rather than internal Python lists.
Constructor for StringMatrix.
colList is a mutable type, so we freeze the list of locus keys in the original file order by making a clone of the list of keys.
The order of loci in the array will then correspond to the original file order, and we don't want this tampered with by the 'callee' function (i.e. we effectively override Python's 'pass by reference' default and emulate 'pass by value').
- dump(locus=None, stream=sys.stdout)#
- copy()#
Make a (deep) copy of the StringMatrix.
Currently this goes via the constructor; it is not clear whether there is a better way of doing this.
- getNewStringMatrix(key)#
Create an entirely new StringMatrix using only the columns supplied in the keys.
The format of the keys is identical to that of __getitem__, except that in this case a full StringMatrix instance is returned, including all metadata.
- getUniqueAlleles(key)#
Return a list of unique integers for the given key, sorted by allele name using natural sort
- convertToInts()#
Convert matrix to integers: needed for haplo-stats.
Note that integers start at 1 for compatibility with the haplo-stats module.
FIXME: check whether we need to release memory
- countPairs()#
Given a matrix of genotypes (pairs of columns for each locus), compute number of possible pairs of haplotypes for each subject (the rows of the geno matrix)
FIXME: this does not do any involved handling of missing data as per geno.count.pairs from haplo.stats
FIXME: should these methods eventually be moved to Genotype class?
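As a rough illustration of the counting described for countPairs(), the sketch below works on plain lists and assumes no missing data. It relies on the standard result that a subject heterozygous at h loci is consistent with 2^(h-1) unordered haplotype pairs (and exactly one pair when h is 0). `sketch_count_pairs` is a hypothetical helper; whether PyPop computes the count this way is an assumption.

```python
def sketch_count_pairs(geno):
    """Count possible haplotype pairs per row of a genotype matrix.

    Each row holds two adjacent columns per locus. Missing data is not
    handled (assumption: every entry is a real allele).
    """
    counts = []
    for row in geno:
        # A locus is heterozygous when its two alleles differ.
        het = sum(
            1 for i in range(0, len(row), 2) if row[i] != row[i + 1]
        )
        counts.append(1 if het == 0 else 2 ** (het - 1))
    return counts
```

A subject that is heterozygous at three loci, for instance, yields four possible pairs, while a fully homozygous subject yields one.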
- flattenCols()#
Flatten columns into a single list. FIXME: assumes entries are integers
- filterOut(key, blankDesignator)#
Returns a filtered matrix.
When passed a designator, this method returns only the rows of the matrix that do not contain that designator.
- getSuperType(key)#
Returns a matrix grouped by columns.
e.g. if the matrix is [[A01, A02, B01, B02], [A11, A12, B11, B12]]
then getSuperType('A:B') will return the matrix with the column vector:
[[A01:B01, A02:B02], [A11:B11, A12:B12]]
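The grouping in this example can be reproduced on plain Python lists. The function below is a hypothetical sketch, not the StringMatrix method: it assumes each locus occupies two adjacent columns in the order given by `columns`, and that the key and the joined values share the same `:` separator.

```python
def sketch_super_type(rows, columns, key, sep=":"):
    """Join the alleles of several loci into one two-column 'super locus'."""
    loci = key.split(sep)
    result = []
    for row in rows:
        # First and second allele of each requested locus, in key order.
        first = sep.join(row[2 * columns.index(locus)] for locus in loci)
        second = sep.join(row[2 * columns.index(locus) + 1] for locus in loci)
        result.append([first, second])
    return result


def super_type_example():
    rows = [["A01", "A02", "B01", "B02"], ["A11", "A12", "B11", "B12"]]
    return sketch_super_type(rows, ["A", "B"], "A:B")
```

`super_type_example()` applies the sketch to the matrix from the description above.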
- class Group(li, size)#
- natural_sort_key(s, _nsre=re.compile('([0-9]+)'))#
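The signature above, with its precompiled digit-run pattern, matches a widely used natural-sort idiom. The sketch below shows that idiom under the assumption that this is the intent; `sketch_natural_sort_key` is a hypothetical name, and the lower-casing of text chunks is a choice made for this example rather than documented behaviour.

```python
import re


def sketch_natural_sort_key(s, _nsre=re.compile("([0-9]+)")):
    """Split s into text and digit runs so numbers compare numerically."""
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in _nsre.split(s)
    ]


def sort_naturally(names):
    # Digit runs are compared as integers, so "A2" sorts before "A10".
    return sorted(names, key=sketch_natural_sort_key)
```

With a plain string sort, `"A10"` would come before `"A2"`; with this key, the numeric parts decide the order.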
- unique_elements(li)#
Gets the unique elements in a list
- appendTo2dList(aList, appendStr=':')#
- convertLineEndings(file, mode)#
- fixForPlatform(filename, txt_ext=0)#
- copyfileCustomPlatform(src, dest, txt_ext=0)#
- copyCustomPlatform(file, dist_dir, txt_ext=0)#
- checkXSLFile(xslFilename, path='', subdir='', abort=False, debug=None, msg='')#
- getUserFilenameInput(prompt, filename)#
Read user input for a filename, check its existence, continue requesting input until a valid filename is entered.
- splitIntoNGroups(alist, n=1)#
Divides a list up into n parcels (plus whatever is left over)
This class currently works with Python 2.2, but will eventually use iterators, so it will ultimately need at least Python 2.3!
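The description leaves open whether n is the number of parcels or the size of each parcel. The sketch below assumes the latter: consecutive parcels of n items, with a final shorter parcel holding whatever is left over. `sketch_split_into_groups` is a hypothetical helper written for this example and may not match the behaviour of splitIntoNGroups().

```python
def sketch_split_into_groups(alist, n=1):
    """Split alist into consecutive parcels of n items each.

    Assumption: n is the parcel size; the last parcel may be shorter.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [alist[i:i + n] for i in range(0, len(alist), n)]
```

Under this reading, a five-item list split with n=2 gives two full parcels and one leftover parcel of a single item.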