PyPop.Haplo
Module for estimating haplotypes.
Classes
Haplo | Abstract base class for haplotype parsing/output.
HaploArlequin | Haplotype estimation implemented via Arlequin.
Emhaplofreq | Haplotype and LD estimation implemented via emhaplofreq.
Haplostats | Haplotype and LD estimation implemented via haplo-stats.
Module Contents
- class Haplo
Abstract base class for haplotype parsing/output.
Currently a stub class (unimplemented).
- class HaploArlequin(arpFilename, idCol, prefixCols, suffixCols, windowSize, mapOrder=None, untypedAllele='0', arlequinPrefix='arl_run', debug=0)
Bases: Haplo
Haplotype estimation implemented via Arlequin.
Outputs Arlequin-format data files and runtime info, then runs Arlequin and parses the resulting output so that it is available programmatically to the rest of the Python framework.
Delegates all calls to Arlequin to an internally instantiated ArlequinBatch Python object called ‘batch’.
Constructor for HaploArlequin object.
Expects:
arpFilename: Arlequin filename (must have ‘.arp’ file extension)
idCol: column in input file that contains the individual id.
prefixCols: number of columns to ignore before allele data starts
suffixCols: number of columns to ignore after allele data stops
windowSize: size of sliding window
mapOrder: list order of columns if different to column order in file (defaults to order in file)
untypedAllele: (defaults to ‘0’)
arlequinPrefix: prefix for all Arlequin run-time files (defaults to ‘arl_run’).
debug: (defaults to 0)
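As an illustration of the windowSize parameter, the sketch below enumerates contiguous windows over an ordered list of loci. It assumes the window advances one locus at a time, which this page does not state. The function sliding_windows and the locus names are hypothetical and are not part of PyPop.

```python
def sliding_windows(loci, window_size):
    """Return every contiguous run of `window_size` loci (hypothetical helper).

    Assumption: the window moves forward one locus per step.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    last_start = len(loci) - window_size
    return [loci[start:start + window_size] for start in range(last_start + 1)]


# Hypothetical locus names, used only for illustration.
for window in sliding_windows(["A", "B", "C", "D"], 2):
    print(window)
```

If window_size exceeds the number of loci, this sketch returns an empty list rather than raising an error.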
- outputArlequin(data)
Outputs the specified .arp sample file.
- runArlequin()
Run the Arlequin haplotyping program.
Generates the expected ‘.txt’ set-up files for Arlequin, then forks a copy of ‘arlecore.exe’ (which must be on ‘PATH’) to generate the haplotype estimates from the generated ‘.arp’ file.
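Because the executable has to be discoverable on ‘PATH’, it can be useful to check for it before starting a run. The following is a minimal sketch using only the Python standard library. The helper arlequin_available is hypothetical and is not part of PyPop.

```python
import shutil


def arlequin_available(executable="arlecore.exe"):
    """Return True if `executable` can be found on PATH (hypothetical helper)."""
    return shutil.which(executable) is not None


if not arlequin_available():
    print("arlecore.exe was not found on PATH; runArlequin() would likely fail.")
```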
- genHaplotypes()
Gets the haplotype estimates back from Arlequin.
Parses the Arlequin output to retrieve the estimated haplotype data. Returns a list of the sliding ‘windows’, each of which is a tuple.
Each tuple consists of:
a dictionary of haplotype-frequency key-value pairs
the population name (original ‘.arp’ file prefix)
the sample count (number of samples for that window)
an ordered list of the loci considered
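To make the shape of this return value concrete, the sketch below walks over a hand-written list with the same layout. It assumes the tuple elements appear in the order listed above. The haplotype keys, frequencies, population name, and the helper most_frequent_haplotypes are all hypothetical, not real Arlequin output or PyPop API.

```python
# Hypothetical data shaped like the documented return value (not real output).
windows = [
    ({"0101:0201": 0.6, "0201:0301": 0.4}, "pop1", 50, ["A", "B"]),
    ({"0201:0701": 1.0}, "pop1", 48, ["B", "C"]),
]


def most_frequent_haplotypes(windows):
    """Return (loci, haplotype, frequency) for the top haplotype of each window."""
    summary = []
    for freqs, pop_name, sample_count, loci in windows:
        # Pick the dictionary entry with the largest frequency value.
        haplotype, freq = max(freqs.items(), key=lambda item: item[1])
        summary.append((tuple(loci), haplotype, freq))
    return summary


for loci, haplotype, freq in most_frequent_haplotypes(windows):
    print(loci, haplotype, freq)
```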
- class Emhaplofreq(locusData, debug=0, untypedAllele='****', stream=None, testMode=False)
Bases: Haplo
Haplotype and LD estimation implemented via emhaplofreq.
This is essentially a wrapper around a Python extension built on top of the ‘emhaplofreq’ command-line program.
Will refuse to estimate haplotypes longer than the maximum length defined by ‘emhaplofreq’.
- serializeStart()
Serialize the start of the XML output to the XML stream.
- serializeEnd()
Serialize the end of the XML output to the XML stream.
- estHaplotypes(locusKeys=None, numInitCond=None)
Estimate haplotypes for listed groups in ‘locusKeys’.
Format of ‘locusKeys’ is a string consisting of:
comma (‘,’) separated haplotype blocks for which to estimate haplotypes
within each ‘block’, loci are separated by colons (‘:’)
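The ‘locusKeys’ format described above can be illustrated with a small parser. This is a sketch of the string layout only; parse_locus_keys is a hypothetical helper, not a PyPop function, and the locus names are made up for the example.

```python
def parse_locus_keys(locus_keys):
    """Split a locusKeys string into blocks, each a list of locus names.

    Blocks are separated by commas; loci within a block by colons.
    Hypothetical helper for illustration only.
    """
    blocks = []
    for block in locus_keys.split(","):
        loci = [locus.strip() for locus in block.split(":") if locus.strip()]
        if loci:
            blocks.append(loci)
    return blocks


# Two blocks: one with two loci, one with three (hypothetical locus names).
print(parse_locus_keys("A:B,DRB1:DQB1:DPB1"))
```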
- estLinkageDisequilibrium(locusKeys=None, permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None)
Estimate linkage disequilibrium (LD) for listed groups in ‘locusKeys’.
Format of ‘locusKeys’ is a string consisting of:
comma (‘,’) separated haplotype blocks for which to estimate haplotypes
within each ‘block’, loci are separated by colons (‘:’)
- allPairwise(permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None, haploSuppressFlag=None, haplosToShow=None, mode=None)
Run pairwise statistics.
Estimate pairwise statistics for a given set of loci. Depending on the flags passed, this can be used to estimate both LD (linkage disequilibrium) and HF (haplotype frequencies); an optional permutation test on LD can also be run.
- class Haplostats(locusData, debug=0, untypedAllele='****', stream=None, testMode=False)
Bases: Haplo
Haplotype and LD estimation implemented via haplo-stats.
This is a wrapper around a portion of the ‘haplo.stats’ R package.
- serializeStart()
Serialize the start of the XML output to the XML stream.
- serializeEnd()
Serialize the end of the XML output to the XML stream.
- estHaplotypes(locusKeys=None, weight=None, control=None, numInitCond=10, testMode=False)
Estimate haplotypes for the submatrix given in locusKeys; if locusKeys is None, assume the entire matrix.
LD is estimated if locusKeys consists of only two loci.
FIXME: this does not yet remove missing data before haplotype estimation.
- allPairwise(weight=None, control=None, numInitCond=10)
Estimate pairwise statistics for all pairs of loci.
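As a rough illustration of what "all pairs of loci" covers, the sketch below enumerates the pairs with the standard library. It assumes pairs are unordered and that a locus is never paired with itself, which this page does not spell out. The helper locus_pairs and the locus names are hypothetical.

```python
from itertools import combinations


def locus_pairs(loci):
    """Return every unordered pair of distinct loci (hypothetical helper)."""
    return list(combinations(loci, 2))


# Three hypothetical loci give three pairs; n loci give n * (n - 1) / 2.
for first, second in locus_pairs(["A", "B", "C"]):
    print(first, second)
```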