PyPop.Haplo

Module for estimating haplotypes.

Classes

Haplo

Abstract base class for haplotype parsing/output.

HaploArlequin

Haplotype estimation implemented via Arlequin.

Emhaplofreq

Haplotype and LD estimation implemented via emhaplofreq.

Haplostats

Haplotype and LD estimation implemented via haplo-stats.

Module Contents

class Haplo

Abstract base class for haplotype parsing/output.

Currently a stub class (unimplemented).

class HaploArlequin(arpFilename, idCol, prefixCols, suffixCols, windowSize, mapOrder=None, untypedAllele='0', arlequinPrefix='arl_run', debug=0)

Bases: Haplo

Inheritance diagram of PyPop.Haplo.HaploArlequin

Haplotype estimation implemented via Arlequin.

Writes Arlequin-format data files and run-time configuration, then runs Arlequin and parses its output so that the results are available programmatically to the rest of the Python framework.

Delegates all calls to Arlequin to an internally instantiated ArlequinBatch Python object named ‘batch’.

Constructor for a HaploArlequin object.

Expects:

  • arpFilename: Arlequin filename (must have a ‘.arp’ file extension)

  • idCol: column in the input file that contains the individual ID

  • prefixCols: number of columns to ignore before the allele data starts

  • suffixCols: number of columns to ignore after the allele data stops

  • windowSize: size of the sliding window

  • mapOrder: list giving the column order, if different from the column order in the file (defaults to the order in the file)

  • untypedAllele: symbol denoting an untyped allele (defaults to ‘0’)

  • arlequinPrefix: prefix for all Arlequin run-time files (defaults to ‘arl_run’)

  • debug: debugging flag (defaults to 0)
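As a rough illustration of how windowSize and mapOrder could interact, the sketch below reorders a list of locus columns and then slides a fixed-size window across them. The helper `sliding_windows` is hypothetical, and it assumes that mapOrder is a list of column indices and that windows advance one locus at a time; HaploArlequin's actual windowing logic may differ.

```python
from typing import List, Optional, Sequence


def sliding_windows(loci: Sequence[str], window_size: int,
                    map_order: Optional[Sequence[int]] = None) -> List[List[str]]:
    """Hypothetical helper: return consecutive windows of loci.

    Assumes map_order (if given) holds column indices into ``loci`` and
    that each window starts one locus after the previous one.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # Reorder the columns if an explicit order was supplied.
    ordered = [loci[i] for i in map_order] if map_order is not None else list(loci)
    if window_size > len(ordered):
        return []
    # Each window is a contiguous run of window_size loci.
    return [ordered[i:i + window_size]
            for i in range(len(ordered) - window_size + 1)]


print(sliding_windows(["A", "B", "C", "DRB1"], 2))
# [['A', 'B'], ['B', 'C'], ['C', 'DRB1']]
```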

outputArlequin(data)

Writes the data to the ‘.arp’ sample file given by arpFilename.
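For readers unfamiliar with the format, the sketch below writes a minimal, single-sample genotypic Arlequin project file. The section and key names follow our understanding of the Arlequin project-file layout described in the Arlequin manual; the function `write_arp`, its arguments and the exact set of profile keys are assumptions, and the file PyPop actually writes may contain different or additional fields. The missing-data symbol defaults to ‘0’ only to mirror the untypedAllele default.

```python
import io
from typing import Dict, List, TextIO, Tuple

Genotype = List[Tuple[str, str]]  # one (allele1, allele2) pair per locus


def write_arp(out: TextIO, sample_name: str,
              individuals: Dict[str, Genotype], missing: str = "0") -> None:
    """Hypothetical writer for a single-sample, unphased genotypic .arp file."""
    out.write("[Profile]\n")
    out.write(f'  Title="{sample_name}"\n')
    out.write("  NbSamples=1\n")
    out.write("  GenotypicData=1\n")   # diploid genotypes, two lines per individual
    out.write("  GameticPhase=0\n")    # gametic phase unknown
    out.write("  DataType=STANDARD\n")
    out.write("  LocusSeparator=WHITESPACE\n")
    out.write(f"  MissingData='{missing}'\n")
    out.write("[Data]\n")
    out.write("  [[Samples]]\n")
    out.write(f'    SampleName="{sample_name}"\n')
    out.write(f"    SampleSize={len(individuals)}\n")
    out.write("    SampleData={\n")
    for ind_id, genotype in individuals.items():
        # First line: ID, a count of 1, then the first allele at each locus.
        out.write(f"{ind_id} 1 " + " ".join(a1 for a1, _ in genotype) + "\n")
        # Second line: the second allele at each locus.
        out.write("    " + " ".join(a2 for _, a2 in genotype) + "\n")
    out.write("    }\n")


buf = io.StringIO()
write_arp(buf, "pop1", {"ind1": [("01", "02"), ("03", "04")]})
print(buf.getvalue())
```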

runArlequin()

Run the Arlequin haplotyping program.

Generates the ‘.txt’ set-up files that Arlequin expects, then forks a copy of ‘arlecore.exe’ (which must be on the ‘PATH’) to generate the haplotype estimates from the ‘.arp’ file.
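Because the run fails if ‘arlecore.exe’ is not on the ‘PATH’, a caller may want to check for it first. The sketch below does that with `shutil.which` and then launches the binary via `subprocess.run`, which is our substitution for an explicit fork. The function `run_arlecore` is hypothetical, and the command-line arguments Arlequin expects are deliberately left to the caller rather than guessed.

```python
import shutil
import subprocess
from typing import Sequence


def run_arlecore(args: Sequence[str],
                 exe: str = "arlecore.exe") -> subprocess.CompletedProcess:
    """Hypothetical sketch: locate the Arlequin core binary on PATH and run it."""
    path = shutil.which(exe)
    if path is None:
        # Fail early with a clear message instead of a lower-level OSError.
        raise FileNotFoundError(f"{exe!r} not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run([path, *args], check=True)
```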

genHaplotypes()

Gets the haplotype estimates back from Arlequin.

Parses the Arlequin output to retrieve the estimated haplotype data. Returns a list of tuples, one per sliding ‘window’.

Each tuple consists of:

  • a dictionary of haplotype-frequency key-value pairs

  • the population name (original ‘.arp’ file prefix)

  • the sample count (number of samples for that window)

  • an ordered list of the loci considered
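A caller might consume the returned list roughly as follows, here picking the most frequent haplotypes in each window. The input is invented toy data with the same tuple shape as described above, not real Arlequin output, and the colon-joined haplotype keys are an assumption about how the dictionary keys look.

```python
from typing import Dict, List, Tuple

# (haplotype -> frequency, population name, sample count, ordered loci)
Window = Tuple[Dict[str, float], str, int, List[str]]


def top_haplotypes(windows: List[Window],
                   n: int = 1) -> List[Tuple[List[str], List[Tuple[str, float]]]]:
    """Return the n most frequent haplotypes for each window."""
    result = []
    for freqs, _pop, _count, loci in windows:
        ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
        result.append((loci, ranked[:n]))
    return result


# Invented toy input shaped like one genHaplotypes() window.
toy = [({"01:02": 0.6, "01:03": 0.3, "04:02": 0.1}, "pop1", 50, ["A", "B"])]
print(top_haplotypes(toy))
# [(['A', 'B'], [('01:02', 0.6)])]
```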

class Emhaplofreq(locusData, debug=0, untypedAllele='****', stream=None, testMode=False)

Bases: Haplo

Inheritance diagram of PyPop.Haplo.Emhaplofreq

Haplotype and LD estimation implemented via emhaplofreq.

This is essentially a wrapper around a Python extension built on top of the ‘emhaplofreq’ command-line program.

Refuses to estimate haplotypes longer than the maximum length defined by ‘emhaplofreq’.
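As its name suggests, emhaplofreq estimates haplotype frequencies with an expectation-maximization (EM) algorithm. The sketch below shows the idea for just two loci with unphased genotypes: each double heterozygote has two possible phasings, which are weighted by the current frequency estimates (E-step), and the frequencies are then re-estimated from the expected haplotype counts (M-step). It is a simplified illustration under those assumptions, not emhaplofreq's implementation, which handles longer haplotypes and can restart from several initial conditions (presumably what numInitCond controls).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hap = Tuple[str, str]                                # (allele at locus 1, allele at locus 2)
Genotype = Tuple[Tuple[str, str], Tuple[str, str]]   # ((a1, a2), (b1, b2))


def em_two_locus(genotypes: List[Genotype], max_iter: int = 200,
                 tol: float = 1e-10) -> Dict[Hap, float]:
    """Simplified two-locus EM estimate of haplotype frequencies."""
    if not genotypes:
        return {}
    # Distinct phasings (unordered haplotype pairs) per individual.
    # Homozygosity at either locus collapses the two phasings into one, so the
    # usual factor of 2 for heterozygous pairs cancels between the options.
    phasings = []
    for (a1, a2), (b1, b2) in genotypes:
        options = {tuple(sorted([(a1, b1), (a2, b2)])),
                   tuple(sorted([(a1, b2), (a2, b1)]))}
        phasings.append(sorted(options))

    haps = {h for opts in phasings for pair in opts for h in pair}
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n_chrom = 2 * len(genotypes)

    for _ in range(max_iter):
        # E-step: split each individual across its phasings in proportion
        # to the product of the current haplotype frequencies.
        counts: Dict[Hap, float] = defaultdict(float)
        for opts in phasings:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in opts]
            total = sum(weights)
            for (h1, h2), w in zip(opts, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: new frequency = expected haplotype count / chromosomes.
        new = {h: counts[h] / n_chrom for h in haps}
        converged = max(abs(new[h] - freqs[h]) for h in haps) < tol
        freqs = new
        if converged:
            break
    return freqs


data = ([(("A", "A"), ("B", "B"))] * 2 + [(("a", "a"), ("b", "b"))] * 2
        + [(("A", "a"), ("B", "b"))])
est = em_two_locus(data)
print(round(est[("A", "B")], 3))  # 0.5
```

With four phase-known individuals supporting A-B and a-b, the lone double heterozygote is resolved toward that same phasing, so A-b and a-B shrink toward zero.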

serializeStart()

Serialize the start of the XML output to the XML stream.

serializeEnd()

Serialize the end of the XML output to the XML stream.

estHaplotypes(locusKeys=None, numInitCond=None)

Estimate haplotypes for the groups listed in ‘locusKeys’.

Format of ‘locusKeys’ is a string consisting of:

  • comma (‘,’) separated haplotype blocks for which to estimate haplotypes

  • within each ‘block’, loci are separated by colons (‘:’)

e.g. ‘*DQA1:*DPB1,*DRB1:*DQB1’ means: estimate haplotypes for the ‘DQA1’ and ‘DPB1’ loci, then estimate haplotypes for the ‘DRB1’ and ‘DQB1’ loci.
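The locusKeys grammar above is simple enough to parse in a few lines. The helper `parse_locus_keys` below is a hypothetical sketch of that grammar, not PyPop's own parser, which may treat whitespace or empty blocks differently; it keeps each key unchanged, including the leading ‘*’.

```python
from typing import List


def parse_locus_keys(locus_keys: str) -> List[List[str]]:
    """Split a locusKeys string into blocks of locus keys.

    Blocks are separated by ',' and loci within a block by ':'.
    Keys are returned unchanged (including any leading '*').
    """
    blocks = []
    for block in locus_keys.split(","):
        block = block.strip()
        if not block:
            continue  # tolerate stray or trailing commas
        blocks.append([locus.strip() for locus in block.split(":")])
    return blocks


print(parse_locus_keys("*DQA1:*DPB1,*DRB1:*DQB1"))
# [['*DQA1', '*DPB1'], ['*DRB1', '*DQB1']]
```

Stripping the ‘*’ with `key.lstrip("*")` gives the plain locus names used in the prose above.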

estLinkageDisequilibrium(locusKeys=None, permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None)

Estimate linkage disequilibrium (LD) for the groups listed in ‘locusKeys’.

Format of ‘locusKeys’ is a string consisting of:

  • comma (‘,’) separated haplotype blocks for which to estimate LD

  • within each ‘block’, loci are separated by colons (‘:’)

e.g. ‘*DQA1:*DPB1,*DRB1:*DQB1’ means: estimate LD for the ‘DQA1’ and ‘DPB1’ loci, then estimate LD for the ‘DRB1’ and ‘DQB1’ loci.
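For reference, the textbook two-locus LD measures for alleles A and B can be computed from estimated haplotype frequencies as D = p_AB - p_A p_B, D' = D / D_max and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). The generic sketch below implements those definitions; the function `ld_stats` is hypothetical, D' keeps the sign of D here, and the statistics emhaplofreq actually reports (including any multi-allelic summaries) may differ.

```python
from typing import Dict, Tuple


def ld_stats(hap_freqs: Dict[Tuple[str, str], float], allele1: str,
             allele2: str) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for allele1 at locus 1 and allele2 at locus 2."""
    # Marginal allele frequencies, summed over haplotype frequencies.
    p = sum(f for (a, _), f in hap_freqs.items() if a == allele1)
    q = sum(f for (_, b), f in hap_freqs.items() if b == allele2)
    d = hap_freqs.get((allele1, allele2), 0.0) - p * q
    # Normalize D by its maximum attainable magnitude given p and q.
    if d > 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p * (1 - p) * q * (1 - q)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(ld_stats({("A", "B"): 0.5, ("a", "b"): 0.5}, "A", "B"))
# (0.25, 1.0, 1.0)
```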

allPairwise(permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None, haploSuppressFlag=None, haplosToShow=None, mode=None)

Run pairwise statistics.

Estimate pairwise statistics for a given set of loci. Depending on the flags passed, this can estimate both LD (linkage disequilibrium) and HF (haplotype frequencies), and can optionally run a permutation test on LD.
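For k loci, "all pairwise" means k(k-1)/2 two-locus blocks. One way to picture that work is to write the pairs in the locusKeys notation used by estHaplotypes and estLinkageDisequilibrium. The helper `pairwise_locus_keys` below is hypothetical and is not necessarily how allPairwise is implemented.

```python
from itertools import combinations
from typing import Sequence


def pairwise_locus_keys(loci: Sequence[str]) -> str:
    """Build a locusKeys-style string covering every pair of loci once."""
    return ",".join(f"{a}:{b}" for a, b in combinations(loci, 2))


print(pairwise_locus_keys(["*A", "*B", "*DRB1"]))
# *A:*B,*A:*DRB1,*B:*DRB1
```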

class Haplostats(locusData, debug=0, untypedAllele='****', stream=None, testMode=False)

Bases: Haplo

Inheritance diagram of PyPop.Haplo.Haplostats

Haplotype and LD estimation implemented via haplo-stats.

This is a wrapper around a portion of the ‘haplo.stats’ R package.

serializeStart()

Serialize the start of the XML output to the XML stream.

serializeEnd()

Serialize the end of the XML output to the XML stream.

estHaplotypes(locusKeys=None, weight=None, control=None, numInitCond=10, testMode=False)

Estimate haplotypes for the submatrix given in locusKeys; if locusKeys is None, the entire matrix is used.

LD is estimated if locusKeys consists of only two loci.

FIXME: this does not yet remove missing data before haplotype estimation.
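Until that FIXME is addressed, one possible pre-filter is to drop any individual with an untyped allele at one of the selected loci. The sketch below assumes each individual is a dict mapping a locus key to an allele pair, which is not Haplostats's real data structure, and the helper `drop_untyped` is hypothetical. This listwise deletion is our illustrative choice and discards partially typed individuals entirely.

```python
from typing import Dict, List, Sequence, Tuple

Individual = Dict[str, Tuple[str, str]]  # locus key -> (allele1, allele2)


def drop_untyped(individuals: List[Individual], loci: Sequence[str],
                 untyped: str = "****") -> List[Individual]:
    """Keep only individuals fully typed at every locus in ``loci``."""
    return [ind for ind in individuals
            if all(untyped not in ind[locus] for locus in loci)]


inds = [
    {"*A": ("01", "02"), "*B": ("03", "04")},
    {"*A": ("****", "02"), "*B": ("03", "04")},
    {"*A": ("01", "01"), "*B": ("****", "****")},
]
print(len(drop_untyped(inds, ["*A", "*B"])))  # 1
```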

allPairwise(weight=None, control=None, numInitCond=10)

Estimate pairwise statistics for all pairs of loci.