Table of Contents

1. Introduction
2. How to use this guide
3. Recent changes to PyPop
4. Authors of software components

1. Introduction

PyPop (Python for Population Genomics) is an environment developed at UC Berkeley for large-scale population-genetic analyses, including:

  • conformity to Hardy-Weinberg expectations

  • tests for balancing or directional selection

  • estimates of haplotype frequencies and measures and tests of significance for linkage disequilibrium (LD).
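To make the first of these analyses concrete, the sketch below computes a chi-square statistic for deviation from Hardy-Weinberg proportions at a single locus. It is a minimal illustration, not PyPop's own code or API: the function name `hwe_chi_square` and the input format (a list of two-allele genotype tuples) are assumptions made for this example. Expected counts are n·p² for homozygotes and 2n·pᵢpⱼ for heterozygotes. With k alleles there are k(k+1)/2 genotype classes, and estimating k−1 allele frequencies leaves k(k−1)/2 degrees of freedom.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    Hypothetical helper for illustration only (not part of PyPop).
    genotypes: list of (allele1, allele2) tuples, one per individual.
    Returns (statistic, degrees_of_freedom).
    """
    n = len(genotypes)
    # Allele frequencies estimated from the 2n sampled chromosomes
    allele_counts = Counter(a for g in genotypes for a in g)
    alleles = sorted(allele_counts)
    freqs = {a: allele_counts[a] / (2 * n) for a in alleles}

    # Observed genotype counts, ignoring allele order within a genotype
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    stat = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        if a == b:
            expected = n * freqs[a] ** 2  # homozygote: n * p^2
        else:
            expected = 2 * n * freqs[a] * freqs[b]  # heterozygote: 2n * p_i * p_j
        stat += (observed[(a, b)] - expected) ** 2 / expected

    k = len(alleles)
    return stat, k * (k - 1) // 2


# A sample that is exactly in Hardy-Weinberg proportions (p = q = 0.5)
sample = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
print(hwe_chi_square(sample))  # (0.0, 1)
```

The statistic would then be compared against a chi-square distribution with the returned degrees of freedom. For loci with many rare alleles an exact test is usually preferred, since the chi-square approximation becomes unreliable when expected counts are small.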

PyPop is an object-oriented framework implemented in the programming language Python. Python is a flexible scripting language that allows rapid prototyping of code and has powerful features for interfacing with other languages such as C, which is particularly suited to computationally intensive tasks and in which we have already implemented many routines.

The output of the analyses is stored in XML (the eXtensible Markup Language), a platform-independent, open standard for storing data devised by the World Wide Web Consortium. These output files can then be transformed with standard tools into many other formats: formats suitable for machine input (such as PHYLIP, or input for spreadsheet programs such as Excel and statistical packages such as R), plain text, or HTML for human-readable output. Because the output is stored in XML, the final viewable format can be redesigned at will without the (often time-consuming) re-running of the analyses themselves.
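The idea of keeping results in XML and rendering them later can be sketched with Python's standard `xml.etree.ElementTree` and `csv` modules. The XML fragment below is hypothetical: the element and attribute names (`dataanalysis`, `locus`, `allele`, `name`, `frequency`) are invented for illustration and are not PyPop's actual output schema, and PyPop's own transformations may use different tools. The point is only that a new output format needs a new transform, not a new analysis run.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical results file; element and attribute names are illustrative only,
# not PyPop's real schema.
SAMPLE_XML = """<dataanalysis>
  <locus name="A">
    <allele name="01:01" frequency="0.25"/>
    <allele name="02:01" frequency="0.75"/>
  </locus>
  <locus name="B">
    <allele name="07:02" frequency="1.0"/>
  </locus>
</dataanalysis>"""


def xml_to_csv(xml_text):
    """Flatten per-locus allele frequencies from the XML into CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["locus", "allele", "frequency"])
    for locus in root.iter("locus"):
        for allele in locus.iter("allele"):
            writer.writerow([locus.get("name"), allele.get("name"), allele.get("frequency")])
    return out.getvalue()


print(xml_to_csv(SAMPLE_XML))
```

A second function reading the same XML could just as easily emit an HTML table or plain text; the stored results stay unchanged.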

PyPop was originally developed for the analysis of data for the 13th International Histocompatibility Workshop and Conference, held in Seattle, Washington, in 2002 (Meyer et al. 2007, Single et al. 2007a, Single et al. 2007b). For more on the design and technical details of PyPop, please consult Lancaster et al. (2003), Lancaster et al. (2007a) and Lancaster et al. (2007b).