PyPop: Python for Population Genomics
PyPop (Python for Population Genomics) is an environment for doing large-scale population genetic analyses including:
conformity to Hardy-Weinberg expectations
tests for balancing or directional selection
estimates of haplotype frequencies and measures and tests of significance for linkage disequilibrium (LD).
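To give a feel for the kinds of statistics listed above, here is a minimal, self-contained sketch in plain Python. It is not PyPop's API or implementation: the function names `hardy_weinberg_chi_square` and `linkage_disequilibrium` are hypothetical, and the sketch assumes a single biallelic locus (for the Hardy-Weinberg test) and known haplotype frequencies for two biallelic loci (for LD).

```python
import math


def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions.

    Illustrative only: takes observed genotype counts at one biallelic
    locus and returns (chi-square statistic, p-value) with 1 degree of
    freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele
    frequencies. Both loci must be polymorphic (0 < frequency < 1).
    """
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


if __name__ == "__main__":
    print(hardy_weinberg_chi_square(25, 50, 25))
    print(linkage_disequilibrium(0.5, 0.5, 0.5))
```

Real analyses of highly polymorphic loci such as HLA involve many alleles per locus and typically call for more careful methods than this two-allele sketch; see the PyPop User Guide for what PyPop actually computes.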
PyPop News
2023-11-07: PyPop 1.0.0 released and available on PyPI, highlights include:
PyPop now fully ported to Python 3.
New asymmetric linkage disequilibrium (ALD) computations ([Thomson:Single:2014]).
Improved tab-separated values (TSV) output file handling.
Preliminary support for Genotype List (GL) String.
Unit tests, new documentation system, continuous integration framework and PyPI package
and even more minor features and bug fixes… (see NEWS.rst).
Recent previous releases:
2023-11-04: release candidate 2 (1.0.0rc2) released. Fixes some missing TSV output.
2023-11-01: release candidate 1 (1.0.0rc1) released.
2023-10-27: seventh beta pre-release 1.0.0b7. Previous arm64 issues have been resolved. Thanks to Owen Solberg for extensive testing and debugging.
2023-10-13: fourth beta pre-release 1.0.0b4. Although this release contains packages that will install on arm64/M1 machines, these arm64 packages should be considered as alpha-only and are strictly for testing only. Please do not use PyPop on M1 machines for any production analyses, until we fix some underlying arm64 numerical issues.
2023-10-10: second beta pre-release 1.0.0b2
2023-09-26: first beta pre-release 1.0.0b1
2023: ported to Python 3, pre-release alpha versions of 1.0.0 under development - no formal release yet.
2022: 0.7.0 binaries deprecated.
2020: pypop is no longer a Fedora package (to be replaced by PyPI package)
2017: all new development is now in GitHub
See the PyPop Release History in the Python User Guide for even earlier history and full release notes.
PyPop is an object-oriented framework implemented in Python, but also contains C extensions for some computationally intensive tasks. Output of analyses is stored in XML format for maximum downstream flexibility. PyPop also has an internal facility for aggregating the output XML and generating tab-separated values (TSV) files, as well as generating a default plain text summary file for each population.
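The general idea of flattening XML results into a TSV table can be sketched with the Python standard library alone. This is only an illustration of the pattern, not PyPop's aggregation facility: the XML element and attribute names (`dataanalysis`, `locus`, `name`, `allelecount`) and the function `xml_to_tsv` are invented for this example and should not be taken as PyPop's actual output schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML: the element and attribute names are invented for
# illustration and are NOT PyPop's real output schema.
SAMPLE_XML = """<dataanalysis>
  <locus name="A"><allelecount>10</allelecount></locus>
  <locus name="B"><allelecount>14</allelecount></locus>
</dataanalysis>"""


def xml_to_tsv(xml_text):
    """Flatten one row per <locus> element into tab-separated text."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["locus", "allelecount"])
    for locus in root.iter("locus"):
        writer.writerow([locus.get("name"), locus.findtext("allelecount")])
    return buffer.getvalue()


if __name__ == "__main__":
    print(xml_to_tsv(SAMPLE_XML), end="")
```

Keeping the primary results in XML and deriving TSV from them means the tabular view can be regenerated or reshaped later without rerunning the analysis.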
Although PyPop can be run on any kind of genotype data, it has additional support for analyzing population genotypes that use allelic nomenclature from the human leukocyte antigen (HLA) region.
An outline of PyPop can be found in our 2007 Tissue Antigens and 2003 PSB papers.
Installation and documentation
Documentation, including instructions on installing, using and interpreting output of PyPop, is contained in the PyPop User Guide.
Contact and questions
Please file all questions, support requests, and bug reports via our GitHub issue tracker. More details on how to file bug reports can be found in our contributors chapter of the User Guide. Please don’t email developers individually.
Source code
PyPop is free software (sometimes referred to as open source software) and the source code is released under the terms of the “copyleft” GNU General Public License, or GPL (http://www.gnu.org/licenses/gpl.html). All source code is available and maintained on our GitHub website.
How to cite PyPop
If you write a paper that uses PyPop in your analysis, please cite both:
our 2007 paper from Tissue Antigens:
Lancaster AK, Single RM, Solberg OD, Nelson MP and G Thomson (2007) “PyPop update - a software pipeline for large-scale multilocus population genomics” Tissue Antigens 69 (s1), 192-197. [journal page, preprint PDF (112 kB)].
and the Zenodo record for the software. Citation metadata for the current Zenodo record is stored in CITATION.cff. The citation will take the following form and contain a link to the DOI:
Lancaster, AK et al. (2023) “PyPop: Python for Population Genomics” (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.XXXXX
When you follow the DOI link to the Zenodo record, you will see all the versions listed on the right. Make sure you choose the version-specific DOI that matches the version of PyPop used in your analysis.
Also available (but not necessary to cite) is our 2003 Pacific Symposium on Biocomputing paper:
Alex Lancaster, Mark P. Nelson, Richard M. Single, Diogo Meyer, and Glenys Thomson (2003) “PyPop: a software framework for population genomics: analyzing large-scale multi-locus genotype data”, in Pacific Symposium on Biocomputing vol. 8:514-525 (edited by R B Altman et al., World Scientific, Singapore, 2003) [PubMed Central, PDF (344 kB)].
PyPop was originally developed for the analysis of data for the 13th International Histocompatibility Workshop and Conference held in Seattle, Washington in 2002 ([Meyer:etal:2007], [Single:etal:2007a], [Single:etal:2007a]). For more on the design and technical details of PyPop, please consult [Lancaster:etal:2003], [Lancaster:etal:2007a] and [Lancaster:etal:2007b].
Acknowledgements
This work has benefited from the support of NIH grant AI49213 (13th IHW) and NIH/NIAID Contract number HHSN266200400076C N01-AI-40076. Thanks to Steven J. Mack, Kristie A. Mather, Steve G.E. Marsh, Mark Grote and Leslie Louie for helpful comments and testing.
Supplementary data files
Population data files and online supporting materials for published studies listed in the [Solberg:etal:2008] meta-analysis paper.
ImmPort.org
PyPop is affiliated with https://ImmPort.org, the Immunology Database and Analysis Portal. The ImmPort system provides advanced information technology support in the production, analysis, archiving, and exchange of scientific data for the diverse community of life science researchers supported by NIAID/DAIT. The development of the ImmPort system was supported by the NIH/NIAID Bioinformatics Integration Support Contract (BISC), Phase II.