PyPop: Python for Population Genomics
PyPop (Python for Population Genomics) is an environment for doing large-scale population genetic analyses including:
conformity to Hardy-Weinberg expectations
tests for balancing or directional selection
estimates of haplotype frequencies and measures and tests of significance for linkage disequilibrium (LD).
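As a rough illustration of the first item in the list above, the sketch below checks a single locus for conformity to Hardy-Weinberg expectations using a textbook chi-square goodness-of-fit statistic. It is not PyPop's implementation: the function name and the input format (a list of allele pairs) are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hardy_weinberg_chi_square(genotypes):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    `genotypes` is a list of (allele, allele) pairs for one locus
    (a hypothetical input format, not PyPop's own).
    Returns (expected_counts, chi_square).
    """
    n = len(genotypes)
    # Estimate allele frequencies from the 2n observed allele copies.
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    # Treat (A, B) and (B, A) as the same unordered genotype.
    observed = Counter(tuple(sorted(pair)) for pair in genotypes)

    expected = {}
    chi_square = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Homozygote probability is p^2, heterozygote probability is 2pq.
        prob = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        expected[(a, b)] = prob * n
        chi_square += (observed.get((a, b), 0) - expected[(a, b)]) ** 2 / expected[(a, b)]
    return expected, chi_square
```

A chi-square test is only one of several ways to assess Hardy-Weinberg proportions, and it becomes unreliable when expected counts are small, as is common at highly polymorphic loci. See the User Guide for the tests PyPop actually provides.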
PyPop News
2024-11-18: PyPop 1.1.2 released and available on PyPI.
Add --citation command-line option to print citation information for the installed release.
numpy dependency update to 2.1.3
2024-09-10: PyPop 1.1.1 released, enabling support for Python 3.13 and building Python 3.13 wheels.
2024-05-30: PyPop 1.1.0 released. Increases the minimum macOS requirements to Catalina (Intel) and Big Sur (Silicon) to ensure binary compatibility with the GNU Scientific Library (gsl). Thanks to Steve Mack for testing.
2024-04-01: PyPop paper published in Frontiers in Immunology, see citing PyPop for details.
2024-03-08: PyPop paper provisionally accepted.
2024-02-24: PyPop 1.0.2 released. Code scanning updates and updated numpy to 1.26.4.
2024-02-11: PyPop 1.0.1 released. Added support for ARM64 for Windows and Linux, and also muslinux wheels. Improved support for scientific notation.
2024-02-01: Preprint describing 1.0.0 released on Zenodo.
2023-11-07: PyPop 1.0.0 released
More details, including recent previous releases:
Highlights of PyPop 1.0.0 include:
PyPop now fully ported to Python 3.
New asymmetric linkage disequilibrium (ALD) computations (Thomson and Single, 2014).
Improved tab-separated values (TSV) output file handling.
Preliminary support for Genotype List (GL) String.
Unit tests, new documentation system, continuous integration framework and PyPI package
and even more minor features and bug fixes… (see NEWS.md).
2023-11-04: release candidate 2 (1.0.0rc2) released. Fixes some missing TSV output.
2023-11-01: release candidate 1 (1.0.0rc1) released.
2023-10-27: seventh beta pre-release 1.0.0b7. Previous arm64 issues have been resolved. Thanks to Owen Solberg for extensive testing and debugging.
2023-10-13: fourth beta pre-release 1.0.0b4. Although this release contains packages that will install on arm64/M1 machines, these arm64 packages should be considered as alpha-only and are strictly for testing only. Please do not use PyPop on M1 machines for any production analyses, until we fix some underlying arm64 numerical issues.
2023-10-10: second beta pre-release 1.0.0b2
2023-09-26: first beta pre-release 1.0.0b1
2023: ported to Python 3, pre-release alpha versions of 1.0.0 under development - no formal release yet.
2022: 0.7.0 binaries deprecated.
2020: pypop is no longer a Fedora package (to be replaced by PyPI package)
2017: all new development is now in GitHub
See the PyPop Release History in the Python User Guide for even earlier history and full release notes.
PyPop is an object-oriented framework implemented in Python, but also contains C extensions for some computationally intensive tasks. Output of analyses is stored in XML format for maximum downstream flexibility. PyPop also has an internal facility for aggregating the output XML and generating tab-separated values (TSV) files, as well as generating a default plain text summary file for each population.
Although it can be run on any kind of genotype data, it has additional support for analyzing population genotype data with allelic nomenclature from the human leukocyte antigen (HLA) region.
An outline of PyPop can be found in our 2024 paper, and two previous papers.
Installation and documentation
Documentation, including instructions on installing, using and interpreting output of PyPop, is contained in the PyPop User Guide.
Contact and questions
Please file all questions, support requests, and bug reports via our GitHub issue tracker. More details on how to file bug reports can be found in our contributors chapter of the User Guide. Please don’t email developers individually.
Source code
PyPop is free software (sometimes referred to as open source software) and the source code is released under the terms of the “copyleft” GNU General Public License, or GPL (https://www.gnu.org/licenses/gpl.html) (specifically GPLv2, but any later version applies). All source code is available and maintained on our GitHub website.
How to cite PyPop
If you write a paper that uses PyPop in your analysis, please cite both:
our 2024 article in Frontiers in Immunology:
Lancaster AK, Single RM, Mack SJ, Sochat V, Mariani MP, Webster GD. (2024) “PyPop: A mature open-source software pipeline for population genomics.” Front. Immunol. 15:1378512 doi: 10.3389/fimmu.2024.1378512
and a citation to the Zenodo record, which includes a DOI for the version of the software you used in your analyses. Citing this record and DOI supports reproducibility by allowing researchers to determine the exact version of PyPop used in any particular analysis. In addition, it allows retrieval of long-term software source-code archives, independent of the original developers.
Here’s how to cite the correct version:
If you currently have PyPop version 1.1.2 or later installed, you can run:
pypop --citation
which outputs the Zenodo record citation in the simple “APA” format (you can also choose from BibTeX, EndNote, RIS and other formats, see the section on command-line interfaces in the User Guide for more details).
If you do not have PyPop installed, have a release of PyPop earlier than 1.1.2, or otherwise want to obtain the DOI and citation for specific versions, follow these steps:
First visit the DOI for the overall Zenodo record: 10.5281/zenodo.10080667. This DOI represents all versions, and will always resolve to the latest one.
When you are viewing the record, look for the Versions box in the right sidebar, which lists all versions (including older ones).
Select and click the version-specific DOI that matches the specific version of PyPop that you used for your analysis.
Once you are viewing the Zenodo record for the specific version, under the Citation box in the right sidebar, select the citation format you wish to use and click to copy the citation. It will contain a link to the version-specific DOI, and be of the form:
Lancaster, AK et al. (YYYY) “PyPop: Python for Population Genomics” (Version X.Y.Z) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.XXXXX
Note that citation metadata for the current Zenodo record is also stored in CITATION.cff
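The choice between the two routes above can be sketched in Python. This is a minimal illustration rather than part of PyPop: the helper names are hypothetical, and the only facts taken from this page are the pypop --citation command, the 1.1.2 version threshold, and the overall Zenodo DOI. Version-specific DOIs are deliberately not generated, because they have to be looked up on Zenodo.

```python
import re
import shutil
import subprocess

# First release that provides the --citation option, per the text above.
MIN_CITATION_VERSION = (1, 1, 2)

# DOI for the overall Zenodo record (all versions), as quoted above.
CONCEPT_DOI = "10.5281/zenodo.10080667"


def parse_release(version):
    """Return the leading X.Y.Z numbers of a version string as a tuple.

    Pre-release suffixes such as "b7" or "rc2" are ignored, so "1.1.2rc1"
    would compare equal to "1.1.2" (a simplification for this sketch).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_citation_flag(version):
    """True if this PyPop version should accept `pypop --citation`."""
    return parse_release(version) >= MIN_CITATION_VERSION


def citation_from_cli():
    """Return the output of `pypop --citation`, or None if unavailable."""
    if shutil.which("pypop") is None:
        return None  # PyPop is not installed or not on PATH
    result = subprocess.run(
        ["pypop", "--citation"], capture_output=True, text=True, check=False
    )
    return result.stdout if result.returncode == 0 else None


def manual_lookup_url():
    """URL of the overall Zenodo record, the starting point for manual lookup."""
    return f"https://doi.org/{CONCEPT_DOI}"
```

If citation_from_cli() returns None, open the address from manual_lookup_url() and follow the manual steps above to pick the version-specific DOI.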
Two previous papers are also available (but not necessary to cite):
Lancaster AK, Single RM, Solberg OD, Nelson MP, Thomson G (2007) “PyPop update - a software pipeline for large-scale multilocus population genomics” Tissue Antigens 69 (s1), 192-197. [journal page, preprint PDF (112 kB)].
Lancaster A, Nelson MP, Single RM, Meyer D, Thomson G (2003) “PyPop: a software framework for population genomics: analyzing large-scale multi-locus genotype data”, in Pacific Symposium on Biocomputing vol. 8:514-525 (edited by R B Altman. et al., World Scientific, Singapore, 2003) [PubMed Central, PDF (344 kB)].
PyPop was originally developed for the analysis of data for the 13th International Histocompatibility Workshop and Conference held in Seattle, Washington in 2002 (Meyer et al., 2007, Single et al., 2007, Single et al., 2007). For more details on the design and technical details of PyPop, please consult Lancaster et al. (2003), Lancaster et al. (2007), and Lancaster et al. (2007).
Acknowledgements
This work has benefited from the support of NIH grant AI49213 (13th IHW) and NIH/NIAID Contract number HHSN266200400076C N01-AI-40076. Thanks to Steven J. Mack, Kristie A. Mather, Steve G.E. Marsh, Mark Grote and Leslie Louie for helpful comments and testing.
Supplementary data files
Population data files and online supporting materials for published studies listed in the Solberg et al. (2008) meta-analysis paper.
ImmPort.org
PyPop is affiliated with https://ImmPort.org, the Immunology Database and Analysis Portal. The ImmPort system provides advanced information technology support in the production, analysis, archiving, and exchange of scientific data for the diverse community of life science researchers supported by NIAID/DAIT. The development of the ImmPort system was supported by the NIH/NIAID Bioinformatics Integration Support Contract (BISC), Phase II.