PyPop: Python for Population Genomics

PyPop (Python for Population Genomics) is an environment for doing large-scale population genetic analyses including:

conformity to Hardy-Weinberg expectations
tests for balancing or directional selection
estimates of haplotype frequencies and measures and tests of significance for linkage disequilibrium (LD).
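
As a rough illustration of the first item, the following Python sketch computes a chi-square statistic comparing observed genotype counts with Hardy-Weinberg expectations. It is not PyPop's implementation or API: the function name `hardy_weinberg_chi_square` and the toy genotype data are assumptions made for this example only.

```python
from collections import Counter


def hardy_weinberg_chi_square(genotypes):
    """Return (chi_square, expected_counts) for a list of two-allele genotypes.

    Hypothetical helper for illustration; not part of PyPop.
    """
    n = len(genotypes)
    # Allele frequencies are estimated from the sample itself (2 alleles per individual).
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = {allele: count / (2 * n) for allele, count in allele_counts.items()}
    # Genotypes are unordered, so ("A", "B") and ("B", "A") are counted together.
    observed = Counter(tuple(sorted(genotype)) for genotype in genotypes)

    alleles = sorted(freqs)
    chi_square = 0.0
    expected = {}
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            product = freqs[a] * freqs[b]
            # Homozygotes are expected at p^2, heterozygotes at 2pq.
            expected_count = n * (product if a == b else 2 * product)
            expected[(a, b)] = expected_count
            chi_square += (observed.get((a, b), 0) - expected_count) ** 2 / expected_count
    return chi_square, expected


# Toy sample: 50 A/A and 50 B/B individuals with no heterozygotes at all.
sample = [("A", "A")] * 50 + [("B", "B")] * 50
statistic, expected_counts = hardy_weinberg_chi_square(sample)
print(statistic, expected_counts)
```

A large statistic, as in the toy sample, indicates a departure from Hardy-Weinberg proportions. Turning the statistic into a p-value, and handling rare genotypes with exact tests, is left out of this sketch.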

PyPop News

2025-07-28: PyPop 1.2.2 is released. This release includes dependency updates and a nox task framework useful for contributors and developers. Experimental wheels built for Windows ARM64 are also available in the Test PyPI repository. To test them, run:
```
pip install --extra-index-url https://test.pypi.org/simple/ pypop-genomics
```
Please test and report issues via the bug tracker.
2025-04-29: PyPop 1.2.1 is released. This release includes a bug fix in text output, and internal changes and updates to dependencies.
2024-04-01: PyPop paper published in Frontiers in Immunology; see Citing PyPop for details.
More details, including recent previous releases:

2025-02-04: PyPop 1.2.0 is released: includes updates and restoration of full functionality for [Sequence] options as part of the [Filters], including dynamic downloads of HLA sequence data and major updates to documentation, using new HLA nomenclature throughout.

2025-01-16: Beta release PyPop 1.2.0b2 is released.

2025-01-05: Pre-release PyPop 1.1.3rc1 is now available. Experimental wheels for Windows ARM64 are added in this release.

2024-11-18: PyPop 1.1.2 released: adds --citation command-line option to print citation information and updates numpy to 2.1.3

2024-09-10: PyPop 1.1.1 released. Enables support for Python 3.13 and builds Python 3.13 wheels.

2024-05-30: PyPop 1.1.0 released. Increases the minimum macOS requirements to Catalina (Intel) and Big Sur (Silicon) to ensure binary compatibility with the GNU Scientific Library (gsl). Thanks to Steve Mack for testing.

2024-03-08: PyPop paper, provisionally accepted.

2024-02-24: PyPop 1.0.2 released. Code scanning updates and updated numpy to 1.26.4

2024-02-11: PyPop 1.0.1 released. Added support for ARM64 on Linux, including musllinux wheels. Improved support for scientific notation.

2024-02-01: Preprint describing 1.0.0 released on Zenodo.

2023-11-07: PyPop 1.0.0 released

Highlights of PyPop 1.0.0 include:

PyPop now fully ported to Python 3.

New asymmetric linkage disequilibrium (ALD) computations (Thomson and Single, 2014).

Improved tab-separated values (TSV) output file handling.

Preliminary support for Genotype List (GL) String.

Unit tests, new documentation system, continuous integration framework and PyPI package

and even more minor features and bug fixes… (see NEWS.md).

2023-11-04: release candidate 2 (1.0.0rc2) released. Fixes some missing TSV output.

2023-11-01: release candidate 1 (1.0.0rc1) released.

2023-10-27: seventh beta pre-release 1.0.0b7. Previous arm64 issues have been resolved. Thanks to Owen Solberg for extensive testing and debugging.

2023-10-13: fourth beta pre-release 1.0.0b4. Although this release contains packages that will install on arm64/M1 machines, these arm64 packages should be considered alpha-quality and are strictly for testing. Please do not use PyPop on M1 machines for any production analyses until we fix some underlying arm64 numerical issues.

2023-10-10: second beta pre-release 1.0.0b2

2023-09-26: first beta pre-release 1.0.0b1

2023: ported to Python 3, pre-release alpha versions of 1.0.0 under development - no formal release yet.

2022: 0.7.0 binaries deprecated.

2020: pypop is no longer a Fedora package (to be replaced by PyPI package)

2017: all new development is now in GitHub

See the PyPop Release History in the Python User Guide for even earlier history and full release notes.

PyPop is an object-oriented framework implemented in Python, with C extensions for some computationally intensive tasks. The output of analyses is stored in XML format for maximum downstream flexibility. PyPop also has an internal facility for aggregating the output XML and generating tab-separated values (TSV) files, as well as generating a default plain-text summary file for each population.

Although it can be run on any kind of genotype data, it has additional support for analyzing population genotype data with allelic nomenclature from the human leukocyte antigen (HLA) region.

An outline of PyPop can be found in our 2024 paper (Lancaster et al., 2024), and two previous papers.

Installation and documentation

Documentation, including instructions on installing, using and interpreting output of PyPop, is contained in the PyPop User Guide.

Contact and questions

Please file all questions, support requests, and bug reports via our GitHub issue tracker. More details on how to file bug reports can be found in our contributors chapter of the User Guide. Please don’t email developers individually.

Source code

PyPop is free software (sometimes referred to as open source software) and the source code is released under the terms of the “copyleft” GNU General Public License, or GPL (https://www.gnu.org/licenses/gpl.html) (specifically GPLv2, but any later version applies). All source code is available and maintained on our GitHub website.

Citing PyPop

If you write a paper that uses PyPop in your analysis, please cite both:

our 2024 article in Frontiers in Immunology:

Lancaster AK, Single RM, Mack SJ, Sochat V, Mariani MP, Webster GD. (2024) “PyPop: A mature open-source software pipeline for population genomics.” Front. Immunol. 15:1378512 doi: 10.3389/fimmu.2024.1378512
and a citation to the Zenodo record, which includes a DOI for the version of the software you used in your analyses. Citing this record and DOI supports reproducibility by allowing researchers to determine the exact version of PyPop used in any particular analysis. In addition, it allows retrieval of long-term software source-code archives, independent of the original developers.

Here’s how to cite the correct version:
- If you have PyPop version 1.1.2 or later currently installed, you can run:
```
pypop --citation
```
  which outputs the Zenodo record citation in the simple “APA” format (you can also choose from BibTeX, EndNote, RIS and other formats, see the section on command-line interfaces in the User Guide for more details).
- If you do not have PyPop installed, have a release of PyPop earlier than 1.1.2, or otherwise want to obtain the DOI and citation for specific versions, follow these steps:
  1. First visit the DOI for the overall Zenodo record: 10.5281/zenodo.10080667. This DOI represents all versions, and will always resolve to the latest one.
  2. When you are viewing the record, look for the Versions box in the right sidebar, which lists all versions (including older ones).
  3. Select and click the version-specific DOI that matches the specific version of PyPop that you used for your analysis.
  4. Once you are viewing the Zenodo record for the specific version, go to the Citation box in the right sidebar, select the citation format you wish to use, and click to copy the citation. It will contain a link to the version-specific DOI, and be of the form:
    
    Lancaster, AK et al. (YYYY) “PyPop: Python for Population Genomics” (Version X.Y.Z) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.XXXXX
Note that citation metadata for the current Zenodo record is also stored in CITATION.cff
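
The choice between the two routes above depends on whether an installed PyPop is at least version 1.1.2. The following Python sketch shows one way to check this. The helper names (`release_tuple`, `supports_citation_flag`, `installed_pypop_version`) are hypothetical and not part of PyPop; the distribution name `pypop-genomics` is taken from the `pip install` command shown in the news section. As a simplification, only the leading `X.Y.Z` part of a version is compared, so a pre-release such as `1.1.2rc1` would be treated the same as `1.1.2`.

```python
import re
from importlib import metadata

# First release that documents the --citation option, per the release notes above.
CITATION_FLAG_MIN = (1, 1, 2)


def release_tuple(version):
    """Extract the leading X.Y.Z numbers from a version such as '1.2.0b2'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_citation_flag(version):
    """Return True if this version is expected to accept `pypop --citation`."""
    return release_tuple(version) >= CITATION_FLAG_MIN


def installed_pypop_version():
    """Return the installed pypop-genomics version, or None if it is not installed."""
    try:
        return metadata.version("pypop-genomics")
    except metadata.PackageNotFoundError:
        return None


version = installed_pypop_version()
if version is not None and supports_citation_flag(version):
    print(f"PyPop {version}: run `pypop --citation`")
else:
    print("Look up the version-specific DOI on the Zenodo record instead")
```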

Two previous papers are also available (but not necessary to cite):

Lancaster AK, Single RM, Solberg OD, Nelson MP, Thomson G (2007) “PyPop update - a software pipeline for large-scale multilocus population genomics” Tissue Antigens 69 (s1), 192-197. [journal page, preprint PDF (112 kB)].
Lancaster A, Nelson MP, Single RM, Meyer D, Thomson G (2003) “PyPop: a software framework for population genomics: analyzing large-scale multi-locus genotype data”, in Pacific Symposium on Biocomputing vol. 8:514-525 (edited by R B Altman. et al., World Scientific, Singapore, 2003) [PubMed Central, PDF (344 kB)].

PyPop was originally developed for the analysis of data for the 13th International Histocompatibility Workshop and Conference held in Seattle, Washington in 2002 (Meyer et al., 2007; Single et al., 2007a, 2007b). For more details on the design and technical details of PyPop, please consult Lancaster et al. (2003, 2007a, 2007b, 2024).

Acknowledgements

This work has benefited from the support of NIH grant AI49213 (13th IHW) and NIH/NIAID Contract number HHSN266200400076C N01-AI-40076. Thanks to Steven J. Mack, Kristie A. Mather, Steve G.E. Marsh, Mark Grote and Leslie Louie for helpful comments and testing.

Supplementary data files

Population data files and online supporting materials are available for the published studies listed in the Solberg et al. (2008) meta-analysis paper.

ImmPort.org

PyPop is affiliated with https://ImmPort.org, the Immunology Database and Analysis Portal. The ImmPort system provides advanced information technology support in the production, analysis, archiving, and exchange of scientific data for the diverse community of life science researchers supported by NIAID/DAIT. The development of the ImmPort system was supported by the NIH/NIAID Bioinformatics Integration Support Contract (BISC), Phase II.