PyPop: Python for Population Genomics

PyPop (Python for Population Genomics) is an environment for doing large-scale population genetic analyses including:

  • conformity to Hardy-Weinberg expectations

  • tests for balancing or directional selection

  • estimates of haplotype frequencies, plus measures and tests of significance for linkage disequilibrium (LD).

PyPop is an object-oriented framework implemented in Python, but also contains C extensions for some computationally intensive tasks. Output of analyses is stored in XML format for maximum downstream flexibility. PyPop also has an internal facility for aggregating the output XML and generating tab-separated (TSV) files, as well as generating a default plain text summary file for each population.

Although PyPop can be run on any kind of genotype data, it has additional support for analyzing population genotype data with allelic nomenclature from the human leukocyte antigen (HLA) region.

An outline of PyPop can be found in our 2024 paper and in two previous papers.

Installation and documentation

Documentation, including instructions on installing, using and interpreting output of PyPop, is contained in the PyPop User Guide.

Contact and questions

Please file all questions, support requests, and bug reports via our GitHub issue tracker. More details on how to file bug reports can be found in our contributors chapter of the User Guide. Please don’t email developers individually.

Source code

PyPop is free software (sometimes referred to as open source software) and the source code is released under the terms of the “copyleft” GNU General Public License, or GPL (https://www.gnu.org/licenses/gpl.html) (specifically GPLv2, but any later version applies). All source code is available and maintained on our GitHub website.

How to cite PyPop

If you write a paper that uses PyPop in your analysis, please cite both:

  • our 2024 article in Frontiers in Immunology:

    Lancaster AK, Single RM, Mack SJ, Sochat V, Mariani MP, Webster GD. (2024) “PyPop: A mature open-source software pipeline for population genomics.” Front. Immunol. 15:1378512 doi: 10.3389/fimmu.2024.1378512

  • and a citation to the Zenodo record which includes a DOI for the version of the software you used in your analyses. Citing this record and DOI supports reproducibility by allowing researchers to determine the exact version of PyPop used in any particular analysis. In addition, it allows retrieval of long-term software source-code archives, independent of the original developers.

    Here’s how to cite the correct version:

    • If you currently have PyPop version 1.1.2 or later installed, you can run:

      pypop --citation
      

      which outputs the Zenodo record citation in the simple “APA” format. You can also choose from BibTeX, EndNote, RIS and other formats; see the section on command-line interfaces in the User Guide for more details.

    • If you do not have PyPop installed, have a release of PyPop earlier than 1.1.2, or otherwise want to obtain the DOI and citation for specific versions, follow these steps:

      1. First visit the DOI for the overall Zenodo record: 10.5281/zenodo.10080667. This DOI represents all versions, and will always resolve to the latest one.

      2. When you are viewing the record, look for the Versions box in the right sidebar. All versions, including older ones, are listed there.

      3. Select and click the version-specific DOI that matches the specific version of PyPop that you used for your analysis.

      4. Once you are visiting the Zenodo record for the specific version, go to the Citation box in the right sidebar, select the citation format you wish to use, and click to copy the citation. It will contain a link to the version-specific DOI, and be of the form:

        Lancaster, AK et al. (YYYY) “PyPop: Python for Population Genomics” (Version X.Y.Z) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.XXXXX

    Note that citation metadata for the current Zenodo record is also stored in CITATION.cff
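The choice between the two routes above can be sketched in Python. This is a minimal illustration, not part of PyPop itself: the helper names are hypothetical, and the distribution name pypop-genomics used to look up the installed version is an assumption that may need adjusting for your installation. Only the pypop --citation command, the 1.1.2 threshold, and the overall Zenodo DOI come from the text above.

```python
import re
import shutil
import subprocess
from importlib import metadata

MIN_CITATION_VERSION = (1, 1, 2)          # first release with --citation (from the text above)
CONCEPT_DOI = "10.5281/zenodo.10080667"   # overall Zenodo record (from the text above)
DIST_NAME = "pypop-genomics"              # ASSUMPTION: installed distribution name


def parse_version(text):
    """Return the leading numeric release, e.g. '1.1.2' -> (1, 1, 2).

    Pre-release suffixes are ignored, so '1.1.2rc1' is treated like '1.1.2'.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def supports_citation_flag(version_text):
    """True if this release is 1.1.2 or later."""
    return parse_version(version_text) >= MIN_CITATION_VERSION


def doi_to_url(doi):
    """Turn a bare DOI into a resolvable https://doi.org/ link."""
    return "https://doi.org/" + doi.strip()


def citation_or_fallback():
    """Run `pypop --citation` when possible, else point at the Zenodo record."""
    try:
        installed = metadata.version(DIST_NAME)
    except metadata.PackageNotFoundError:
        installed = None

    if installed and supports_citation_flag(installed) and shutil.which("pypop"):
        result = subprocess.run(
            ["pypop", "--citation"], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()

    # Manual route: start from the overall record and pick the matching version.
    return "Find the version-specific DOI via " + doi_to_url(CONCEPT_DOI)


if __name__ == "__main__":
    print(citation_or_fallback())
```

When PyPop is missing or older than 1.1.2, the sketch only prints the link to the overall record. Choosing the version-specific DOI from the Versions box remains a manual step.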

Two previous papers are also available (but not necessary to cite):

  • Lancaster AK, Single RM, Solberg OD, Nelson MP, Thomson G (2007) “PyPop update - a software pipeline for large-scale multilocus population genomics” Tissue Antigens 69 (s1), 192-197. [journal page, preprint PDF (112 kB)].

  • Lancaster A, Nelson MP, Single RM, Meyer D, Thomson G (2003) “PyPop: a software framework for population genomics: analyzing large-scale multi-locus genotype data”, in Pacific Symposium on Biocomputing vol. 8:514-525 (edited by R B Altman et al., World Scientific, Singapore, 2003) [PubMed Central, PDF (344 kB)].

PyPop was originally developed for the analysis of data for the 13th International Histocompatibility Workshop and Conference held in Seattle, Washington in 2002 (Meyer et al., 2007, Single et al., 2007, Single et al., 2007). For more on the design and technical details of PyPop, please consult Lancaster et al. (2003), Lancaster et al. (2007), and Lancaster et al. (2007).

Acknowledgements

This work has benefited from the support of NIH grant AI49213 (13th IHW) and NIH/NIAID Contract number HHSN266200400076C N01-AI-40076. Thanks to Steven J. Mack, Kristie A. Mather, Steve G.E. Marsh, Mark Grote and Leslie Louie for helpful comments and testing.

Supplementary data files

Population data files and online supporting materials for published studies listed in the Solberg et al. (2008) meta-analysis paper.

ImmPort.org

PyPop is affiliated with https://ImmPort.org, the Immunology Database and Analysis Portal. The ImmPort system provides advanced information technology support in the production, analysis, archiving, and exchange of scientific data for the diverse community of life science researchers supported by NIAID/DAIT. The development of the ImmPort system was supported by the NIH/NIAID Bioinformatics Integration Support Contract (BISC), Phase II.