I have a strong interest in building high-quality open data resources to power large-scale collaborative science. Below are genomic data resources that I’ve played a significant part in creating.

  • The Malaria Vector Genome Observatory - This resource includes data from whole-genome sequencing of more than 24,000 individual Anopheles mosquitoes collected from 31 African countries. As far as I’m aware, it is currently the largest open dataset of natural genetic variation for any eukaryotic species other than humans. The Vector Observatory brings high-quality genome variation data together with software tools and training resources to empower African scientists to lead research on mosquito ecology and evolution.

  • The Anopheles gambiae 1000 Genomes Project phase 3 SNP data resource - Includes sample metadata, sequence read alignments and genome-wide single nucleotide polymorphism (SNP) calls from whole-genome sequencing of 2,784 wild-caught mosquitoes collected from 19 countries in sub-Saharan Africa, and 297 mosquitoes comprising parents and progeny of 15 lab crosses. Three mosquito species are represented: Anopheles gambiae, Anopheles coluzzii and Anopheles arabiensis.

  • The Anopheles gambiae 1000 Genomes Project phase 2 data resource - Includes sample metadata, sequence read alignments, genome-wide single nucleotide polymorphism (SNP) calls, SNP haplotypes and copy number variation (CNV) calls from whole-genome sequencing of 1,142 wild-caught mosquito specimens collected from 13 countries spanning sub-Saharan Africa, and 234 specimens comprising parents and progeny of 11 lab crosses. Two mosquito species are represented: Anopheles gambiae and Anopheles coluzzii.

  • The Anopheles gambiae 1000 Genomes Project phase 1 SNP and haplotype data resources - Include variant calls and associated data from whole-genome sequencing of 845 mosquito specimens: 765 wild-caught specimens collected from eight countries across sub-Saharan Africa, and 80 specimens comprising parents and progeny of four crosses. Two mosquito species are represented: Anopheles gambiae and Anopheles coluzzii.

  • The MalariaGEN Plasmodium falciparum Genetic Crosses data resource - Comprises sequence alignments, SNP calls, small indel calls and copy number variant calls from whole-genome sequencing of the parents and 78 progeny clones of the crosses 3D7xHB3, HB3xDd2 and 7G8xGB4.