Kapun, Martin; Barron, Maite; Staubach, Fabian; Vieira, Jorge; Obbard, Darren; Wiberg, Axel R W; Goubert, Clement; Stabelli, Omar Rota; Kankare, Maaria; Haudry, Annabelle; Bogaerts-Marquez, Maria; Waidele, Lena; Kozeretska, Iryna; Pasyukova, Elena; Loeschcke, Volker; Pascual, Marta; Vieira, Cristina P; Serga, Svitlana; Montchamp-Moreau, Catherine; Abbott, Jessica; Gibert, Patricia; Porcelli, Damiano; Posnien, Nico; Sanchez-Gracia, Alejandro; Grath, Sonja; Sucena, Elio; Bergland, Alan; Guerreiro, Maria Pilar Garcia; Onder, Banu Sebnem; Argyridou, Eliza; Guio, Lain; Schou, Mads Fristrup; Deplancke, Bart; Vieira, Cristina; Ritchie, Michael G; Zwaan, Bas; Tauber, Eran; Orengo, Dorcas; Puerma, Eva; Aguade, Montserrat; Schmidt, Paul; Parsch, John; Betancourt, Andrea; Flatt, Thomas; Gonzalez, Josefa
ABSTRACT
Drosophila melanogaster is a premier model in population genetics and genomics, and a growing number of whole-genome datasets from natural populations of this species have been published over the last 20 years. A major challenge is integrating these disparate datasets, which were often generated using different sequencing technologies and bioinformatic pipelines; this heterogeneity hampers our ability to address questions about the evolution and population structure of this species. Here we address this challenge by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic (SNAPE-pooled) variant caller. We use this pipeline to generate the largest repository of genomic data available for D. melanogaster to date, encompassing 271 population samples from over 100 locations in >20 countries on four continents. Several of these locations were sampled in different seasons across multiple years. This dataset, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental metadata. A web-based genome browser and web portal provide easy access to the SNP dataset. Our aim is to provide this scalable platform as a community resource that can be easily extended by future efforts toward an even more extensive cosmopolitan dataset. Our resource will enable population geneticists to analyze spatio-temporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.