Putting it all together: creating a more user-friendly database for (nearly) all your mammal macroecological needs

Our paper in Ecology can be found here: https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1002/ecy.2443

Aug 22, 2018

Over the last few years, while investigating various macroscale patterns like the island rule, the cause of the megafauna extinctions, and the magnitude of anthropogenic modifications of diversity patterns, we have collected and generated large amounts of data on mammal biogeography, functional traits, and evolutionary relationships. Collecting, cleaning, and assembling all these data drove home two major obstacles facing all macroecological studies, including ours. First, even the most comprehensive datasets lack large, nonrandom chunks of information. Second, it is difficult to compare different datasets because they use conflicting taxonomies. Carrying out an analysis as straightforward as a regression of average body mass on latitude (while controlling for phylogenetic nonindependence of traits) required a researcher to combine at least three different global mammal datasets from three different labs, all with conflicting taxonomies and numbers of species: a frustrating, non-trivial, and annoyingly time-consuming task. We realized that even our own data, which were freely available in various journal appendices, required significant effort (by us, the people who made and understand them!) to synthesize and prepare into an analysis-ready format. Rather than expending time assembling and curating data only for our own uses, we decided that the wider scientific community would benefit greatly from a comprehensive, internally consistent database of mammals. Thus, we started construction of a global database of all 5,831 mammal species that lived during the last 130,000 years.

The first important task (although potentially not the most important) was to decide what the database should be called. We initially thought about a name from Norse mythology, as the database represents a collaboration between people from Nordic universities (#Branding). Since the database should be easy to find, we wanted something relatively well known but not a name that had already been used for other products. We thought we had a good option with Gleipnir, the chain that held Fenris the wolf, because our database was intended to link together multiple underlying datasets, a task almost as tricky as keeping the legendary Fenris at bay. However, we realized there were two crucial objections to using Gleipnir. First, it was practically unpronounceable to non-Viking tongues. Second, Gleipnir ultimately failed, allowing the evil Fenris wolf to escape and bring about Ragnarok, the end of the world (and we would prefer any potential future Ragnarok to be unrelated to our database). Although not from Norse legend, we ended up choosing PHYLACINE, which evokes both the large phylogeny in our database and all the species we included that were lost to anthropogenic extinctions. One of the most well-known species driven extinct by humans was the similarly named thylacine (also known as the Tasmanian tiger, though our database highlights that it naturally occurred across all of Australia and much of New Guinea). Many of these extinct species used to be vitally important, and we included them to better understand how natural ecosystems functioned.

After the important naming issues were settled, we started the real work. We combined an updated version of our global mammal phylogeny with various datasets on ranges, body size, and diet. We carefully matched up the various taxonomies used into one internally consistent taxonomy based on the IUCN Red List and provided synonymy tables so that PHYLACINE data can be quickly matched up with other external datasets and previous data appendices we have published. We spent considerable time tracking down missing trait values and filling in holes in the database, and now cover diet for 92% and body mass for 97% of all mammals. We phylogenetically imputed the few missing values that remained. The resulting database should be usable straight out of the box, and because no ad hoc data-filling steps are necessary, results can be immediately compared to other analyses using PHYLACINE. Phylogenies and range maps can be combined with extinction risks and diets in only a few lines of code. Science is hard enough when we only have to worry about biological reasoning and data analysis. We hope that our “pre-cleaned” database will enable researchers to use their time and energy where it is most needed: addressing cool biological questions, rather than cleaning up spelling variants, scrubbing PDF scans for mass values, and other mundane tasks.
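To illustrate the kind of matching a synonymy table is meant to make easy, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the species names are invented, the values are toy numbers, and the field names (`mass_g`, `latitude`) are hypothetical rather than the actual layout of the PHYLACINE files.

```python
# Toy stand-ins for a trait table, a synonymy table, and an external dataset.
# All names and values are invented for illustration; they are NOT real
# PHYLACINE records or column names.
traits = {
    "Examplus_major": {"mass_g": 1000.0},
    "Examplus_minor": {"mass_g": 10.0},
}

# Maps names used elsewhere -> the accepted name in the database's taxonomy.
synonyms = {
    "Examplus_major": "Examplus_major",
    "Examplus_magnus": "Examplus_major",
    "Fictus_minor": "Examplus_minor",
}

# An external dataset that uses a different taxonomy.
external = {
    "Examplus_magnus": {"latitude": 55.0},
    "Fictus_minor": {"latitude": -10.0},
    "Unknownus_sp": {"latitude": 0.0},
}


def harmonize(external_rows, synonym_table):
    """Re-key external rows by accepted name.

    Returns (matched, unmatched): rows keyed by accepted name, and the
    list of external names that have no entry in the synonymy table.
    """
    matched, unmatched = {}, []
    for name, row in external_rows.items():
        accepted = synonym_table.get(name)
        if accepted is None:
            unmatched.append(name)
        else:
            matched[accepted] = row
    return matched, unmatched


def join_traits(trait_table, matched_rows):
    """Merge trait values with harmonized external rows, species by species."""
    return {
        species: {**trait_table[species], **row}
        for species, row in matched_rows.items()
        if species in trait_table
    }


matched, unmatched = harmonize(external, synonyms)
combined = join_traits(traits, matched)
print(combined)
print("No synonym found for:", unmatched)
```

Returning the unmatched names separately, instead of dropping them silently, makes it easy to see which species still need a manual taxonomic decision.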

A vignette we provide online at our GitHub page (https://github.com/MegaPast2Future/PHYLACINE_1.2#vignette) illustrates the kinds of analyses that can be done using data solely from PHYLACINE (together with one's analysis program of choice, in this case R).

One unique feature of PHYLACINE is that it includes all mammals that lived since the last interglacial (~130,000 years ago), not just extant species. We included these extinct species because most of the hypotheses we test as macroecologists or macroevolutionary biologists make predictions about “natural” patterns of biodiversity, not the heavily anthropogenically modified patterns we actually see today. To make such hypotheses easier to test, our database contains extinct species and the “present natural” range for all species, i.e. the estimated range as it could have been today without any human involvement. This includes present natural ranges for species that went extinct, like the thylacine. However, PHYLACINE also contains the current range for all species, and extinct species are clearly marked. So, while we caution against it, users of the database can easily restrict their analyses to extant species and current ranges if they want to.
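As a rough sketch of what that choice looks like in practice, the Python snippet below selects either range type, optionally drops extinct species, and computes how much of a present natural range has been lost. The records, grid-cell IDs, and field names are invented for illustration. Treating the status codes `EX`, `EW`, and `EP` as "extinct" is our assumption for this example, so check the database documentation for the codes actually used.

```python
# Assumed status codes marking species that are no longer extant in the wild.
EXTINCT_STATUSES = {"EX", "EW", "EP"}

# Invented records: each range is a set of hypothetical grid-cell IDs.
species = [
    {"name": "Examplus_major", "status": "LC",
     "current": {1, 2}, "present_natural": {1, 2, 3, 4}},
    {"name": "Examplus_extinctus", "status": "EP",
     "current": set(), "present_natural": {5, 6}},
]


def select_ranges(records, range_key="present_natural", extant_only=False):
    """Return {species name: range} for the chosen range type.

    With extant_only=True, species with an extinct status are dropped.
    """
    return {
        rec["name"]: rec[range_key]
        for rec in records
        if not (extant_only and rec["status"] in EXTINCT_STATUSES)
    }


def fraction_lost(record):
    """Share of the present natural range that is absent from the current range."""
    natural = record["present_natural"]
    if not natural:
        return 0.0
    return 1.0 - len(record["current"] & natural) / len(natural)


natural_ranges = select_ranges(species)
current_extant = select_ranges(species, range_key="current", extant_only=True)
losses = {rec["name"]: fraction_lost(rec) for rec in species}
print(natural_ranges)
print(current_extant)
print(losses)
```

Defaulting to present natural ranges with all species included mirrors the recommendation above; restricting to extant species and current ranges is an explicit opt-in.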

The above map shows the current range of the brown bear (Ursus arctos) in blue and its present natural range in red. The present natural range represents where the brown bear could live today without human involvement. Although we picked the brown bear as a well-known example of massive range decline, large differences between current and present natural ranges are the norm for most large mammalian species. 

By hosting the development version of PHYLACINE freely and openly on GitHub (MegaPast2Future.GitHub.io), we hope our database can grow into a widely used community resource in macroecology. During the assembly of the database, we strove to reduce errors and inconsistencies, but we encourage other researchers to investigate the data thoroughly and contact us with any corrections or suggestions they may have. We have an error reporting form as well as a Most Wanted List of missing trait values. Updates of the database are already planned and will include newly available information on taxonomy, range size, trait values, and phylogeny.

PHYLACINE is the result of a collaboration between scientists from Gothenburg University and Aarhus University. It would not have been possible without the hard work of co-authors Matt Davis, Rasmus Ø. Pedersen, Simon D. Schowanek, Alexandre Antonelli, and Jens-Christian Svenning, and funding from multiple sources including the Wallenberg Foundation, the European Research Council, and the Carlsberg Foundation.

Søren Faurby.


Ass. Professor, Göteborgs universitet
