The paper in Nature Ecology & Evolution is here: http://go.nature.com/2DKCIdf
Everybody knows the iconic plains zebra, the famous denizen of the African savanna. Yet researchers have not been able to agree on how its within-species variation is structured. Those who favor morphological approaches, such as measuring skulls and counting stripes, argue that there are discrete morphological types (called subspecies), whereas early genetic work suggested only gradual variation, with no discrete gaps between populations in different parts of Africa. And what about the mysterious, partly striped quagga, which went extinct more than a hundred years ago? Is it just an unusual color morph of the plains zebra? If so, which populations of plains zebra are most closely related to it? These questions are not only academic; they matter because within-species structure is used in conservation planning, e.g. to identify regional populations at increased risk of becoming isolated and suffering detrimental genetic effects. Furthermore, the lack of resolvable structure has frustrated all attempts to infer the phylogeography of the species (essentially, where and when plains zebra populations migrated to fill out their current geographical distribution). We decided that these questions could probably be answered with much more comprehensive genome-wide data than had previously been analyzed for plains zebras.
Unlike most people, we have the luxury of an in-house collection of frozen plains zebra tissue. Choosing the samples turned into a difficult optimization problem that took a few months: maximizing geographical coverage, keeping a decent balance of subspecies representation, getting sufficient-quality DNA out of the samples, and at the same time not breaking the bank on the total project cost. By the end I had driven my lab tech crazy, but we got RADseq (restriction-site associated DNA sequencing) results from 65 zebra samples, including three mountain zebras and three Grevy's zebras. To this we could add the complete genome of a quagga, which was published in 2014.
The data were meticulously analyzed using a range of bioinformatics and population genetic methods. A memorable highlight of this stage was a package appropriately called ANGSD (Analysis of Next Generation Sequencing Data), a clever framework for performing population genetic analyses directly from genotype likelihoods, without first calling genotypes. It is one of the most exciting and versatile tools for NGS data, especially for low to medium depth sequencing. It is also one of the most difficult tools to use, because of sparse documentation and the constant evolution of the methods and options in the package (I can say this because the developers are good friends of mine and I have often griped to them about it, which they seem to take in good humor).
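For readers wondering what "without calling genotypes" means in practice, here is a minimal Python sketch of the idea behind genotype likelihoods. It is not ANGSD's code. It assumes a single biallelic site with hypothetical alleles A and G, a simple per-read error model in which a wrong base is equally likely to be any of the other three nucleotides, and a Hardy-Weinberg prior with an assumed allele frequency. Instead of declaring an individual to be AA, it keeps the probability of each genotype, and those probabilities can be carried into downstream analyses.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical biallelic site: reference allele "A", alternative allele "G".
REF, ALT = "A", "G"
GENOTYPES: Dict[str, Tuple[str, str]] = {
    "AA": (REF, REF),
    "AG": (REF, ALT),
    "GG": (ALT, ALT),
}


def base_prob(base: str, allele: str, error: float) -> float:
    """P(observed base | true allele): correct with probability 1 - error,
    otherwise one of the three other nucleotides with equal probability."""
    return 1.0 - error if base == allele else error / 3.0


def log_genotype_likelihoods(bases: List[str], errors: List[float]) -> Dict[str, float]:
    """log P(reads | genotype) for each diploid genotype at one site.
    Each read is assumed to come from either chromosome with probability 1/2."""
    logs = {}
    for name, (a1, a2) in GENOTYPES.items():
        logs[name] = sum(
            math.log(0.5 * base_prob(b, a1, e) + 0.5 * base_prob(b, a2, e))
            for b, e in zip(bases, errors)
        )
    return logs


def genotype_posteriors(log_lik: Dict[str, float], alt_freq: float) -> Dict[str, float]:
    """Combine likelihoods with a Hardy-Weinberg prior (0 < alt_freq < 1)."""
    q = alt_freq
    prior = {"AA": (1 - q) ** 2, "AG": 2 * q * (1 - q), "GG": q ** 2}
    log_joint = {g: log_lik[g] + math.log(prior[g]) for g in GENOTYPES}
    m = max(log_joint.values())  # subtract the max before exponentiating, for stability
    unnorm = {g: math.exp(v - m) for g, v in log_joint.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


def expected_alt_dosage(post: Dict[str, float]) -> float:
    """Expected number of ALT alleles, usable downstream instead of a hard call."""
    return post["AG"] + 2 * post["GG"]


# Illustrative values: reads all showing "A", each with a 1% error rate.
low = genotype_posteriors(log_genotype_likelihoods(["A"] * 2, [0.01] * 2), alt_freq=0.5)
high = genotype_posteriors(log_genotype_likelihoods(["A"] * 20, [0.01] * 20), alt_freq=0.5)
print({g: round(p, 3) for g, p in low.items()})
print({g: round(p, 3) for g, p in high.items()})
```

With two reads both showing A, the heterozygote AG keeps roughly a one-in-three posterior probability, uncertainty that a hard genotype call would simply discard; with twenty reads it becomes negligible. This is why likelihood-based methods are attractive for low to medium depth data.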
Our analyses showed that neither of the two perspectives on the plains zebra is entirely correct. In contrast to previous genetic studies using simpler markers, we found discrete geographic structure in the genome-wide variation of plains zebras. We identified up to nine such extant groups, which we designate as populations rather than subspecies because the genetic differences between them are mostly subtle, and because they do not seem to constitute discrete morphological types. We attribute our ability to detect such subtle structure to the much-improved genetic resolution that comes from the vastly increased number of markers. To put this into perspective, we looked at up to 167,000 variable positions in the genome, whereas previous studies used between a single(!) and seven markers. Surprisingly, the currently most favored subspecies classification did not match the inferred populations very well. Some subspecies were split into several populations, which were not always closely related in the population tree. In other cases, different parts of a single population were placed in different nominal subspecies.
Using this revised population scheme, we traced the populations back to a common origin near the present-day Zambian and Botswanan populations around 340,000 years ago. We speculate that this could be due to the permanent presence of wetlands in this area (the Zambezi river and the Makgadikgadi paleolake). Plains zebras are relatively water-dependent and would have favored well-watered areas during periods of aridification, of which there were several pulses in Africa during the Pleistocene. We therefore suggest that this system of wetlands likely served as a refugium for plains zebras (and perhaps other savanna ungulates?). We were also able to track the routes by which plains zebras migrated northeastward and southwestward from this refugium.
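The argument about marker numbers can be illustrated with a small simulation. The sketch below is not one of the paper's analyses. It assumes two populations that drifted apart under a simple Balding-Nichols model with a hypothetical F_ST of 0.02, 20 diploid individuals (40 chromosomes) sampled per population, and biallelic SNPs, and it estimates F_ST with Hudson's estimator, combined across loci as a ratio of sums. Comparing replicate datasets shows how much less the estimate fluctuates as loci are added.

```python
import random
import statistics
from typing import List, Tuple


def sample_freq(rng: random.Random, p_pop: float, n_chrom: int) -> float:
    """Allele frequency observed among n_chrom sampled chromosomes."""
    return sum(rng.random() < p_pop for _ in range(n_chrom)) / n_chrom


def simulate_locus(rng: random.Random, fst: float, n_chrom: int) -> Tuple[float, float]:
    """Sample frequencies in two populations that drifted from a shared
    ancestral frequency (Balding-Nichols model); keep variable loci only."""
    while True:
        p_anc = rng.uniform(0.1, 0.9)
        a = p_anc * (1 - fst) / fst
        b = (1 - p_anc) * (1 - fst) / fst
        p1 = sample_freq(rng, rng.betavariate(a, b), n_chrom)
        p2 = sample_freq(rng, rng.betavariate(a, b), n_chrom)
        # reject loci fixed for the same allele in both samples
        if not (p1 == p2 and p1 in (0.0, 1.0)):
            return p1, p2


def hudson_fst(loci: List[Tuple[float, float]], n1: int, n2: int) -> float:
    """Hudson's F_ST estimator, combined across loci as a ratio of sums."""
    num = den = 0.0
    for p1, p2 in loci:
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den


def estimate_spread(n_loci: int, reps: int, fst: float = 0.02,
                    n_chrom: int = 40, seed: int = 1) -> Tuple[float, float]:
    """Mean and standard deviation of the F_ST estimate over replicate datasets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        loci = [simulate_locus(rng, fst, n_chrom) for _ in range(n_loci)]
        estimates.append(hudson_fst(loci, n_chrom, n_chrom))
    return statistics.mean(estimates), statistics.stdev(estimates)


for n_loci, reps in [(1, 200), (7, 200), (1000, 20)]:
    mean, sd = estimate_spread(n_loci, reps)
    print(f"{n_loci:>5} loci: mean F_ST = {mean:.3f}, sd across replicates = {sd:.3f}")
```

With a single locus the estimate can easily come out negative or far above the simulated value, while with a thousand loci it settles close to it. Differentiation this subtle is hard to tell apart from noise with only a handful of markers.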
Strikingly, we found strong evidence that the northeastward migration followed three distinct routes: two along the two branches of the Great Rift Valley through Tanzania, and a third passing south of Lake Malawi and entering southeastern Tanzania from Mozambique. We also identified several populations with reduced genetic diversity relative to the rest; two of these are not recognized as separate entities in the subspecies scheme.
Regarding the quagga, we confirmed earlier genetic conclusions that it falls within the range of plains zebra variation. We also show that it probably represents the southern endpoint of the expansion out of the Zambezian refugium.
The take-home message is that a lot of global biodiversity is probably hidden in the genomes of species and is not apparent from their morphological features, sometimes referred to as 'cryptic' variation. Furthermore, scaling up from simple genetic markers to genome-wide data can really change the way we look at species variation, and can open the door to much more detailed insights into the natural history of species.
Link to the NE&E paper by Pedersen et al.