Studying DNA methylation in non-model organisms
The rise of bisulfite sequencing has made it possible to study genome-wide DNA methylation patterns in almost any organism, but it comes with its fair share of methodological and analytical challenges. What do ecologists and evolutionary biologists need to consider when applying this approach to non-model systems?
DNA methylation—an epigenetic modification that (most often) consists of a methyl group added to a cytosine base—is an important gene regulatory mechanism with a key role in cell fate, disease states, and responses to the environment. Because of its role in mediating connections between DNA sequence and phenotype, understanding the predictors and consequences of variation in DNA methylation is of broad interest in ecology and evolutionary biology. Recent methodological advancements and the falling cost of high-throughput sequencing have made base-pair resolution DNA methylation assays possible for virtually any organism. However, these advancements come with new methodological and statistical considerations, some of which are particularly relevant to non-model organisms.
Our review attempts to summarize, synthesize, and provide potential solutions for these new challenges. We based our paper, as much as possible, on patterns observable in real data sets, including ones we generated ourselves. Indeed, many of our recommendations build directly on the lessons we’ve learned (and the mistakes we’ve made) in the course of our own work. In that sense, the “story behind the paper” begins five years ago, when we first threw our hats in the ecological epigenetics ring.
The Tung lab (including paper lead author Amanda Lea and co-author Tauras Vilgalys) is interested in the interplay between genes, ecology, and behavior. We investigate how social behavior influences gene regulation and how ecological and social factors interact to influence fitness-related traits. One of our primary study systems is a population of wild baboons in the Amboseli ecosystem of Kenya, where over 1,800 individually recognized animals have been studied for more than 46 years. With our collaborators at the Amboseli Baboon Research Project, we have shown that early life experiences have long-term predictive effects on fertility and survival [1]. In fact, females who experience a lot of early adversity live, on average, 10 years shorter lives!
The molecular mechanisms responsible for these effects largely remain a mystery. During her PhD work at Duke, Amanda hypothesized that stable epigenetic marks like DNA methylation could play a role, and started working with bisulfite sequencing data to test this idea. Our first population data set looked at differences in resource base [2] (wild-feeding baboons versus baboons who also had access to tourist food scraps).
It took us a relatively short time to generate the data, and a lot longer to learn how to quality control and analyze it. We had to figure out how to deal with batch effects unique to this data type, regional differences in DNA methylation patterns across the genome, and cell type heterogeneity (we now rush our field-collected blood samples to a lab in Nairobi for flow cytometry analysis, but we didn’t always do that). We also took a detour into statistical methods development, in collaboration with a talented biostatistician at the University of Michigan, Xiang Zhou. Together, we developed an approach for modeling bisulfite sequencing data (MACAU [3]) that also controls for kinship and population structure, a pervasive problem in natural populations. As the lab expanded its work to other systems (not only other primates, but also more far-flung organisms like spadefoot toads, in collaboration with David Pfennig and co-author Paul Durst at UNC), we were also faced with taxonomic differences in DNA methylation patterns that have direct effects on data analysis.
These are the types of issues that we try to highlight in our review. Based on conversations about our own work and papers we read at lab meetings, we realized that, while there are a number of good reviews about the biological insights that have been or could be produced in ecological/evolutionary epigenetics, there wasn’t much out there about the study design and data analysis issues that researchers are likely to encounter. We also encountered ecological epigenetics papers that used statistical approaches that, in other corners of epigenetic research (e.g., medical or human population epigenetics), are known to generate high false positive rates, or study designs that we thought were too underpowered to support their conclusions.
To provide support for our recommendations, we have drawn on a combination of literature review, simulations, and data reanalysis that we hope provides a sense of the characteristics of bisulfite sequencing data sets in ecological and evolutionary studies. We try to answer questions that we’ve asked ourselves too: what exactly are the effect sizes I might expect from my study? Should I invest in more sequencing or more samples (spoiler: usually the latter!)? Should I analyze each CpG site separately, or aim for more global or region-specific analyses? We also provide potential solutions to common problems, including a downloadable R Shiny app so users can perform their own versions of the power analyses that accompany our paper.
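To make the "more sequencing or more samples" question concrete, here is a minimal simulation sketch in Python. It is not the R Shiny app or the analysis from our paper, and it does not use MACAU. It assumes a single CpG site, two groups, beta-distributed individual methylation levels, binomial read sampling, and a crude normal-approximation test on per-sample methylation proportions. All function names, parameter names, and default values (`estimate_power`, `dispersion`, an effect of 0.1, and so on) are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_site(n_per_group, depth, base_level, effect, dispersion, rng):
    """Simulate per-sample methylation proportions at one CpG site for two groups."""
    groups = []
    for level in (base_level, base_level + effect):
        # Individual true methylation levels follow a beta distribution with
        # mean `level`; a larger `dispersion` means less between-individual
        # variation (assumed parameterization, for illustration only).
        a = level * dispersion
        b = (1.0 - level) * dispersion
        proportions = []
        for _ in range(n_per_group):
            p = rng.betavariate(a, b)
            # Each read is methylated with probability p (binomial sampling).
            methylated = sum(rng.random() < p for _ in range(depth))
            proportions.append(methylated / depth)
        groups.append(proportions)
    return groups


def approx_p_value(x, y):
    """Two-sided p-value from a normal approximation to the difference in means.

    A crude stand-in for a proper binomial or beta-binomial model; it is
    somewhat anti-conservative when the number of samples is small.
    """
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    if se == 0.0:
        return 1.0 if mean(x) == mean(y) else 0.0
    z = (mean(y) - mean(x)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def estimate_power(n_per_group, depth, effect=0.1, base_level=0.5,
                   dispersion=20.0, alpha=0.05, n_sims=500, seed=1):
    """Fraction of simulated data sets in which the group difference is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x, y = simulate_site(n_per_group, depth, base_level, effect,
                             dispersion, rng)
        if approx_p_value(x, y) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Same total number of reads per group (400), split two different ways.
    print("10 samples x 40 reads:", estimate_power(n_per_group=10, depth=40))
    print("40 samples x 10 reads:", estimate_power(n_per_group=40, depth=10))
```

Under these assumed settings, the design with more samples and shallower coverage detects the simulated difference more often, because between-individual variation cannot be reduced by sequencing each sample more deeply. The exact numbers depend entirely on the assumed effect size and dispersion, so treat this as a way to explore the trade-off rather than as a recommendation for any particular study.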
The study of DNA methylation in the context of ecology and evolution is still in its infancy, and every new field comes with its own set of difficulties that need to be overcome. We hope that this paper will provide researchers with the tools to conduct thoughtful analyses that increase our understanding of how DNA methylation helps shape the world around us.
Our paper in Nature Ecology & Evolution can be found here: http://go.nature.com/2uPWRKH
For resources related to ecological/evolutionary epigenetics and bisulfite sequencing data analysis, please see http://tung-lab.org/protocols-and-software.html. We’ve posted protocols for reduced representation bisulfite sequencing and a new high-throughput assay to functionally validate the effects of DNA methylation on gene expression [4], the R Shiny app for bisulfite sequencing power analysis, and a link to MACAU (freely available software for analyzing bisulfite sequencing data).
1. Tung, J., Archie, E.A., Altmann, J., and Alberts, S.C. 2016. Cumulative early adversity predicts longevity in wild baboons. Nature Communications 7: 11181.
2. Lea, A.J., Altmann, J., Alberts, S.C., and Tung, J. 2016. Resource base influences genome-wide DNA methylation levels in wild baboons (Papio cynocephalus). Molecular Ecology 25: 1681-1696.
3. Lea, A.J., Tung, J., and Zhou, X. 2015. A flexible, efficient binomial mixed model for identifying differential DNA methylation in bisulfite sequencing data. PLoS Genetics 11: e1005650.
4. Lea, A.J., Vockley, C.M., Johnston, R.A., Del Carpio, C.A., Barreiro, L.B., Reddy, T.E., and Tung, J. Genome-wide quantification of the effects of DNA methylation on human gene regulation. bioRxiv: dx.doi.org/10.1101/146829.