Arabidopsis thaliana has a long-standing tradition as a model organism in plant biology. The choice of this organism as a model species arises from its selfing nature, short life cycle, small stature, and compact genome. Over the years, a tremendous body of research has accumulated on the anatomy, morphology, physiology, biochemistry, and genetics of this species. In recent years, Arabidopsis has also become an organism of reference for the study of natural variation. The idea for our paper, recently published in Nature Ecology & Evolution, originated from the realization that research in Arabidopsis, our model organism of choice, is transitioning from the study of a common wild-type, usually Columbia-0 (Col-0), to the study of a myriad of ecotypes, or accessions as they are commonly known.
Arabidopsis presents an invaluable toolbox for the study of natural variation, including the fully sequenced genome, methylome, transcriptome, predicted proteome, and phenome of 1,135 naturally inbred Arabidopsis accessions1-3. These datasets have been amassed on accessions collected from known sites and with a well-studied population history2. These accessions are part of what is known as the 1001 Genomes Project. Given the selfing nature of Arabidopsis, seeds from these sequenced accessions can be obtained from stock centers, allowing researchers to grow individuals that are genetically identical to the ones initially sequenced.
Typically, we could take advantage of these resources by choosing a subset of accessions included in the 1001 Genomes Project and growing them together in the greenhouse or field. We can then characterize variation in a phenotype of interest for these accessions and identify the associated, and presumably causative, genetic variation (G×P). Many researchers have used this methodology, in what is known as a Genome-Wide Association Study (GWAS), and there are numerous examples of the successful use of this approach in Arabidopsis.
GWASs also have been used successfully for years in humans, especially to dissect the genetic basis of disease. The main difference between performing a GWAS analysis in humans vs. doing the same in Arabidopsis is obvious. Growing humans is very costly for their parents, and researchers cannot perform manipulative experiments; however, researchers can focus their efforts on sequencing the individuals that exhibit a phenotype of interest, such as a disease, to uncover its genetic basis. On the other hand, growing Arabidopsis accessions in a “common garden” experiment to record phenotypes of interest is relatively cheap, albeit labor intensive. Moreover, we can deliberately manipulate plant growth conditions to determine their effects on plant phenotypes (P × E). In a typical GWAS analysis, a subset of sequenced Arabidopsis accessions is grown to characterize phenotypes of interest. This allows assessment of how variation in each of those phenotypes associates with the intraspecific genetic variation (G×P).
Plants, as sessile organisms, cannot run away from adverse environmental conditions; thus, adaptation to the local environment is paramount. Following this, we can expect that Arabidopsis accessions collected from areas where plants are more likely to encounter, for example, high temperatures or drought will exhibit greater tolerance of these environmental stresses. But performing such a common garden experiment for all 1,135 accessions sequenced in the 1001 Genomes Project would be very challenging. In reality, experimental GWASs rarely surpass a few hundred accessions. Such sub-setting has the disadvantage that the phenotypic and genetic variation that is not present in the chosen subset is necessarily disregarded, so the choice of the accessions to include in such studies becomes critical. For instance, if we choose accessions to assess the genetic basis of drought tolerance, it is vital that we pick accessions from droughted environments as well as accessions from environments where drought does not limit plant growth, in order to accurately encompass the relevant range of variation. For this reason, information on the local environments of these sequenced individuals is essential.
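The sub-setting problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in our paper: the accession IDs, the environmental values, and the function name `stratified_subset` are all hypothetical. The idea is to split the range of one environmental variable into equal-width bins and draw a few accessions from each bin, so that both extremes of the gradient end up in the experiment.

```python
import random


def stratified_subset(env_by_accession, n_bins, per_bin, seed=0):
    """Pick up to `per_bin` accessions from each of `n_bins` equal-width
    bins of one environmental variable (hypothetical helper).

    env_by_accession: dict mapping accession ID -> environmental value.
    """
    rng = random.Random(seed)
    values = list(env_by_accession.values())
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 if every accession has the same value.
    width = (hi - lo) / n_bins or 1.0

    bins = [[] for _ in range(n_bins)]
    for accession, value in sorted(env_by_accession.items()):
        # The maximum value would land in bin `n_bins`, so clamp it.
        index = min(int((value - lo) / width), n_bins - 1)
        bins[index].append(accession)

    chosen = []
    for members in bins:
        rng.shuffle(members)            # random draw within each bin
        chosen.extend(members[:per_bin])
    return chosen


if __name__ == "__main__":
    # Made-up precipitation values for 100 made-up accessions.
    toy_env = {f"acc{i}": float(i) for i in range(100)}
    print(stratified_subset(toy_env, n_bins=4, per_bin=2))
```

A purely random draw of the same size could, by chance, miss the driest or wettest sites entirely; binning first guards against that. Sparse bins simply contribute fewer accessions.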
We accordingly started collecting data on the environmental variables relevant to our studies, based on our interests in plant abiotic stress tolerance. We soon came to appreciate the vast amount of environmental data available from satellites, climate stations, surveys, and climate models. At that point, our original experimental objectives gave us the idea for a different project, which became the basis of the present study. We envisioned a dataset and a tool that we would have used ourselves for our own experimental design. This tool would include extensive information on the local environmental conditions present at the sites of collection of the accessions within the 1001 Genomes Project. We accordingly extracted and compiled 204 geo-climatic variables describing the local environments of these accessions.
There is an ongoing effort in the Arabidopsis community to make data publicly available to all researchers. The same is true for the 1001 Genomes Project, for which a series of tools are already available to explore the genotypes and phenotypes for this set of accessions, along with tools to easily run de novo GWAS analysis and explore previous associations3-6. Now, the genome, methylome, transcriptome, predicted proteome, and phenome of these accessions can be complemented with our ‘environome.’ We accordingly implemented an online tool that we present in our Nature Ecology & Evolution paper, AraCLIM, which will assist researchers in exploring the local environment of these accessions.
Once we extracted and curated a description of the local environments, the next logical step was to characterize the genome-wide genetic variation associated with these environments (G×E). For this, we conducted a GWAS analysis for each of the environmental variables we extracted, treating it as a ‘phenotype.’ In another effort to make our results available and to assist researchers in evaluating the potential adaptive association between any given genetic variant and the environment, we created two more tools, CLIMGeno and GenoCLIM. CLIMGeno allows identification of genetic variants associated with any of our geo-climatic variables of interest. Conversely, GenoCLIM allows users to discover whether there is natural variation in any gene of interest that is associated with any of the environmental variables that we extracted to define the local environments of Arabidopsis. Finally, we developed FDRCLIM, which allows users to evaluate the statistical significance of any of the G×E associations in our study.
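To make the idea of an environmental variable acting as a ‘phenotype’ concrete, here is a deliberately simplified Python sketch. It is not the pipeline behind CLIMGeno, GenoCLIM, or FDRCLIM: the permutation test and the Benjamini-Hochberg adjustment are stand-ins chosen for illustration, the function names and toy data are hypothetical, and the sketch ignores population structure, which real GWAS analyses typically correct for. Genotypes are coded 0/1 because the accessions are inbred.

```python
import random
from statistics import mean


def mean_difference(genotypes, env):
    """Absolute difference in the environmental variable between accessions
    carrying allele 0 and allele 1 (both alleles must be present)."""
    ref = [e for g, e in zip(genotypes, env) if g == 0]
    alt = [e for g, e in zip(genotypes, env) if g == 1]
    return abs(mean(alt) - mean(ref))


def permutation_pvalue(genotypes, env, n_perm=2000, seed=0):
    """P-value for one variant: how often does shuffling the environmental
    values across accessions give a difference at least as large?"""
    rng = random.Random(seed)
    observed = mean_difference(genotypes, env)
    shuffled = list(env)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_difference(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # +1 avoids reporting p = 0


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data: 20 accessions, minimum temperature at the collection site.
    env = [float(t) for t in range(1, 11)] + [float(t) for t in range(21, 31)]
    variants = {
        "variant_A": [0] * 10 + [1] * 10,   # allele tracks the cold sites
        "variant_B": [0, 1] * 10,           # allele unrelated to temperature
    }
    pvals = [permutation_pvalue(g, env) for g in variants.values()]
    for name, q in zip(variants, benjamini_hochberg(pvals)):
        print(name, round(q, 4))
```

In this toy example the first variant separates cold from warm sites and the second does not, so only the first should come out with a small adjusted p-value.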
Following the research interest of our lab, we discovered a significant association between natural variation in the Arabidopsis G protein gamma subunit 3 (AGG3) and the minimum temperature of the coldest month. We then confirmed a causative relationship through wet bench analysis, unveiling a new function of this protein in cold tolerance.
Once we characterized the ‘environome’ (E), and explored the relationships between genotype and environment (G×E), the last logical step was to identify associations between the local environment and the pool of phenotypes (P×E) published over the years for the accessions within the 1001 Genomes Project. For this, we took advantage of an online tool already available to the scientific community: AraPheno.3 AraPheno3 includes a curated list of publicly available GWAS-based phenotypes accumulated over the years for Arabidopsis accessions. We retrieved a list of 131 phenotypes and created PhenoCLIM, a tool to interactively compare these with each of the environmental variables we provide to describe the local environment of Arabidopsis.
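A P×E comparison of this kind boils down to lining up, accession by accession, a published phenotype with an environmental variable and asking how strongly the two co-vary. The Python sketch below uses a Spearman rank correlation as one reasonable choice; this statistic is an assumption made for illustration and is not necessarily what PhenoCLIM computes. The function names, accession IDs, and values are hypothetical.

```python
from math import sqrt


def paired_values(phenotype, environment):
    """Keep only accessions present in both dicts (ID -> value)."""
    shared = sorted(set(phenotype) & set(environment))
    return [phenotype[a] for a in shared], [environment[a] for a in shared]


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up flowering times (days) and mean annual temperatures (°C).
    flowering = {"acc1": 60.0, "acc2": 45.0, "acc3": 80.0, "acc4": 52.0}
    temperature = {"acc1": 9.5, "acc2": 14.0, "acc3": 6.0, "acc5": 11.0}
    x, y = paired_values(flowering, temperature)
    print(spearman(x, y))
```

Matching on accession ID first matters because a given phenotype is usually scored in only a subset of the sequenced accessions, so the two datasets rarely cover exactly the same lines.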
This study and our Arabidopsis CLIMtools resource (http://www.personal.psu.edu/sma3/CLIMtools.html) that encompasses the online tools we created would not have been possible without the insights, resources, and tools made available by the Arabidopsis community throughout the years. We hope that our results and the tools we provide to dissect the environome and its association with phenomes and genomes will likewise assist the community in future research.
1. Kawakatsu, T. et al. Epigenomic diversity in a global collection of Arabidopsis thaliana accessions. Cell 166, 492-505 (2016).
2. Alonso-Blanco, C. et al. 1,135 Genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166, 481-491 (2016).
3. Seren, Ü. et al. AraPheno: a public database for Arabidopsis thaliana phenotypes. Nucleic Acids Res 45, D1054-D1059 (2017).
4. Joshi, H.J. et al. 1001 Proteomes: a functional proteomics portal for the analysis of Arabidopsis thaliana accessions. Bioinformatics 28, 1303-1306 (2012).
5. Grimm, D.G. et al. easyGWAS: a cloud-based platform for comparing the results of genome-wide association studies. The Plant Cell 29, 5-19 (2017).
6. Togninalli, M. et al. The AraGWAS Catalog: a curated and standardized Arabidopsis thaliana GWAS catalog. Nucleic Acids Res 46, D1150-D1156 (2017).
7. Seren, Ü. et al. GWAPP: a web application for genome-wide association mapping in Arabidopsis. The Plant Cell, tpc.112.108068 (2012).