The arrival and establishment of true innovation is among the chief questions in evolutionary biology. For many years, I have been interested in understanding how new genes appear for the first time, or de novo, from noncoding parts of the genome. I joined the lab of Diethard Tautz at the Max Planck Institute for Evolutionary Biology to begin my path as an evolutionary biologist and to tackle questions of this kind.
The last decade has been fruitful for our understanding of how truly new genes appear. There have been several systematic searches for de novo genes across the tree of life, drawing on ever-growing catalogs of genomic data. In parallel, various de novo genes have been validated experimentally, beyond computational predictions, as essential parts of an organism.
During my early months in the Tautz lab, while still a Master’s student, I contemplated an experiment that could support de novo evolution as a general process, and so I came up with a thought experiment. I would insert random sequences into living cells, together with enough regulatory machinery to make sure they would be transcribed and translated by the host. Then I would wait until one of them had mutated enough to “acquire a function”. It occurred to me that starting with a sufficiently large pool of random sequences would reduce the waiting time, because some would already exhibit biochemical activity as soon as they were introduced.
This is similar to the problem of a monkey typing at random and expecting it to produce a meaningful work of art. Given infinite time, I would expect a few occurrences of “The Library of Babel” by Borges. Given infinite monkeys, I could expect a few of them to produce this text almost immediately.
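The intuition that a larger pool shortens the wait can be made concrete with the standard “at least one success” argument. The sketch below is only an illustration under a loud assumption: each random sequence is taken to have, independently, the same small probability p of showing some activity, so the chance that at least one of n sequences is active is 1 - (1 - p)^n. The function names and the value of p are hypothetical, not quantities from the study.

```python
import math


def prob_at_least_one_active(p: float, n: int) -> float:
    """Chance that at least one of n random sequences shows some activity.

    Assumes (hypothetically) that each sequence is active independently
    with the same probability p. Computes 1 - (1 - p)**n via log1p/expm1,
    which stays accurate when p is tiny.
    """
    return -math.expm1(n * math.log1p(-p))


def pool_size_for(p: float, target: float) -> int:
    """Smallest pool size n for which the chance above reaches `target`."""
    # Solve (1 - p)**n <= 1 - target for n; both logarithms are negative.
    return math.ceil(math.log1p(-target) / math.log1p(-p))


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-sequence probability of activity
    for n in (1_000, 1_000_000, 10_000_000):
        print(f"pool of {n:>10,} sequences: {prob_at_least_one_active(p, n):.4f}")
    print("pool needed for a 95% chance:", pool_size_for(p, 0.95))
```

With this made-up p of one in a million, a pool of roughly three million sequences already gives about a 95% chance of at least one hit. That is the “many monkeys” side of the analogy: odds that would take a single lineage a long time to beat are covered at once by a large library.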
I later learned that Jack Szostak and Anthony Keefe had done similar experiments some years earlier. They searched for a specific biochemical function, namely ATP binding, starting from a very large pool of sequences. In my “thought experiment”, I was looking for any function. To follow the monkey analogy, I was not looking for my favorite story; I was looking for any coherent story.
When Diethard started to write the concept for an ERC grant on de novo gene evolution, we discussed at length how such an experiment could work. Our initial ideas centered on using a virus system and screening for resistance to viral infection to find de novo gene functions. The grant was awarded, and around that time Cristina Amador came to Plön to help us carry out these experiments. Cristina had extensive experience with microbial genetics, and we decided to try the much simpler approach that is now in the paper. In a very short time she was able to put together the design of our synthetic de novo genes. Unfortunately, Cristina moved on to another position before we could start our first experiments, and the project was postponed for some time while our other genomic endeavours received priority.
A couple of years later, shortly before finishing my first postdoc, I decided to revisit the project, motivated by our recent insights into how transcription promotes de novo gene birth. Over the course of a summer, together with Ellen McConnell and Burcin Yildirim, we produced a large number of replicates and repetitions of the simplest experiment: populations of transformed Escherichia coli cells, each cell carrying a different random sequence, were tracked over the course of a week (or a day). We observed how the presence of a given sequence affected the growth of cells: cells with “beneficial” sequences increased in frequency in the population, while cells with “deleterious” sequences decreased in frequency.
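Calling a sequence “beneficial” or “deleterious” comes down to comparing its frequency in the population at the start and at the end of the experiment. The sketch below assumes, hypothetically, that the abundance of each random sequence is available as read counts (for example from sequencing the pool) at two time points; it computes the log2 change in each sequence’s frequency and labels it with an arbitrary two-fold threshold. The function names, the pseudocount, the threshold and the “neutral” band are illustrative choices, not the statistical analysis used in the paper.

```python
import math
from typing import Dict, Mapping


def log2_frequency_change(counts_start: Mapping[str, int],
                          counts_end: Mapping[str, int],
                          pseudocount: float = 0.5) -> Dict[str, float]:
    """Return log2(frequency at end / frequency at start) for each sequence ID."""
    ids = set(counts_start) | set(counts_end)
    # A small pseudocount keeps sequences that drop to zero reads from producing log(0).
    total_start = sum(counts_start.get(i, 0) + pseudocount for i in ids)
    total_end = sum(counts_end.get(i, 0) + pseudocount for i in ids)
    changes = {}
    for i in ids:
        f_start = (counts_start.get(i, 0) + pseudocount) / total_start
        f_end = (counts_end.get(i, 0) + pseudocount) / total_end
        changes[i] = math.log2(f_end / f_start)
    return changes


def classify(lfc: float, threshold: float = 1.0) -> str:
    """Label a log2 frequency change; threshold=1.0 means a two-fold change."""
    if lfc >= threshold:
        return "beneficial"
    if lfc <= -threshold:
        return "deleterious"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical read counts for three random sequences in one population.
    start = {"a": 100, "b": 100, "c": 100}
    end = {"a": 400, "b": 100, "c": 25}
    for seq_id, lfc in sorted(log2_frequency_change(start, end).items()):
        print(seq_id, round(lfc, 2), classify(lfc))
```

In this made-up example, sequence b keeps the same read count but still falls in frequency, because sequence a outgrew it in the same population. It is the change in frequency, not the absolute count, that reflects a sequence’s effect on growth relative to the rest of the pool.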
I can clearly remember the moment the first results came in. Not only had we found beneficial activities over and over, but the sheer number detected was beyond our imagination. We had the naïve idea that the products would follow the distribution of effects observed for point mutations (mostly deleterious or neutral). In hindsight, we were overlooking that a peptide or nucleotide sequence of this size is bound to have some effect in the cell. We had expected the sequence space to be largely devoid of functions, but it seems to be quite the opposite.
We now report that most of the sequence space we explored appears to have biochemical activities that affect the growth of E. coli cells. From this, we argue that de novo gene birth follows once these activities are exposed to the cellular environment, turning fragments of expressed sequence that have such activities into functional genes. This approach offers an experimental way to study the process of de novo gene birth as it unfolds.
Further, we have developed a method to produce and screen bioactive molecules that we hope can have an impact on biomedical and pharmaceutical research in the not-too-distant future.
The paper in Nature Ecology & Evolution is here: http://go.nature.com/2q0UBh2
Poster art by http://lessart36.deviantart.com