Exploring random sequence space in the name of de novo genes

Random sequences expressed in bacteria seem to have biological activity that affects the bacteria's growth, supporting the idea that de novo genes can have an immediate impact on an organism.

Rafik Neme
Apr 24, 2017

The arrival and establishment of true innovation is among the chief questions in evolutionary biology. For many years, I have been interested in understanding how new genes appear for the first time, or de novo, from noncoding parts of the genome. I joined the lab of Diethard Tautz at the Max Planck Institute for Evolutionary Biology to begin my path as an evolutionary biologist and tackle this type of question.

The last decade has been fruitful regarding our understanding of how truly new genes appear. There have been several systematic searches for de novo genes across the tree of life via the ever-growing catalogs of genomic data. In parallel, various de novo genes have been validated experimentally, beyond computational predictions, as essential parts of an organism.

During my early months in the Tautz lab, while still a Master’s student, I contemplated the possibility of doing an experiment that could support de novo evolution as a general process, and so I came up with a thought experiment. I would insert random sequences in living cells, together with enough regulatory machinery to make sure they would be transcribed and translated by the host. Then, I would wait until any of those would mutate enough to “acquire a function”. It occurred to me that starting with a sufficiently large pool of random sequences would reduce the waiting time, because some would exhibit some biochemical activity upon their introduction.

This is similar to the problem of having a monkey typewriting at random, and expecting it to produce a meaningful work of art. If I had infinite time I would expect to have a few occurrences of “The Library of Babel” by Borges. If I had infinite monkeys, I could expect a few of them to produce this text almost immediately.
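The pool-size intuition can be made concrete with a small calculation. The sketch below is only an illustration: it assumes that each random sequence independently has the same small probability of showing some activity, and the value `p_active` is hypothetical, not a measured quantity. Under that assumption, the chance that at least one sequence in a pool of size N is active is 1 - (1 - p)^N, which climbs quickly as the pool grows.

```python
import math


def prob_at_least_one_active(pool_size: int, p_active: float) -> float:
    """Chance that at least one of `pool_size` independent sequences is active.

    Assumes every sequence has the same, independent probability `p_active`
    of showing some activity. This is a simplifying assumption made for
    illustration only.
    """
    if pool_size < 0:
        raise ValueError("pool_size must be non-negative")
    if not 0.0 <= p_active <= 1.0:
        raise ValueError("p_active must lie between 0 and 1")
    if p_active == 1.0:
        # log1p(-1) is undefined, so handle the certain case directly.
        return 1.0 if pool_size > 0 else 0.0
    # 1 - (1 - p)^N, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(pool_size * math.log1p(-p_active))


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-sequence probability, not a measured value
    for n in (1, 1_000, 1_000_000, 10_000_000):
        print(f"pool of {n:>10,} sequences: {prob_at_least_one_active(n, p):.6f}")
```

With one monkey the wait is long; with many monkeys typing in parallel, the same per-monkey odds make an early success far more likely.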

I learned that a few years ago, Jack Szostak and Anthony Keefe did similar experiments. They searched for a specific biochemical function, namely ATP binding, starting from a rather large number of sequences. In my “thought experiment”, I was looking for any function. Following the analogy of the monkeys, I was not looking for my favorite story, I was just looking for any coherent story.

When Diethard started to write the concept for an ERC grant on de novo gene evolution, we discussed extensively how such an experiment could work. Our initial ideas centered around using a virus system and screening for resistance to virus infection to find de novo gene functions. The grant was successfully secured, and at that time Cristina Amador came to Plön to help us with the execution of these experiments. Cristina had extensive experience with microbial genetics, and we decided to try the much simpler approach that is now in the paper. In a very short time she was able to put together the design for our synthetic de novo genes. Unfortunately, Cristina moved on to another position before we could start our first experiments, and the project was postponed for some time, while our other genomic endeavours received priority.

A couple of years down the line I decided to revisit the project, just before finishing my first postdoc, motivated by our recent insights about how transcription works to promote de novo gene birth. Over the course of a summer, together with Ellen McConnell and Burcin Yildirim, we were able to produce a large number of replicates and repetitions of the simplest experiment: transformed populations of Escherichia coli cells, each containing a different random sequence, were traced over the course of a week (or a day). We observed how the presence of a given sequence would impact the growth of cells, whereby cells with “beneficial” sequences would increase in frequency in the population, while cells with “deleterious” sequences would decrease in frequency.
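The idea of reading effects from frequency changes can be sketched in a few lines. This is a minimal illustration and not the analysis pipeline used in the paper: the function names, the pseudocount, and the threshold of a twofold change are all hypothetical choices. Given read counts per sequence at the start and end of an experiment, it computes the log2 change in relative frequency and labels each sequence accordingly.

```python
import math


def log2_frequency_change(start_counts, end_counts, pseudocount=0.5):
    """Log2 change in relative frequency for each sequence identifier.

    `start_counts` and `end_counts` map sequence IDs to read counts.
    A pseudocount (a hypothetical choice) avoids division by zero for
    sequences that are missing at one of the two time points.
    """
    ids = set(start_counts) | set(end_counts)
    start_total = sum(start_counts.get(i, 0) + pseudocount for i in ids)
    end_total = sum(end_counts.get(i, 0) + pseudocount for i in ids)
    changes = {}
    for i in ids:
        f_start = (start_counts.get(i, 0) + pseudocount) / start_total
        f_end = (end_counts.get(i, 0) + pseudocount) / end_total
        changes[i] = math.log2(f_end / f_start)
    return changes


def classify(change, threshold=1.0):
    """Label a log2 change; the twofold threshold is an arbitrary example."""
    if change >= threshold:
        return "increased"
    if change <= -threshold:
        return "decreased"
    return "unchanged"


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    start = {"seq_a": 100, "seq_b": 100, "seq_c": 100}
    end = {"seq_a": 400, "seq_b": 100, "seq_c": 25}
    for seq_id, change in sorted(log2_frequency_change(start, end).items()):
        print(seq_id, round(change, 2), classify(change))
```

Because frequencies are relative, a sequence can appear to decline simply because others are rising, which is one reason a real analysis needs replicates and a proper statistical model rather than a fixed cutoff.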

I can remember clearly the moment when the first results came in. Not only had we found beneficial activities over and over, but the sheer number detected was beyond our imagination. We had always had the naïve idea that the products would follow a distribution of effects like the one observed for point mutations (mostly deleterious or neutral). In hindsight, we were overlooking that a peptide, or nucleotide, of this size is bound to cause some effect in the cell. We had expected that the sequence space would be largely devoid of functions, but it seems to be quite the opposite.

We now report that most of the sequence space we explored seems to have biochemical activities that affect the growth of E. coli cells. We extend this to argue that de novo gene birth follows after these activities are exposed to the cellular environment, turning fragments of expressed sequences with activities into functional genes. This approach is an experimental way of understanding the process of de novo gene birth as it unfolds.

Further, we have developed a method to produce and screen bioactive molecules that we hope can impact biomedical and pharmaceutical research in the not-too-distant future.

The paper in Nature Ecology & Evolution is here: http://go.nature.com/2q0UBh2

Poster art by http://lessart36.deviantart.com


Rafik Neme

Postdoctoral Researcher, Columbia University Medical Center


Richard Buggs 2 months ago

Hi Rafik, this is very cool! Congratulations on the paper. Do you have results for how the overall diversity of your E. coli populations was affected by the different treatments? I see that you started with about a million different random variants. Do you know how many of these were present at the end of cycle number 1 for the IPTG-induced and non-induced treatments? How did this overall level of diversity change with the different cycles? This seems a fascinating aspect of the experiment. I am guessing that the non-induced populations maintained higher levels of diversity. Is that what happened? Richard

Richard Buggs about 2 months ago

Hi Rafik, I have answered my own question, thanks to your Dryad data (thanks for making it so accessible, BTW). I don't seem to be able to place a figure here in a comment, so I have tweeted a figure with my plot here https://twitter.com/RJABuggs/status/870697142070816768
My interpretation of this is that the vast majority of the random sequences were deleterious when expressed. The induced replicates have lower diversity upon the first round of sequencing, suggesting that many bacteria have died before the first cycle of the experiment, and then during the experiment, the diversity of the induced replicates continues to fall. Do you think that is the right interpretation? So most of the bioactivity is harmful bioactivity.