Can we define a specific gene repertoire for each animal lineage?

The Animal Kingdom is huge and diverse. I wanted to find out how genes can explain this diversity. Was it through the emergence of novel genes? The loss of old ones? Or a mixture of both perhaps?

At the start of this investigative mission to digitally mine the set of blueprints that goes with each animal phylum, many people would ask me “So what is your thesis title?”, a question I could answer only with total uncertainty.

Together with Jordi Paps and Peter W H Holland, I set out to answer this question by comparing all the genes in a selection of 102 eukaryote genomes. Frustration was a key player throughout our research to determine which genes define each animal lineage. By the time the gene similarity and comparison analyses were completed (more than 3 months), new animal genomes had been published. However, we could not start again each time new data became available, or we would never have been able to publish the exciting results we do have. Bring on the Darwin Tree of Life project!

One of the most important criteria in selecting the 102 whole eukaryote genomes was that the outgroup be large and varied, which reduces the incorrect assignment of genes as new or lost. The selection included 43 non-animal eukaryotes, covering as many lineages as possible, to support the comparison of 59 animal genomes spanning all the animal phyla available when we started the analysis back in 2016. Yes, more major animal lineages have become available since but, up to this point, this has been one of the most diverse samplings of whole genomes used in a comparative analysis.
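To illustrate why the size of the outgroup matters, here is a toy sketch in Python. It is not the pipeline used in the paper: the function name, the gene family labels and the species lists are all hypothetical, and real analyses work from sequence similarity searches and a phylogeny, not a hand-written table. It only shows how a gene family can be wrongly called "novel" when the outgroup is too small.

```python
def classify_families(presence, ingroup, outgroup):
    """Label each gene family by where it is found.

    presence: dict mapping a family name to the set of species that
              carry at least one member of that family (toy data).
    ingroup / outgroup: iterables of species names (hypothetical).
    """
    ingroup, outgroup = set(ingroup), set(outgroup)
    labels = {}
    for family, species in presence.items():
        in_hits = species & ingroup
        out_hits = species & outgroup
        if in_hits and out_hits:
            labels[family] = "shared"   # older than the ingroup
        elif in_hits:
            labels[family] = "novel"    # candidate gain in the ingroup
        elif out_hits:
            labels[family] = "lost"     # candidate loss in the ingroup
        else:
            labels[family] = "absent"   # not seen in any sampled species
    return labels


# Toy presence/absence table; every name here is made up for illustration.
presence = {
    "famA": {"human", "fly", "yeast"},
    "famB": {"human", "fly"},
    "famC": {"yeast", "plant", "amoeba"},
    "famD": {"human", "amoeba"},
}
animals = ["human", "fly"]

small = classify_families(presence, animals, ["yeast"])
large = classify_families(presence, animals, ["yeast", "plant", "amoeba"])

print(small["famD"])  # labelled "novel" only because amoeba was not sampled
print(large["famD"])  # labelled "shared" once the outgroup is broader
```

With only one outgroup species, the hypothetical family famD looks like an animal novelty; adding more non-animal lineages reveals that it predates animals.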

Gene gains and losses in each animal lineage
Triangles represent the numbers of broad gene losses (orange) and gains (green). Animal silhouettes from

The results revealed genome-level gene losses and gains on a scale not previously seen. Two major lineages of animals, one containing most model organisms (ecdysozoans) and the other containing ourselves (deuterostomes), have lost unexpectedly large numbers of genes. Some phyla, such as flatworms, nematodes and those adorable tardigrades, showed an almost balanced genomic turnover, with huge losses of ancestral genomic material and equally large acquisitions of novel genomic material. We mention just a minuscule portion of the results in our paper; the rest we leave to readers to interpret, if they fancy braving the dive into the supplementary data we have provided.

Many hypotheses could be drawn from the results of this research. One is that in fast-evolving phyla such as nematodes and flatworms, what we see is not a true loss of genomic content but rapid divergence: proteins from an ancestral set of genes have changed beyond detectable similarity and so appear as a novel set of genes, simply an upgrade. Another is that gene number is not at all linked to genome complexity, as shown by comparing the lineages containing humans with those containing largely parasitic species. Finally, every single animal lineage is made up of a unique, equally complex gene repertoire, shaped by combinations of evolutionary events including the loss, gain and recycling of genomic content.

Can we define a specific gene repertoire for each animal lineage? Yes, I reckon so, as soon as we have sequenced them all. I have submitted my thesis, with a title, I might add, but this investigation is far from over.

Cristina Guijarro-Clarke

Comparative Genomics Developer, EMBL-EBI