Archive for the ‘genomics’ Category

Undergrad theses!

This term I have three students working on their undergrad theses, plus one working on directed studies. I am very proud of these students. They have shown lots of initiative: reading articles, getting their hands on the computer (all but one had never worked under Unix before!), and now having lots of success running their commands and looking at results!

What are they doing? Two of them are working with protein domains in transporter proteins (from the TCDB), one on sorting prokaryotic genomes into taxonomically-coherent groups, and one more on the divergence of orthologs and paralogs.


What’s true for E. coli is true of an elephant

The quote by Jacques Monod in the title celebrates our recent publication of an article suggesting that our previous results in Escherichia coli hold true for most other prokaryotes:

  • del Grande, M., & Moreno-Hagelsieb, G. (2014). The loose evolutionary relationships between transcription factors and other gene products across prokaryotes. BMC Research Notes, 7, 928. doi:10.1186/1756-0500-7-928

This article expands on the part about transcription factors presented in our previous study comparing the conservation of different kinds of functional associations:

  • Moreno-Hagelsieb, G., & Jokic, P. (2012). The evolutionary dynamics of functional modules and the extraordinary plasticity of regulons: the Escherichia coli perspective. Nucleic Acids Research, 40(15), 7104–7112. doi:10.1093/nar/gks443

The earlier article dealt with several experimentally-confirmed functional interactions determined in Escherichia coli: genes in operons, genes whose products physically interact, genes regulated by the same transcription factor (regulons), and genes coding for transcription factors and their regulated genes. In that study we found that the associations involving transcription factors tend to be much less conserved than any of the other associations studied. Our work is not the first to suggest this lack of conservation, but is the first to compare conservation across different kinds of associations, and thus show that those mediated by transcriptional regulation are the least conserved.

The most recent article expands on the association between genes coding for transcription factors and other genes, the idea being to extend the study to as many other prokaryotes as possible. But how could we determine conservation between genes coding for transcription factors and other genes without experimentally-determined interactions? We knew that at least some transcription factors could be predicted from their possessing a DNA binding domain. But what about their associations? Our prior experience has been that target genes are hard to predict even when there’s information on some characterized binding sites (sites that we like calling operators for tradition’s sake). So what to do if we have only the transcription factors? Well, to answer that we should first explain how we measured relative evolutionary conservation.

To measure evolutionary conservation we used a measure of co-occurrence called mutual information. For any two genes, the higher the mutual information, the less the observed co-occurrence looks random. Since we obtained mutual information scores for all gene pairs in the genomes we analyzed, we decided that instead of something as hard as predicting operators, and matching them to predicted transcription factors, we could use top-scoring gene pairs as representatives of the most conserved interactions between our predicted transcription factors and anything else. This allowed us to compare the most conserved interactions involving transcription factors against the conservation of other interactions. Our findings suggest that interactions involving transcription factors evolve quickly in most, if not all, of the genomes analyzed.
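For readers who like to see things in code, here is a minimal sketch of mutual information between two presence/absence profiles. It is only an illustration, not the code used for the articles: the function name and the 0/1 list encoding (one entry per genome) are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two presence/absence profiles.

    Each profile is a sequence of 0/1 flags, one per genome, in the same
    genome order. This encoding is an assumption made for illustration.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    n = len(profile_a)
    if n == 0:
        raise ValueError("profiles must not be empty")

    joint = Counter(zip(profile_a, profile_b))  # counts of (a, b) states
    count_a = Counter(profile_a)                # marginal counts for gene A
    count_b = Counter(profile_b)                # marginal counts for gene B

    mi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Only observed states are summed, so p_ab is never zero here.
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical profiles over four genomes.
gene_x = [1, 1, 0, 0]
gene_y = [1, 1, 0, 0]  # always found together with gene_x
gene_z = [1, 0, 1, 0]  # presence unrelated to gene_x

print(mutual_information(gene_x, gene_y))
print(mutual_information(gene_x, gene_z))
```

One thing to keep in mind: mutual information is also high for two genes that systematically avoid each other, so a high score says the pattern is non-random, not necessarily that the genes travel together.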

Please read the articles for more details and information.


Non-redundant prokaryotic genomes

We just had an Applications Note accepted in Bioinformatics. The little note presents a tool we developed to choose sets of non-redundant prokaryotic genomes (see Research-Genome Clusters too).

The tool derives from previous work where we selected sets of non-redundant prokaryotic genomes, filtered at different levels of similarity, for tasks ranging from displaying results on operon predictions to finding the level of redundancy filtering that maximizes the number of high-quality associations predicted by phylogenetic profiles. Other groups have been using our non-redundant sets. Thus, we thought it better to share with the wider community, and we developed this tool. If you have suggestions for improvements, please let us know. We cannot promise to implement all suggestions, but we will try to make the tool very useful. Also note that the R-scripts used to produce these datasets are provided (as is). These might help you develop your own datasets if you so require.


Evolutionary conservation

We have a new article in Nucleic Acids Research:

  • Moreno-Hagelsieb, G., & Jokic, P. (2012). The evolutionary dynamics of functional modules and the extraordinary plasticity of regulons: the Escherichia coli perspective. Nucleic Acids Research, 40(15), 7104–7112. doi:10.1093/nar/gks443

The article twists the normal use of phylogenetic profiles, which is that of predicting functional interactions. The idea for phylogenetic profiles is that if we observe that two genes co-occur their products might work together. What does this mean? Well, to co-occur means to both appear in the same genome, and to both be absent whenever either one is absent. A most excellent idea. A most difficult one to use for actual predictions. OK then, hard to use for predictions? Why? Not sure, but, for starters, we can see that genes that work together in one organism do not co-occur that much across organisms. So I thought, maybe functional interactions are not well conserved. Maybe partners in functional crime are exchanged with ease. How would we know? Well, maybe if we look at the phylogenetic profiles of collections of genes whose products functionally interact we could see something of a rate of exchange. Maybe the rates would be difficult to estimate, so what about comparing against the whole background of co-occurrence? What about finding some “gold standards”? … and that was like a eureka moment. What about comparing different kinds of interactions in terms of their conservation? So, I tried a few, and lo and behold, interactions via co-regulation (regulons) looked worse than a “gold negative,” namely transcription unit boundaries (adjacent genes in the same strand, but different transcription units).

So there you go. The most surprising result was the low levels of conservation for interactions mediated via regulons. The best part was that the most conserved interactions were those among genes found in the same transcription unit (in operons). Why best? Because a lot of my research has been about using operon predictions for predicting networks of functional interactions. Since these interactions are the most conserved, we might expect them to be the most useful to infer functional interactions. Right? Well, maybe. Still lots of research needed. I hope you enjoy the article.
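If you want to play with the idea of phylogenetic profiles, the sketch below builds them from toy data and compares a set of "known" pairs against the whole background of pairs. Everything here is hypothetical: the genome and gene names are made up, and the simple agreement fraction is a stand-in chosen for readability, not the score used in the article.

```python
from itertools import combinations


def build_profiles(genomes):
    """Map each gene family to a tuple of 0/1 flags, one per genome.

    `genomes` maps a genome name to the set of gene families it contains.
    Genomes are ordered by name so all profiles share the same order.
    """
    names = sorted(genomes)
    families = set().union(*genomes.values())
    return {fam: tuple(int(fam in genomes[g]) for g in names)
            for fam in families}


def agreement(profile_a, profile_b):
    """Fraction of genomes where both genes are present or both are absent."""
    same = sum(x == y for x, y in zip(profile_a, profile_b))
    return same / len(profile_a)


def mean_agreement(profiles, pairs):
    """Average agreement over an iterable of (gene, gene) pairs."""
    scores = [agreement(profiles[a], profiles[b]) for a, b in pairs]
    return sum(scores) / len(scores)


# Toy data: four hypothetical genomes and three hypothetical gene families.
genomes = {
    "genome1": {"geneA", "geneB", "geneC"},
    "genome2": {"geneA", "geneB"},
    "genome3": {"geneC"},
    "genome4": set(),
}
profiles = build_profiles(genomes)

# A hypothetical class of interacting pairs versus the background of all pairs.
known_pairs = [("geneA", "geneB")]
background = list(combinations(sorted(profiles), 2))

print(mean_agreement(profiles, known_pairs))
print(mean_agreement(profiles, background))
```

With real data you would swap in one list of pairs per kind of interaction (operons, regulons, and so on) and see where each one sits relative to the background.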


Computational Genomics and Metagenomics


Network of functional interactions for the arginine repressor

Welcome to the web page of the lab of Computational Genomics and Metagenomics, a.k.a. the lab of Computational Microbiology, the lab of Computational Microbiomics, and the lab of Computational Con-Sequences.

We are interested in all things genomic, metagenomic, postgenomic, postmetagenomic, and hyperultramegasupragenomic (!). Our work centres around the evolution and the inference both of function and of functional interactions of gene products, mostly in Prokaryotes.

Everything in this lab is done with computers. Yet, besides working with other computational biologists, we also have collaborations with wet labs.

You might be wondering how this kind of research got started. Well, it all began with the idea that we should stop finding the genes in the human genome, one by one, by laborious and intense work linking phenotypes (what we see) to finding the very gene, or genes (what we don’t see), responsible for such phenotypes. Not that such work is not valuable, au contraire, without that work providing us with real life examples we would not be in any position for making sense of genome sequences. It was probably some kind of a case of impatience [and boldness]. Of course, there is also the tiny detail that knowing our complete genetic complement (a useful definition of “genome”) would provide us with a wider and more accessible basis for the better and faster finding of genes behind phenotypes. Now, substitute the word “phenotype” with whatever disease that might involve genetics (or badly gone genetics), such as “cancer,” or “diabetes,” and you might get a better feeling for the importance of this task.

As you might guess, quite well, this ambitious project set the whole machinery in motion. Long story short (but I might try and let you know better later), the technological advancements brought about by the idea of having our beautiful 23+1/2 pairs of chromosomes sequenced allowed scientists to sequence microbes. With those genomes available, before we even had a first draft of our own, other technologies arose, technologies focused on making sense of newly found genes in those genomes. A couple of these are transcriptomics, which started with microarrays, used to find out which genes are expressed by finding their messenger RNAs; and proteomics, used to figure out which proteins are being produced. That, my friends, started the “postgenomic era.” I gave it away, didn’t I? You have thus guessed that “postgenomics,” in the second paragraph, refers to the products of these new technologies, and you are absolutely right.

Well, not content with the human genome (the draft was announced in 2000, yes, ten years ago!), and with powerful sequencing technologies available, another new field arose. The field of environmental genomics, or metagenomics (the word “metagenomics” was used to mean the fishing of genes with particular functions from the environment, before it was used in this context, but let’s not go there). This thing is about sequencing fragments of DNA isolated from an environment (I was going to write “from a given environment,” but I resisted), and then guessing things about the microbes, or whatever, in such an environment. Others refer to “metagenomics” as sequencing without culturing, but let’s not go there either. Well, since now scientists are sequencing mRNA, rather than DNA, we could say that the postmetagenomic era has dawned. Though I haven’t seen this word used in a paper yet. In any event, such a humongous amount of data necessarily calls for computational analyses, and here we are.

Well, hard to guess where all this is going, but the sequencing technologies keep improving and getting cheaper. We cannot but expect the word “deluge” in all of the published papers of the genomic era to become ridiculous by comparison. This means lots of challenges to make sense of the information. Lots of new avenues of research too. This is why I am reserving the word “hyperultramegasupragenomic” (and its “post-” derivative) for later use. The way things are going, it might not be that much later.

With that, welcome again, and enjoy your visit.

