Posts Tagged ‘Academic publishing’

Gene content homoplasy

We just published an article, in collaboration with the laboratory of Gabriela Olmedo-Alvarez, attempting to resolve the phylogeny of Bacillus, which started with a special focus on Bacillus isolated from aquatic environments. Getting those phylogenies required solving several small problems along the way: choosing the appropriate genes/proteins for the analyses, thinking about distance measures, etc., all of which I’ll be delighted to write about in later posts. For now, I’ll concentrate on the main finding.

In a previous article, also in collaboration with Gabriela, we had presented several phylogenies, all showing that aquatic Bacillus clustered together into a single clade. Given the continuous growth of the public genome databases (see the previous two posts, for example), and given that Gabriela’s lab keeps sequencing aquatic and other interesting Bacillus, we wondered whether the aquatic group would withstand this avalanche of new data. So there we were, choosing Bacillus genomes and checking the data about their places of isolation. As in the previous work, we built a phylogenetic tree based on 16S rRNA genes; a second one based on the proteins encoded by genes present in all of the genomes under analysis (a core tree); a third tree based on the marker genes for phylogenomics published by Jonathan Eisen’s group; and a hierarchical clustering based on the [dis]similarity of the proteins shared between each pair of genomes, a measure we call the Genomic Similarity Score (GSS).

The phylogenetic trees showed a clade of aquatic Bacillus, but several other aquatic Bacillus landed in other clades, breaking the previously found pattern. However, the GSS analysis placed the aquatic Bacillus closer together than any of the phylogenetic trees did. This surprised us because, in the previous work, the GSS clustering had agreed with the phylogenetic trees. We therefore started looking for an explanation for this discrepancy.

The phylogenetic trees are restricted to genes or proteins shared by all the genomes under analysis, while the GSS has no such limitation, because it uses the similarity of all the proteins encoded by the genes shared by each pair of genomes. Thus, we thought that organisms from similar environments might share more genes than would be expected from their different vertical origins. After all, it is not rare for Bacteria to receive genes via horizontal gene transfer (HGT).
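To make that contrast concrete, here is a minimal Python sketch using made-up genomes, each reduced to a set of hypothetical protein-family labels. It only counts families: a core tree is limited to the families present in every genome, while a pairwise comparison can use everything each pair shares. The real GSS scores the similarity of the shared proteins, which this sketch does not attempt; all names and data here are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical genomes, each reduced to the set of protein families it encodes
genomes = {
    "aquatic_A": {"core1", "core2", "aq1", "aq2", "x1"},
    "aquatic_B": {"core1", "core2", "aq1", "aq2"},
    "soil_A": {"core1", "core2", "s1", "x1"},
}


def core_families(genomes):
    """Families present in every genome: all that a core-gene tree can use."""
    return set.intersection(*genomes.values())


def pairwise_shared(genomes):
    """Families shared by each pair of genomes: what a pairwise score can use."""
    return {(a, b): genomes[a] & genomes[b] for a, b in combinations(genomes, 2)}


core = core_families(genomes)
print("core:", sorted(core))
for pair, shared in pairwise_shared(genomes).items():
    # Families this pair shares beyond the core (e.g., candidates for HGT)
    print(pair, len(shared), "shared,", len(shared - core), "beyond the core")
```

In this toy example the two aquatic genomes share two families outside the core, information that a core-only analysis never sees.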

To test this possibility, we analyzed gene content, as reflected by the classification of the encoded proteins into protein families, and compared that content across organisms. We produced clusters based on gene content and, again, aquatic Bacillus clustered together better than in the phylogenetic trees. Further analyses showed some genes prevailing in the groups from each environment. Most of these “environmentally-related” genes were found in strains isolated from soil, but every group had some interesting genes for future studies. Among them were genes that previous works had described as related to the very environments where we found them enriched.
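A gene-content clustering of this kind can be sketched as follows. This is a minimal illustration, not the method used in the article: it assumes each genome is a set of protein-family identifiers, uses Jaccard distance (1 minus shared families over all families in the pair) as a stand-in gene-content distance, and builds an average-linkage (UPGMA-style) tree topology without branch lengths. Genome names and families are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0 for identical gene content, 1 for none shared."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(profiles):
    """Average-linkage clustering of genomes by gene-content distance.

    Returns the tree topology as nested tuples whose leaves are genome names.
    """
    # Pairwise distances between genomes, computed once
    dist = {}
    for x, y in combinations(profiles, 2):
        dist[(x, y)] = dist[(y, x)] = jaccard_distance(profiles[x], profiles[y])

    def average_linkage(m1, m2):
        # Mean distance over all genome pairs across the two clusters
        return sum(dist[(i, j)] for i in m1 for j in m2) / (len(m1) * len(m2))

    # Each cluster is a (subtree, member-genome list) pair; start with singletons
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_linkage(clusters[ij[0]][1],
                                                  clusters[ij[1]][1]))
        (t1, m1), (t2, m2) = clusters[i], clusters[j]
        # Remove the merged pair (j > i, so delete j first) and add their union
        del clusters[j]
        del clusters[i]
        clusters.append(((t1, t2), m1 + m2))
    return clusters[0][0]


# Hypothetical gene content for two aquatic and two soil genomes
profiles = {
    "aquatic_A": {"PF1", "PF2", "PF3", "PF7"},
    "aquatic_B": {"PF1", "PF2", "PF3", "PF8"},
    "soil_A": {"PF1", "PF4", "PF5", "PF6"},
    "soil_B": {"PF1", "PF4", "PF5", "PF9"},
}
print(upgma(profiles))
```

For real data, a library routine such as SciPy's hierarchical clustering would be the more practical choice; the hand-rolled loop just keeps the sketch dependency-free.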

We call this apparent tendency to share more genes than expected from vertical inheritance (perhaps due to environmental constraints) gene content homoplasy.

Main reference:

  1. Hernández-González IL, Moreno-Hagelsieb G, Olmedo-Álvarez G (2018) Environmentally-driven gene content convergence and the Bacillus phylogeny. BMC Evol Biol 18: 148.

What’s true for E. coli is true of an elephant

The quote by Jacques Monod in the title celebrates the recent publication of our article suggesting that our previous results in Escherichia coli hold true for most other prokaryotes:

  • del Grande, M., & Moreno-Hagelsieb, G. (2014). The loose evolutionary relationships between transcription factors and other gene products across prokaryotes. BMC Research Notes, 7, 928. doi:10.1186/1756-0500-7-928

This article expands on the part about transcription factors presented in our previous study, which compared the conservation of different kinds of functional associations:

  • Moreno-Hagelsieb, G., & Jokic, P. (2012). The evolutionary dynamics of functional modules and the extraordinary plasticity of regulons: the Escherichia coli perspective. Nucleic Acids Research, 40(15), 7104–7112. doi:10.1093/nar/gks443

The earlier article dealt with several kinds of experimentally confirmed functional interactions in Escherichia coli: genes in operons, genes whose products physically interact, genes regulated by the same transcription factor (regulons), and genes coding for transcription factors together with the genes they regulate. In that study we found that associations involving transcription factors tend to be much less conserved than any of the other associations studied. Ours is not the first work to suggest this lack of conservation, but it is the first to compare conservation across different kinds of associations, and thus to show that those mediated by transcriptional regulation are the least conserved.

The most recent article expanded on the associations between genes coding for transcription factors and other genes, the idea being to extend the study to as many other prokaryotes as possible. But how could we determine the conservation of associations between transcription factors and other genes without experimentally determined interactions? We knew that at least some transcription factors could be predicted from their having a DNA-binding domain. But what about their associations? Our prior experience has been that target genes are hard to predict even when there’s information on some characterized binding sites (sites that we like calling operators, for tradition’s sake). So what to do if we have only the transcription factors? Well, to answer that, we should first explain how we measured relative evolutionary conservation.

To measure evolutionary conservation, we used a measure of co-occurrence called mutual information: for any two genes, the higher the mutual information, the less random their co-occurrence across genomes looks. Since we obtained mutual information scores for all gene pairs in the genomes we analyzed, we decided that, instead of attempting something as hard as predicting operators and matching them to predicted transcription factors, we could use the top-scoring gene pairs as representatives of the most conserved interactions between our predicted transcription factors and any other genes. This allowed us to compare the most conserved interactions involving transcription factors against the conservation of other interactions. Our findings suggest that interactions involving transcription factors evolve quickly in most, if not all, of the genomes analyzed.
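The mutual-information idea can be illustrated with a short Python sketch. It assumes each gene is represented by a binary presence/absence profile over the same list of genomes and computes mutual information in bits from the empirical frequencies; `best_partner` then picks the single top-scoring partner for a given gene. The profiles and gene names are hypothetical, and details of the article's actual pipeline (how orthologs were defined, any corrections applied to the scores) are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(p, q):
    """Mutual information (bits) between two presence/absence profiles.

    p and q are equal-length sequences of 0/1, one entry per genome.
    """
    n = len(p)
    joint = Counter(zip(p, q))  # counts of (p_i, q_i) combinations
    px, py = Counter(p), Counter(q)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def best_partner(gene, profiles):
    """Return (partner, score) for the highest-scoring partner of `gene`."""
    return max(((other, mutual_information(profiles[gene], prof))
                for other, prof in profiles.items() if other != gene),
               key=lambda item: item[1])


# Hypothetical profiles over eight genomes (1 = gene present)
profiles = {
    "tfX": [1, 1, 1, 1, 0, 0, 0, 0],    # a predicted transcription factor
    "geneA": [1, 1, 1, 1, 0, 0, 0, 0],  # always co-occurs with tfX
    "geneB": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of tfX
    "geneC": [1, 1, 1, 0, 0, 0, 0, 1],  # partial co-occurrence
}
print(best_partner("tfX", profiles))  # ('geneA', 1.0)
```

Note that mutual information is also high when two profiles are complementary (one gene present exactly where the other is absent), so a high score flags a non-random pattern rather than co-presence specifically.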

Please read the articles for more details and information.

-Gabo

Peer reviewing and atavisms

Summary: let’s make manuscripts for review reviewer-friendly instead of atavist-editor-friendly.

There are many things we carry on because of … let us call it “tradition” to avoid calling it by its proper name: “atavism.”

Today I am finishing the review of a manuscript, and I feel irritated, again, that the article has the figures at the end, by themselves, so that I have to jump back and forth between a page with all the figure legends and the figures, which the journal’s software had the good idea to tag with numbers, but still, no legends. Shit. I publish articles myself, and I have decided to put the legends at the bottom of the figures, because my experience as a reviewer has taught me how much easier reviewing would be if this were the norm. A few journals have a field for the legend when you upload figures, but few authors seem to notice. So what about it, my friends: why not make this very clear to authors?

Why do we carry on with this atavism from much older times, when figures were sent by snail mail for lack of anything better, pages had to be put physically together, and a whole process of postproduction (I don’t know why the speller is suggesting “prostitution” instead of this word) was carried out? Who knows why the figures had to come separated from the legends, but whatever, it was so. Today, we send the figures and manuscript electronically, first for review, and we are asked later to send “production” figures anyway. So why not save our peer reviewers some pain and give them something easier to examine? Shit, do it even if the journal does not ask you to. They will ask for “proper” figures later anyway (if and when your article gets accepted, that is). So, double and triple please, put those legends with the figures. Let us stop this atavistic custom and be merrier.

Most sincerely,

Gabo, the angry reviewer
