Archive for the ‘science’ Category

The whole 2015 Spring/Summer group

Here is the whole group in the lab of Computational conSequences during the Spring/Summer of 2015. I’d say that this is the best group ever.

Gustavo leaves today, going back to Michoacán, Mexico after spending his sabbatical here. Julie left a few weeks ago, also back to Michoacán. She might come back for the Spring/Summer 2016.

The only locals are Brigitte, Kissa, Thomas, César and me. Brigitte and Kissa are honorary members who have been in the lab for collaborative reasons, but they work towards their M.Sc. degrees with other faculty members at Laurier (Michael Suits and Geoff Horsman, respectively).

We’ve been working on phages, plant growth-promoting bacteria, 16S rRNA gene analyses, metabolic annotations, gene neighborhoods, predicting gene functions, and predicting metabolic and transcriptional regulation networks. Lots of fun.

Summer 2015 group

This is [most of] the group in the lab of Computational conSequences this summer. Several visitors from Mexico! Julie, Gustavo, and Ramiro from Michoacán, and Adrián from Mexico City.

What are we doing?

In no particular order:

  • Julie is working with 16S rRNA genes
  • Gustavo is on sabbatical doing all kinds of reviews and such on plant growth-promoting bacteria
  • Ramiro is working on the genome of a plant growth-promoting bacterium
  • Adrián is working on phages
  • Kissa is working with adjacent genes (gene neighborhoods)
  • Harold is working on genome annotations
  • Thomas is working on predicted functions (metabolism and such)
  • César is working on regulatory networks in prokaryotes, and on metagenome annotations

Democratic genomics

We had two articles recently published:

  1. G. Moreno-Hagelsieb, B. Hudy-Yuffa, Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT. BMC Res Notes 7, 651 (2014).
  2. N. Ward, G. Moreno-Hagelsieb, Quickly Finding Orthologs as Reciprocal Best Hits with BLAT, LAST, and UBLAST: How Much Do We Miss? PLoS ONE 9, e101850 (2014).

The story goes as follows. At a talk by some group I heard that they were using UBLAST to quickly find members of some protein families rather than using a Hidden Markov Model approach. They said it was much faster, so I became curious. I downloaded USEARCH 5 back then to test it on the things I commonly do with NCBI’s BLAST. I was surprised at how fast this program ran. In any event, I thought that testing this program for some task would be a good project for an undergrad student. That became Natalie’s undergrad thesis. Back then it was about using different options under USEARCH to try and get as much coverage as with NCBI’s BLAST (UBLAST was not an option in USEARCH 5; rather, a local alignment search had to be done). We became more ambitious and decided to test a few more programs. BLAT was something I was already playing with, while an article by Jonathan Eisen (Darling et al., PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2, e243. 2014) pointed me in LAST’s direction (besides reviewers asking for more programs to be tested).

Later on, at some other talk (I think it was a talk by Robert Beiko), the speaker mentioned something about BLAST being too slow for some task, and I asked him why not try UBLAST. He said something to the effect of not knowing how much they might miss.

The articles we published cover one task each. One is the task of finding orthologs as reciprocal best hits. Pretty straightforward: how many orthologs are found by each program when compared to BLAST? Essentially, finding orthologs as reciprocal best hits does not require finding every possible match. Top matches would be enough. So, if UBLAST, for example, found just a few top matches (under version 5, we could control the number of matches found before the program stops looking), that would be enough to determine the best one, and thus figure out reciprocal best hits. We thought we might miss many matches, but still find most of the reciprocal best hits, and that’s what we found to be the case, except between evolutionarily distant genomes (see second reference above).
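To make the reciprocal best hit idea concrete, here is a minimal sketch in Python. It is not the code used in the articles. It assumes that each comparison has already been run and reduced to (query, subject, bit score) tuples, and the function names and toy gene identifiers are made up for illustration.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples.
    On a tie, the first hit seen is kept (an arbitrary choice for this sketch).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data: genome A proteins searched against genome B, and the reverse.
a_vs_b = [
    ("a1", "b1", 200.0),
    ("a1", "b2", 90.0),
    ("a2", "b2", 150.0),
    ("a3", "b1", 80.0),
]
b_vs_a = [
    ("b1", "a1", 198.0),
    ("b2", "a2", 149.0),
    ("b2", "a1", 88.0),
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Only the top-scoring match per query matters here, so a program that skips the weaker matches would still produce the same pairs, as long as it does not miss the top ones.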

For the test on overannotation, the main idea was that for that task we compare proportions, not total number of matches. Thus, if UBLAST, LAST, and BLAT missed potential homologs, but still found equivalent proportions to those found by NCBI’s BLAST, then the programs would work fine for estimating overannotation. Well, that’s what we found.

Finally, why democratic genomics? Well, with tools that can run sequence comparisons in a fraction of the time that BLAST takes, and on a desktop computer at that, comparative genomics at a much larger scale becomes available to most if not all bioinformaticians. Why would I care? Well, because the more people can participate, the higher the number of ideas that can make it into the field. Not everybody has access to computer clusters. There are other avenues towards this democracy, like the availability of some precomputed homologies and orthologies. Yet people will want to do their own tests for many reasons, from doubting the quality of existing data to testing genomes and protein sequences not already available in databases. Maybe there’s also a good chance that genome and protein comparisons will be done via cloud computing, and be quite accessible to mere mortals. Maybe web-based tools like RAST and MG-RAST are good enough for these tasks instead of having our own thing. I don’t know. For now I think that the more options the better. These two articles are not enough. Strategies should also be developed to avoid wasting time and effort comparing sequences. As we develop our ideas and test programs, we will publish our results either in articles or, if there is not enough for a publication proper, in blog entries.

Have fun!


Nobel 2013

I reacted too late. Maybe because I was at a conference.

In any event, the Nobel prizes! Take a look:

The laureates in chemistry! I used to read several articles by Martin Karplus and by Michael Levitt when I was a grad student.

Have fun!


Nobel Prizes 2012

The Nobel Prizes are being announced this week (Oct. 8th to Oct. 15th).

Nobel Prize


Peer reviewing and atavisms

Summary: let’s make manuscripts for review reviewer-friendly instead of atavist-editor-friendly.

There are many things we carry on because of … let us call it “tradition” to avoid calling it by its proper name: “atavism.”

Today I am finishing reviewing a manuscript, and I feel irritated again that the article has the figures last, by themselves, and that I have to jump back and forth between one page with all the figure legends and the figures, which the journal’s software had the good idea to tag with numbers, but still, no legends. Shit. I publish articles myself, and I have decided to put the legends at the bottom of the figures because my experience as a reviewer has told me how much easier it would be if this were the norm. A few journals, when you upload figures, have a field for the legend, but few authors seem to notice. And you, mes amis et amies, have you made this very clear to authors? Why do we carry on with this atavism from much older times, when figures were sent by snail mail for lack of anything better, pages had to be put together physically, and a whole process of postproduction (I don’t know why the speller is suggesting “prostitution” instead of this word) was carried out? Who knows why the figures had to come separated from the legends, but whatever, it was so. Today we send the figures and manuscript electronically for review first, and we are asked later to send “production” figures anyway. So why not spare our peer reviewers some pain and give them something easier to examine? Shit, do it even if the journal does not ask you to. They will ask for “proper” figures later anyway (if and when your article gets accepted, that is). So double and triple please, put those legends with the figures. Let us stop this atavist custom and be merrier.

Most sincerely,

Gabo, the angry reviewer
