Democratic genomics

We had two articles recently published:

  1. G. Moreno-Hagelsieb, B. Hudy-Yuffa, Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT. BMC Res Notes 7, 651 (2014).
  2. N. Ward, G. Moreno-Hagelsieb, Quickly Finding Orthologs as Reciprocal Best Hits with BLAT, LAST, and UBLAST: How Much Do We Miss? PLoS ONE 9, e101850 (2014).

The story goes as follows. At a talk, I heard that a group was using UBLAST, rather than a Hidden Markov Model approach, to quickly find members of some protein families. They said it was much faster, so I became curious. I downloaded USEARCH (version 5 back then) to test it on the things I commonly do with NCBI’s BLAST, and I was surprised at how fast the program ran. I thought that testing this program for some task would be a good project for an undergrad student, and that became Natalie’s undergrad thesis. Back then, the work was about trying different options in USEARCH to get as much coverage as with NCBI’s BLAST (UBLAST was not an option in USEARCH 5; a local alignment search had to be done instead). Later we became more ambitious and decided to test a few more programs. BLAT was something I was already playing with, while an article by Jonathan Eisen (Darling et al., PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2, e243. 2014) pointed me in LAST’s direction (besides reviewers asking for more programs to be tested).

At some other talk, I think one by Robert Beiko, the speaker mentioned that BLAST was too slow for some task, and I asked him why not try UBLAST. He said something to the effect of not knowing how much they might miss.

The articles cover one task each. One is the task of finding orthologs as reciprocal best hits. The question is pretty straightforward: how many orthologs does each program find compared to BLAST? Essentially, finding orthologs as reciprocal best hits does not require finding every possible match. Top matches are enough. So if UBLAST, for example, found just a few top matches (under version 5, we could control the number of matches found before the program stops looking), that would be enough to determine the best hit, and thus to figure out reciprocal best hits. We thought we might miss many matches but still find most of the reciprocal best hits, and that’s what we found to be the case, except between evolutionarily distant genomes (see the second reference above).
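To make the point about top matches concrete, here is a minimal Python sketch of reciprocal best hits. It is an illustration, not the code used in the articles: the function names, the tuple layout (query, subject, score), and the toy identifiers and scores are all assumptions made up for this example.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples
    (a hypothetical layout; real program output would need parsing first).
    Ties are resolved in favour of the first hit seen.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data: a "complete" search reporting every match.
all_a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 90.0),
              ("a2", "b2", 150.0), ("a3", "b1", 80.0)]
all_b_vs_a = [("b1", "a1", 198.0), ("b1", "a3", 79.0),
              ("b2", "a2", 149.0), ("b2", "a1", 91.0)]

# The same searches, but keeping only the top match per query.
top_a_vs_b = [("a1", "b1", 200.0), ("a2", "b2", 150.0), ("a3", "b1", 80.0)]
top_b_vs_a = [("b1", "a1", 198.0), ("b2", "a2", 149.0)]

rbh_all = reciprocal_best_hits(all_a_vs_b, all_b_vs_a)
rbh_top = reciprocal_best_hits(top_a_vs_b, top_b_vs_a)
```

In this toy case both searches give the same pairs, even though the second one reports fewer matches. The catch, of course, is that a faster program has to actually find the true top match, which is where evolutionarily distant genomes can cause trouble.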

For the test on overannotation, the main idea was that for that task we compare proportions, not total numbers of matches. Thus, if UBLAST, LAST, and BLAT missed potential homologs but still found proportions equivalent to those found by NCBI’s BLAST, then the programs would work fine for estimating overannotation. Well, that’s what we found.
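As a rough illustration of why proportions can survive missed matches, here is a small Python sketch. It assumes, purely for the example, that the proportion of interest is the fraction of annotated proteins with at least one match. The article's actual measure may be defined differently, and the function name, hit layout, and toy data are all hypothetical.

```python
def proportion_with_hits(protein_ids, hits):
    """Fraction of proteins in `protein_ids` that have at least one match.

    `hits` is an iterable of (query, subject) pairs (an assumed layout).
    """
    proteins = set(protein_ids)
    if not proteins:
        raise ValueError("no proteins given")
    matched = {query for query, _subject in hits if query in proteins}
    return len(matched) / len(proteins)


# Toy genome with four annotated proteins; p4 never matches anything.
genome = ["p1", "p2", "p3", "p4"]

# A hypothetical sensitive search that reports six matches in total...
sensitive_hits = [("p1", "x1"), ("p1", "x2"), ("p1", "x3"),
                  ("p2", "x4"), ("p2", "x5"), ("p3", "x6")]
# ...and a hypothetical faster search that reports only three of them.
fast_hits = [("p1", "x1"), ("p2", "x4"), ("p3", "x6")]

sensitive_proportion = proportion_with_hits(genome, sensitive_hits)
fast_proportion = proportion_with_hits(genome, fast_hits)
```

Here the faster search finds half as many matches, yet both proportions come out the same, because each protein only needs one match to be counted.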

Finally, why democratic genomics? Well, with tools that can run sequence comparisons in a fraction of the time that BLAST takes, and on a desktop computer at that, comparative genomics at a much larger scale becomes available to most, if not all, bioinformaticians. Why would I care? Because the more people who can participate, the higher the number of ideas that can make it into the field. Not everybody has access to computer clusters. There are other avenues towards this democracy, like the availability of some precomputed homologies and orthologies. Yet people will want to do their own tests for many reasons, from doubting the quality of existing data to testing genomes and protein sequences not already available in databases. Maybe there’s also a good chance that genome and protein comparisons will be done via cloud computing, and be quite accessible to mere mortals. Maybe web-based tools like RAST and MG-RAST are good enough for these tasks instead of having our own thing. I don’t know. For now I think that the more options the better. These two articles are not enough. Strategies should also be developed to avoid wasting time and effort comparing sequences. As we develop our ideas and test programs, we will publish our results either in articles or, if there is not enough for a proper publication, in blog entries.

Have fun!

-Gabo

The Latinamerican bioinformatics force

The Latin-American conSequences force

Since Julie was leaving on Saturday, those present in the lab last Thursday had lunch together.

Julie is a PhD student co-supervised by me and Dr. Santoyo. She came from Mexico for a few months to learn some bioinformatics that she will apply to her PhD project on the rhizospheric microbiome associated with a few crops.

See ya later Julie!

Three-minute thesis 2014

Today, Scott Dobson-Mitchell was the runner-up at the three-minute thesis competition (3MT) at Laurier.

Marc is done with the M.Sc.!

Marc presented his thesis defense last Wednesday (Oct 30). All is well. Some corrections to make, but that’s that. Anyway, the photo shows the undergrad force of the lab of Computational conSequences (Brigitte, Erum, and Thomas), plus Marc. It was taken that very day.

Congrats Marc!

Nobel 2013

I reacted too late. Maybe because I was at a conference:

http://smb-bacterias.org/images/POSTER_tabloide.pdf

In any event, the Nobel prizes! Take a look:

http://www.nobelprize.org/

The laureates in chemistry! I used to read several articles by Martin Karplus and by Michael Levitt when I was a grad student.

Have fun!

–Gabo

CSM 2013

Several members of The Lab of Computational conSequences went to the Canadian Society of Microbiologists conference in Ottawa last week: Lisa, Jenny, Marc, Scott, and honorary members Mike Lynch and Laura (Lisa’s sister). All of them presented posters, Jenny gave her first talk at a scientific conference, and Mike gave a talk that I missed on exploring “the rare biosphere” (your homework is to figure out what that means).

The posters were successful. Marc, who is working on the evolution of regulation of transcription by, ahem, transcription factors, had lots of visitors. The twins (Lisa and Laura) presented work on the gene cluster for cellulose biosynthesis in Bacteria, Jenny talked about 16S rRNA genes, and Scott presented a bit about phage and horizontal gene transfer.

We shall talk about these projects some time soon. We are preparing several articles and will post something about them as they are finished and submitted.

Have fun!

Visitors this summer

We have two visitors this summer to the lab of Computational conSequences:

  1. Karla Valenzuela, originally Chilean, working for her master’s at Dalhousie in Halifax (NS, Canada). Karla is doing some analyses I always was curious to do: evolutionary trace analyses, plus a few other, related, thingies. This means going back to my structural biology roots.
  2. Ismael Hernández, originally from Mexico, working for his PhD in CINVESTAV-Irapuato. Ismael is analyzing several strains of Bacillus isolated from Cuatro Ciénegas, Coahuila, Mexico.

We are talking a lot in Spanish, which is inspiring the Canadian students in the lab to keep learning the language. Of course, we have had to explain differences between Chilean Spanish and Mexican Spanish, and it’s been fun.
