Orthologs

Supporting materials for the 2014 PLoS ONE article:

Ward N, Moreno-Hagelsieb G (2014) Quickly Finding Orthologs as Reciprocal Best Hits with BLAT, LAST, and UBLAST: How Much Do We Miss? PLoS ONE 9, e101850.

Tables with the homologs found by each program tested, and the corresponding RBHs, can be found by following this link.

The results use the same tabular format produced by the “-outfmt 6” option of BLAST+ or the “-m 8” option of legacy BLAST. The RBH tables add a final column indicating whether each putative ortholog is an RBH or a fusion. Tables with counts can be found here.

Homologs:

  1. BLAST+ (blastp)
  2. BLAT (blat)
  3. UBLAST (usearch7)
  4. LAST (lastal)

Reciprocal Best Hits (orthologs):

  1. BLAST+ (blastp)
  2. BLAT (blat)
  3. UBLAST (usearch7)
  4. LAST (lastal)
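The reciprocal-best-hit idea behind these tables can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the exact procedure used to build the tables: each input is a list of 12-column tabular lines (the “-outfmt 6” layout), the best hit for a query is the one with the highest bit score (column 12), ties go to the first hit seen, and fusion detection is left out entirely.

```python
"""Sketch: reciprocal best hits (RBHs) from two tabular hit lists.

Assumptions (not from the original page): inputs are 12-column,
tab-separated "-outfmt 6" lines; the best hit is the highest bit score;
ties keep the first hit seen; fusions are not detected.
"""
from typing import Dict, Iterable, List, Tuple


def best_hits(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query id to (subject id, bit score) of its top-scoring hit."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        # Keep a new hit only if it strictly beats the current best.
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best


def reciprocal_best_hits(a_vs_b: Iterable[str],
                         b_vs_a: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (a_id, b_id) pairs in which each sequence is the other's best hit."""
    forward = best_hits(a_vs_b)   # genome A proteins searched against genome B
    reverse = best_hits(b_vs_a)   # genome B proteins searched against genome A
    pairs: List[Tuple[str, str]] = []
    for a_id, (b_id, _) in forward.items():
        back = reverse.get(b_id)
        if back is not None and back[0] == a_id:
            pairs.append((a_id, b_id))
    return pairs
```

The same function works with the output of any of the four programs, as long as it has first been written in the 12-column tabular layout.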

Supporting material for the 2008 article:

Moreno-Hagelsieb G, Latimer K (2008) Choosing BLAST options for better detection of orthologs as reciprocal best hits. Bioinformatics 24: 319-324

NOTE: NCBI has released a completely reprogrammed BLAST suite called BLAST+. We found that the options we tested with legacy BLAST in our 2008 article work similarly with BLAST+. If you wish to test these options further with BLAST+, consult the table below:

Equivalent masking and alignment options in legacy BLASTP and BLAST+ blastp (some settings have no explicit BLAST+ flag; for example, the default BLAST alignment is obtained simply by omitting -use_sw_tback)

Definition                       Legacy options    BLAST+ options
Hard masking & BLAST alignment   -F T -s F         -seg yes -soft_masking false
Hard masking & S-W alignment     -F T -s T         -seg yes -soft_masking false -use_sw_tback
Soft masking & BLAST alignment   -F "m S" -s F     -seg yes -soft_masking true
Soft masking & S-W alignment     -F "m S" -s T     -seg yes -soft_masking true -use_sw_tback
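To try the four combinations in the table with BLAST+, it can help to generate the blastp argument lists programmatically. The sketch below is only an illustration: the masking and alignment flags come from the table, while -query, -db, -outfmt and -out are standard blastp options we added, and the file and database names are hypothetical placeholders.

```python
"""Sketch: build a BLAST+ blastp argument list for one row of the table.

The -seg, -soft_masking and -use_sw_tback flags come from the table;
-query, -db, -outfmt 6 and -out are added for illustration. File and
database names passed in are hypothetical placeholders.
"""
from typing import List


def blastp_args(query: str, db: str, out: str,
                soft_masking: bool, sw_traceback: bool) -> List[str]:
    """Return the argv list for one masking/alignment combination."""
    args = [
        "blastp",
        "-query", query,
        "-db", db,
        "-outfmt", "6",          # tabular output, same layout as legacy -m 8
        "-out", out,
        "-seg", "yes",           # SEG filtering of low-complexity segments
        "-soft_masking", "true" if soft_masking else "false",
    ]
    if sw_traceback:
        args.append("-use_sw_tback")  # final Smith-Waterman alignment
    return args


if __name__ == "__main__":
    # Soft masking & S-W alignment, the combination used for the 2008 tables.
    print(" ".join(blastp_args("ecoli.faa", "other_genome",
                               "ecoli_vs_other.tsv", True, True)))
```

The resulting list can be passed directly to `subprocess.run(args, check=True)`; keeping it as a list avoids shell-quoting problems.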

The tables at this link contain orthologs detected as reciprocal best hits (RBHs) using soft masking of low-information segments and a final Smith-Waterman alignment, as described in Moreno-Hagelsieb G, Latimer K (2008) Bioinformatics 24(3): 319-324.
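The difference between hard and soft masking is easy to miss, so here is a simplified, illustrative model, not BLAST's actual code. With hard masking, low-complexity residues are replaced by X for the whole search. With soft masking, as described in the BLAST+ help for -soft_masking, the mask is applied only when finding initial word matches, and the extension and alignment stage sees the original residues. We assume the masked intervals are already known (0-based, end-exclusive); in practice a filter such as SEG computes them.

```python
"""Sketch: what the seeding and extension stages "see" under hard vs soft masking.

Simplified illustration: masked intervals are given (0-based, end-exclusive)
rather than computed by SEG, and masking is modeled as replacing residues with X.
"""
from typing import List, Tuple

Interval = Tuple[int, int]


def apply_x(seq: str, masks: List[Interval]) -> str:
    """Replace residues inside each masked interval with 'X'."""
    chars = list(seq)
    for start, end in masks:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "X"
    return "".join(chars)


def stage_views(seq: str, masks: List[Interval], soft: bool) -> Tuple[str, str]:
    """Return (sequence used for seeding, sequence used for extension)."""
    masked = apply_x(seq, masks)
    if soft:
        # Soft masking: the mask only blocks initial word hits.
        return masked, seq
    # Hard masking: the masked residues are unavailable to every stage.
    return masked, masked
```

Under soft masking, an alignment seeded outside a low-complexity region can still extend through it, which is why soft masking tends to lose less of the alignment than hard masking.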

The tables contain orthologs detected between Escherichia coli K12 MG1655 and other genomes in the RefSeq genome database.

The columns in the tables are:

  1. the query sequence id
  2. the subject sequence id
  3. percent identity
  4. alignment length
  5. mismatches
  6. gap openings
  7. query alignment start
  8. query alignment end
  9. subject alignment start
  10. subject alignment end
  11. e-value
  12. bit score
  13. whether the pair is a Reciprocal Best Hit (RBH) or a fusion (LEFT|RIGHT)
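To work with these tables in a script, one option is to load each line into a record with named fields and tally column 13. The sketch below assumes tab-separated lines, as in BLAST's tabular output. The field names are our own, and because we are not certain how the fusion labels are spelled in the files, the code simply counts whatever string appears in column 13.

```python
"""Sketch: parse the 13-column ortholog tables and count RBHs vs fusions.

Assumptions: lines are tab-separated; columns 1-12 follow BLAST's tabular
(-outfmt 6 / -m 8) layout; column 13 is kept as a raw string.
Field names are hypothetical, chosen for readability.
"""
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class OrthologHit:
    query_id: str
    subject_id: str
    pct_identity: float
    align_length: int
    mismatches: int
    gap_openings: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float
    label: str  # "RBH", or a fusion label (LEFT|RIGHT)


def parse_hits(lines: Iterable[str]) -> Iterator[OrthologHit]:
    """Yield one OrthologHit per data line, skipping blanks and comments."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield OrthologHit(
            f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]), f[12],
        )


def count_labels(lines: Iterable[str]) -> Counter:
    """Count how many rows carry each column-13 label."""
    return Counter(hit.label for hit in parse_hits(lines))
```

For a downloaded table, `count_labels(open(path))` gives the number of RBHs and of each fusion label in one pass.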

November 7th, 2014