Phylogenetic Profiles

The materials from our work on phylogenetic profiles and on gold standards valid in most, if not all, prokaryotes are available here. This material was used in Moreno-Hagelsieb G and Janga SC (2007) Operons and the effect of genome redundancy in deciphering functional relationships using phylogenetic profiles. Proteins: Structure, Function, and Bioinformatics.

Non-redundant Genome Datasets

The files here consist of PERL modules that give lists of genome datasets filtered at different Genomic Similarity Score (GSS) thresholds. These files correspond to those used in the study.

These PERL modules also list eukaryotic genomes, but the study included only prokaryotic genomes.

The files named NRDOMAINS contain the lists of non-redundant genomes. The files named REPRESENTS record which genome represents another genome in the non-redundant genome dataset. For instance, the line:

"E_coli_O157H7_EDL933" => "E_coli_K12" means that the E_coli_O157H7_EDL933 genome is represented by E_coli_K12; that is, E_coli_O157H7_EDL933 is redundant.

The numbers of the form: 0_XX are GSS thresholds. For instance, NRDOMAINS_0_70 means a non-redundant genome dataset obtained with a GSS threshold of 0.70.
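
As a rough illustration of the two conventions above, the following Python sketch reads the GSS threshold out of a dataset name and looks up the representative of a genome. It is only a sketch under assumptions: the function names (gss_threshold, parse_represents, representative_of) are hypothetical, the name REPRESENTS_0_70 is assumed to follow the same pattern as NRDOMAINS_0_70, and the modules are treated as plain text containing "key" => "value" entries, which may not match their exact layout.

```python
import re

# One Perl hash entry, e.g. "E_coli_O157H7_EDL933" => "E_coli_K12"
ENTRY = re.compile(r'"([^"]+)"\s*=>\s*"([^"]+)"')

# Dataset names such as NRDOMAINS_0_70 (REPRESENTS_0_70 is an assumed analogue)
NAME = re.compile(r'^(NRDOMAINS|REPRESENTS)_0_(\d+)$')


def gss_threshold(name):
    """Split a dataset name into its kind and its GSS threshold."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized dataset name: {name!r}")
    # The digits after "0_" are the decimal part of the threshold.
    return match.group(1), float("0." + match.group(2))


def parse_represents(text):
    """Map each redundant genome to the genome that represents it."""
    return {redundant: rep for redundant, rep in ENTRY.findall(text)}


def representative_of(genome, represents):
    """Return the representative of a genome, or the genome itself if unlisted."""
    return represents.get(genome, genome)


# The single entry quoted in the text above, used as sample input.
sample = '"E_coli_O157H7_EDL933" => "E_coli_K12",'
represents = parse_represents(sample)

print(gss_threshold("NRDOMAINS_0_70"))
print(representative_of("E_coli_O157H7_EDL933", represents))
```

A genome that does not appear as a key is returned unchanged, on the assumption that unlisted genomes represent themselves in the non-redundant dataset.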

The REDUNDANCY table contains the list of non-redundant genomes obtained at a GSS threshold of 0.70.

PERL Modules

SCORES-GNMS: This file contains the GSS used to build the non-redundant genome datasets.

DOMAINS: The complete set of genomes available at the time of the study.

50 Download Download
60 Download Download
70 Download Download
80 Download Download
90 Download Download

Operon Predictions

These files contain operon predictions at a confidence value of 0.90 for all the genomes available at the time we started the work. The predictions were obtained with phylogenetic profiles generated using a non-redundant genome dataset filtered at a genomic similarity score of 0.70.

The format:

  • 1st column: Gene pair (GI/GI), where GI is the GenBank Identifier as obtained from the RefSeq genome database
  • 2nd column: Strand (forward or reverse)
  • 3rd column: Intergenic distance
  • 4th column: Mutual information
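
The four columns above could be read with something like the following Python sketch. Several details are assumptions rather than facts about the files: the columns are taken to be tab-separated, the two identifiers in a pair are taken to be joined by "/", the intergenic distance is taken to be an integer, and the sample line uses made-up placeholder values. PairPrediction and parse_line are hypothetical names.

```python
from typing import NamedTuple


class PairPrediction(NamedTuple):
    gi_a: str                  # first gene of the pair (GI)
    gi_b: str                  # second gene of the pair (GI)
    strand: str                # kept as written in the file
    distance: int              # intergenic distance
    mutual_information: float  # mutual information of the two profiles


def parse_line(line, sep="\t"):
    """Parse one prediction line into a PairPrediction (assumed layout)."""
    pair, strand, distance, mutual_information = line.rstrip("\n").split(sep)
    gi_a, gi_b = pair.split("/")
    return PairPrediction(gi_a, gi_b, strand, int(distance), float(mutual_information))


# Placeholder values only, not taken from the real prediction files.
row = parse_line("111/222\tforward\t12\t0.5\n")
print(row.gi_a, row.gi_b, row.strand, row.distance, row.mutual_information)
```

If the real files use a different separator, pass it as sep. The strand field is left as a string because the exact encoding of forward and reverse is not specified here.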