Phylogenetic Profiles
The materials from our work on phylogenetic profiles, along with gold standards valid in most, if not all, prokaryotes, are available here. This material was used in Moreno-Hagelsieb G and Janga SC (2007) Operons and the effect of genome redundancy in deciphering functional relationships using phylogenetic profiles. Proteins: Structure, Function, and Bioinformatics.
Non-redundant Genome Datasets
The files here consist of PERL modules that give lists of genome datasets filtered at different Genomic Similarity Score (GSS) thresholds. These files correspond to those used in the study.
These PERL modules also list eukaryotic genomes, but the study included only prokaryotic genomes.
The files named NRDOMAINS contain the lists of non-redundant genomes. The files named REPRESENTS contain the information about which genome is representing another genome in the non-redundant genome dataset. For instance, the line:
"E_coli_O157H7_EDL933" => "E_coli_K12"
means that the E_coli_O157H7_EDL933 genome is represented by E_coli_K12; E_coli_O157H7_EDL933 is redundant.
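The mapping described above can be read outside PERL as well. The following is a minimal Python sketch, assuming that each entry in a REPRESENTS file has the double-quoted `"redundant" => "representative"` form of the example line; the function names are hypothetical and the actual file layout has not been checked against the downloads.

```python
import re

# Matches one entry such as: "E_coli_O157H7_EDL933" => "E_coli_K12"
# Assumption: both genome names are double-quoted, as in the example line.
ENTRY = re.compile(r'"([^"]+)"\s*=>\s*"([^"]+)"')


def parse_represents(text):
    """Return {redundant_genome: representative_genome} from REPRESENTS-style text."""
    return {redundant: representative for redundant, representative in ENTRY.findall(text)}


def representative_of(genome, represents):
    """Return the genome standing in for `genome`, or `genome` itself if it is not redundant."""
    return represents.get(genome, genome)


sample = '"E_coli_O157H7_EDL933" => "E_coli_K12",'
represents = parse_represents(sample)
print(representative_of("E_coli_O157H7_EDL933", represents))  # E_coli_K12
print(representative_of("E_coli_K12", represents))            # E_coli_K12
```

A genome that never appears on the left-hand side is treated as its own representative, which matches the reading that only redundant genomes are listed as keys.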
The numbers of the form 0_XX are GSS thresholds. For instance, NRDOMAINS_0_70 means a non-redundant genome dataset obtained with a GSS threshold of 0.70.
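The naming convention can be decoded mechanically. This short Python sketch assumes the name ends with the 0_XX suffix and carries no file extension; `gss_threshold` is a hypothetical helper, not part of the distributed modules.

```python
import re


def gss_threshold(name):
    """Extract the GSS threshold from a name such as 'NRDOMAINS_0_70'."""
    # Assumption: the name ends in _<digit>_<digits>, with no extension after it.
    match = re.search(r"_(\d)_(\d+)$", name)
    if match is None:
        raise ValueError(f"no GSS threshold suffix in {name!r}")
    return float(f"{match.group(1)}.{match.group(2)}")


print(gss_threshold("NRDOMAINS_0_70"))  # 0.7
```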
The REDUNDANCY table contains the list of non-redundant genomes obtained at a GSS threshold of 0.70.
PERL Modules
SCORES-GNMS: This file contains the GSS used to build the non-redundant genome datasets.
DOMAINS: The complete set of genomes available at the time of the study.
GSS threshold | NRDOMAINS | REPRESENTS |
---|---|---|
0.50 | Download | Download |
0.60 | Download | Download |
0.70 | Download | Download |
0.80 | Download | Download |
0.90 | Download | Download |
Operon Predictions
These files contain operon predictions at a confidence value of 0.90 for all the genomes available at the time we started the work. The predictions were obtained with phylogenetic profiles generated using a non-redundant genome dataset filtered at a genomic similarity score of 0.70.
The format:
- 1st column: Gene pair (GI/GI), where GI is the GenBank Identifier as obtained from the RefSeq genome database
- 2nd column: Strand (forward or reverse)
- 3rd column: Intergenic distance
- 4th column: Mutual information
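The four-column layout above could be read with a sketch like the following. It assumes the columns are separated by whitespace, that the strand is a single token, and that the intergenic distance is an integer; none of this has been verified against the files. The `OperonPair` type and the sample values are made up for illustration.

```python
from typing import NamedTuple


class OperonPair(NamedTuple):
    gi_a: str                  # GI of the first gene in the pair
    gi_b: str                  # GI of the second gene in the pair
    strand: str                # "forward" or "reverse"
    distance: int              # intergenic distance (assumed to be an integer)
    mutual_information: float  # mutual information of the pair


def parse_prediction(line):
    """Parse one prediction line: GI/GI, strand, intergenic distance, mutual information."""
    # Assumption: whitespace-delimited columns, exactly four per line.
    pair, strand, distance, mutual_information = line.split()
    gi_a, gi_b = pair.split("/")
    return OperonPair(gi_a, gi_b, strand, int(distance), float(mutual_information))


# Made-up values, not taken from the actual prediction files.
sample = "1111/2222\tforward\t-3\t0.42"
record = parse_prediction(sample)
print(record.gi_a, record.gi_b, record.strand, record.distance)
```

The GIs are kept as strings because they serve as identifiers and are not used in arithmetic.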