
Release notes on public datasets

  • Bacteria

Available since February 2015 at: This dataset features 2692 complete bacterial genomes available in Ensembl Bacteria. It was initially released in June 2014. We report ∼2.4 billion orthologs; on average, their alignments have a coverage of ∼90%, an identity of ∼43%, a score of ∼241, and an e-value of ∼2e−05. We report ∼1.85 billion singleton orthologs and ∼140 million syntenies that comprise ∼550 million orthologs and ∼190 million non-BDBH homologs. On average, alignments of non-BDBH homologs have a coverage of ∼76%, an identity of ∼35%, a score of ∼145, and an e-value of ∼8e−05. This database was unavailable from Nov 10th 2016 to Sept 14th 2017 due to a database failure.

  • Archaea

Available since February 2017 at: This dataset features 210 complete, reference, and non-redundant Archaea genomes available in UniProt. It was initially released in February 2017. We report ∼12.4 million orthologs; on average, their alignments have a coverage of ∼90.4%, an identity of ∼41%, a score of ∼208, and an e-value of ∼2e−04. We report ∼8.7 million singleton orthologs and ∼1.2 million syntenies that comprise ∼3.7 million orthologs and ∼570,000 non-BDBH homologs. On average, alignments of non-BDBH homologs have a coverage of ∼78.4%, an identity of ∼36.4%, a score of ∼153, and an e-value of ∼4.5e−04.
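
The rounded counts reported for both datasets can be cross-checked with a few lines of Python. This is only a sketch: it assumes that every reported ortholog is either a singleton or part of a synteny, which the figures suggest but the notes do not state outright. The names `datasets`, `synteny_share`, and `is_consistent` are illustrative and are not part of any released tool.

```python
import math

# Approximate counts copied from the release notes (rounded figures).
datasets = {
    "Bacteria": {"total": 2.4e9, "singleton": 1.85e9, "in_synteny": 550e6},
    "Archaea": {"total": 12.4e6, "singleton": 8.7e6, "in_synteny": 3.7e6},
}


def synteny_share(counts):
    """Fraction of the reported orthologs that belong to a synteny."""
    return counts["in_synteny"] / counts["total"]


def is_consistent(counts, rel_tol=0.01):
    """Check that singleton + in-synteny orthologs add up to the total.

    Assumption: the two groups partition the reported orthologs.
    """
    parts = counts["singleton"] + counts["in_synteny"]
    return math.isclose(parts, counts["total"], rel_tol=rel_tol)


if __name__ == "__main__":
    for name, counts in datasets.items():
        print(name, is_consistent(counts), f"{synteny_share(counts):.1%}")
```

Under that assumption, roughly a quarter to a third of the reported orthologs in each dataset fall inside a synteny; the exact shares depend on the rounding of the published figures.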

  • Taxonomy

The above public databases make use of taxonomic information. Our taxonomy data was last updated on Feb 23rd 2017.

release_notes_on_public_dataset.txt · Last modified: 2017/12/14 13:41 by